Zachary Amsden [Fri, 20 Aug 2010 08:07:29 +0000 (22:07 -1000)]
x86: pvclock: Move scale_delta into common header
The scale_delta function for shift / multiply with 31-bit
precision moves to a common header so it can be used by both
kernel and kvm module.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:28 +0000 (22:07 -1000)]
KVM: x86: Add clock sync request to hardware enable
If there are active VCPUs which are marked as belonging to
a particular hardware CPU, request a clock sync for them when
enabling hardware; the TSC could be desynchronized on a newly
arriving CPU, and we need to recompute guests system time
relative to boot after a suspend event.
This covers both cases.
Note that it is acceptable to take the spinlock, as either
no other tasks will be running and no locks held (BSP after
resume), or other tasks will be guaranteed to drop the lock
relatively quickly (AP on CPU_STARTING).
Noting we now get clock synchronization requests for VCPUs
which are starting up (or restarting), it is tempting to
attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers
at this time, however it is not correct to do so; they are
required for systems with non-constant TSC as the frequency
may not be known immediately after the processor has started
until the cpufreq driver has had a chance to run and query
the chipset.
Updated: implement better locking semantics for hardware_enable
Removed the hack of dropping and retaking the lock by adding the
semantic that we always hold kvm_lock when hardware_enable is
called. The one place that doesn't need to worry about it is
resume, as resuming a frozen CPU, the spinlock won't be taken.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:26 +0000 (22:07 -1000)]
KVM: x86: Robust TSC compensation
Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:25 +0000 (22:07 -1000)]
KVM: x86: Add helper functions for time computation
Add a helper function to compute the kernel time and convert nanoseconds
back to CPU specific cycles. Note that these must not be called in preemptible
context, as that would mean the kernel could enter software suspend state,
which would cause non-atomic operation.
Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel
time helper, these should be bootbased as well.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:24 +0000 (22:07 -1000)]
KVM: x86: Fix deep C-state TSC desynchronization
When CPUs with unstable TSCs enter deep C-state, TSC may stop
running. This causes us to require resynchronization. Since
we can't tell when this may potentially happen, we assume the
worst by forcing re-compensation for it at every point the VCPU
task is descheduled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:23 +0000 (22:07 -1000)]
KVM: x86: Unify TSC logic
Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops. Now all TSC decisions
can be done in one place.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:22 +0000 (22:07 -1000)]
KVM: x86: Warn about unstable TSC
If creating an SMP guest with unstable host TSC, issue a warning
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:21 +0000 (22:07 -1000)]
KVM: x86: Make cpu_tsc_khz updates use local CPU
This simplifies much of the init code; we can now simply always
call tsc_khz_changed, optionally passing it a new value, or letting
it figure out the existing value (while interrupts are disabled, and
thus, by inference from the rule, not raceful against CPU hotplug or
frequency updates, which will issue IPIs to the local CPU to perform
this very same task).
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:20 +0000 (22:07 -1000)]
KVM: x86: TSC reset compensation
Attempt to synchronize TSCs which are reset to the same value. In the
case of a reliable hardware TSC, we can just re-use the same offset, but
on non-reliable hardware, we can get closer by adjusting the offset to
match the elapsed time.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:17 +0000 (22:07 -1000)]
KVM: x86: Move TSC offset writes to common code
Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock. While the lock is overkill
now, it is useful later in this patch series.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:16 +0000 (22:07 -1000)]
KVM: x86: Convert TSC writes to TSC offset writes
Change svm / vmx to be the same internally and write TSC offset
instead of bare TSC in helper functions. Isolated as a single
patch to contain code movement.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:15 +0000 (22:07 -1000)]
KVM: x86: Drop vm_init_tsc
This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Mon, 23 Aug 2010 08:13:15 +0000 (16:13 +0800)]
KVM: MMU: fix missing percpu counter destroy
commit
ad05c88266b4cce1c820928ce8a0fb7690912ba1
(KVM: create aggregate kvm_total_used_mmu_pages value)
introduce percpu counter kvm_total_used_mmu_pages but never
destroy it, this may cause oops when rmmod & modprobe.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiaotian Feng [Tue, 24 Aug 2010 02:31:07 +0000 (10:31 +0800)]
KVM: MMU: fix regression from rework mmu_shrink() code
Latest kvm mmu_shrink code rework makes kernel changes kvm->arch.n_used_mmu_pages/
kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called
by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or
kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(),
This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever.
Moving kvm_mmu_commit_zap_page would make the while loop performs as normal.
Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Thu, 19 Aug 2010 06:25:48 +0000 (14:25 +0800)]
KVM: x86 emulator: add JrCXZ instruction emulation
Add JrCXZ instruction emulation (opcode 0xe3)
Used by FreeBSD boot loader.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Mon, 23 Aug 2010 06:56:54 +0000 (14:56 +0800)]
KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation
Add LDS/LES/LFS/LGS/LSS instruction emulation.
(opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dave Hansen [Fri, 20 Aug 2010 01:11:37 +0000 (18:11 -0700)]
KVM: create aggregate kvm_total_used_mmu_pages value
Of slab shrinkers, the VM code says:
* Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
* querying the cache size, so a fastpath for that case is appropriate.
and it *means* it. Look at how it calls the shrinkers:
nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
So, if you do anything stupid in your shrinker, the VM will doubly
punish you.
The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then
we're going to take 101 locks. We do it twice, so each call takes
202 locks. If we're under memory pressure, we can have each cpu
trying to do this. It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
aquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
A 'percpu_counter' can be a large object, but we only have one
of these for the entire system. There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Dave Hansen [Fri, 20 Aug 2010 01:11:28 +0000 (18:11 -0700)]
KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages
Doing this makes the code much more readable. That's
borne out by the fact that this patch removes code. "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called. Keeping this
value as opposed to free makes the next patch simpler.
So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'. That
means they start out zeroed. I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages(). But, that only happens
via kvm ioctls.
Another benefit of storing 'used' intead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Dave Hansen [Fri, 20 Aug 2010 01:11:14 +0000 (18:11 -0700)]
KVM: rename x86 kvm->arch.n_alloc_mmu_pages
arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated". But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.
It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages. This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Dave Hansen [Fri, 20 Aug 2010 01:11:05 +0000 (18:11 -0700)]
KVM: abstract kvm x86 mmu->n_free_mmu_pages
"free" is a poor name for this value. In this context, it means,
"the number of mmu pages which this kvm instance should be able to
allocate." But "free" implies much more that the objects are there
and ready for use. "available" is a much better description, especially
when you see how it is calculated.
In this patch, we abstract its use into a function. We'll soon
replace the function's contents by calculating the value in a
different way.
All of the reads of n_free_mmu_pages are taken care of in this
patch. The modification sites will be handled in a patch
later in the series.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 19 Aug 2010 12:13:00 +0000 (15:13 +0300)]
KVM: x86 emulator: implement CWD (opcode 99)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 16:29:33 +0000 (19:29 +0300)]
KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 16:25:28 +0000 (19:25 +0300)]
KVM: x86 emulator: add Src2Imm decoding
Needed for 3-operand IMUL.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 16:20:21 +0000 (19:20 +0300)]
KVM: x86 emulator: consolidate immediate decode into a function
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:54:34 +0000 (18:54 +0300)]
KVM: x86 emulator: implement RDTSC (opcode 0F 31)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:53:43 +0000 (18:53 +0300)]
KVM: x86 emulator: remove SrcImplicit
Useless.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:31:43 +0000 (18:31 +0300)]
KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:25:25 +0000 (18:25 +0300)]
KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 12:12:09 +0000 (15:12 +0300)]
KVM: x86 emulator: implement RET imm16 (opcode C2)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 12:11:24 +0000 (15:11 +0300)]
KVM: x86 emulator: add SrcImmU16 operand type
Used for RET NEAR instructions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 11:51:45 +0000 (14:51 +0300)]
KVM: x86 emulator: implement CALL FAR (FF /3)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 11:16:35 +0000 (14:16 +0300)]
KVM: x86 emulator: implement DAS (opcode 2F)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 16 Aug 2010 14:50:56 +0000 (17:50 +0300)]
KVM: x86 emulator: Use a register for ____emulate_2op() destination
Most x86 two operand instructions allow the destination to be a memory operand,
but IMUL (for example) requires that the destination be a register. Change
____emulate_2op() to take a register for both source and destination so we
can invoke IMUL.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 16 Aug 2010 14:49:52 +0000 (17:49 +0300)]
KVM: x86 emulator: pass destination type to ____emulate_2op()
We'll need it later so we can use a register for the destination.
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 18 Aug 2010 08:38:21 +0000 (16:38 +0800)]
KVM: x86 emulator: add LOOP/LOOPcc instruction emulation
Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2).
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 18 Aug 2010 08:43:13 +0000 (16:43 +0800)]
KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation
Add CBW/CWDE/CDQE instruction emulation.(opcode 0x98)
Used by FreeBSD's boot loader.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 17 Aug 2010 08:22:17 +0000 (11:22 +0300)]
KVM: x86 emulator: fix REPZ/REPNZ termination condition
EFLAGS.ZF needs to be checked after each iteration, not before.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 17 Aug 2010 08:20:37 +0000 (11:20 +0300)]
KVM: x86 emulator: implement SCAS (opcodes AE, AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 17 Aug 2010 08:17:51 +0000 (11:17 +0300)]
KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS
emulate_push() only schedules a push; it doesn't actually push anything.
Call writeback() to flush out the write.
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 03:46:12 +0000 (11:46 +0800)]
KVM: x86 emulator: remove dup code of in/out instruction
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 03:45:12 +0000 (11:45 +0800)]
KVM: x86 emulator: change OUT instruction to use dst instead of src
Change OUT instruction to use dst instead of src, so we can
reuse those code for all out instructions.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 03:36:51 +0000 (11:36 +0800)]
KVM: x86 emulator: introduce DstImmUByte for dst operand decode
Introduce DstImmUByte for dst operand decode, which
will be used for out instruction.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 07:36:36 +0000 (15:36 +0800)]
KVM: x86 emulator: remove useless label from x86_emulate_insn()
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 09:10:07 +0000 (17:10 +0800)]
KVM: x86 emulator: add setcc instruction emulation
Add setcc instruction emulation (opcode 0x0f 0x90~0x9f)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jiri Kosina [Mon, 16 Aug 2010 15:51:20 +0000 (17:51 +0200)]
KVM: x86: explain 'no-kvmclock' kernel parameter
no-kvmclock kernel parameter is missing its explanation in
Documentation/kernel-parameters.txt. Add it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 17 Aug 2010 01:19:34 +0000 (09:19 +0800)]
KVM: x86 emulator: add XADD instruction emulation
Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 17 Aug 2010 01:17:30 +0000 (09:17 +0800)]
KVM: x86 emulator: put register operand write back to a function
Introduce function write_register_operand() to write back the
register operand.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 17 Aug 2010 02:08:52 +0000 (10:08 +0800)]
KVM: PPC: fix leakage of error page in kvmppc_patch_dcbz()
Add kvm_release_page_clean() after is_error_page() to avoid
leakage of error page.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 15 Aug 2010 21:47:01 +0000 (00:47 +0300)]
KVM: Separate emulation context initialization in a separate function
The code for initializing the emulation context is duplicated at two
locations (emulate_instruction() and kvm_task_switch()). Separate it
in a separate function and call it from there.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 10 Aug 2010 05:48:22 +0000 (13:48 +0800)]
KVM: x86 emulator: add bsf/bsr instruction emulation
Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 8 Aug 2010 18:11:38 +0000 (21:11 +0300)]
KVM: x86 emulator: Fix emulate_grp3 return values
This patch lets emulate_grp3() return X86EMUL_* return codes instead
of hardcoded ones.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 8 Aug 2010 18:11:37 +0000 (21:11 +0300)]
KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions
This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 9 Aug 2010 03:39:14 +0000 (11:39 +0800)]
KVM: x86 emulator: mask group 8 instruction as BitOp
Mask group 8 instruction as BitOp, so we can share the
code for adjust the source operand.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 9 Aug 2010 03:37:37 +0000 (11:37 +0800)]
KVM: x86 emulator: do not adjust the address for immediate source
adjust the dst address for a register source but not adjust the
address for an immediate source.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 9 Aug 2010 03:34:56 +0000 (11:34 +0800)]
KVM: x86 emulator: fix negative bit offset BitOp instruction emulation
If bit offset operands is a negative number, BitOp instruction
will return wrong value. This patch fix it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Thu, 5 Aug 2010 12:42:49 +0000 (15:42 +0300)]
KVM: x86 emulator: Add stc instruction (opcode 0xf9)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 4 Aug 2010 07:38:59 +0000 (15:38 +0800)]
KVM: x86 emulator: using SrcOne for instruction d0/d1 decoding
Using SrcOne for instruction d0/d1 decoding.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 4 Aug 2010 07:38:18 +0000 (15:38 +0800)]
KVM: x86 emulator: disable writeback when decode dest operand
This patch change to disable writeback when decode dest
operand if the dest type is ImplicitOps or not specified.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 4 Aug 2010 07:36:53 +0000 (15:36 +0800)]
KVM: x86 emulator: use SrcAcc to simplify stos decoding
Use SrcAcc to simplify stos decoding.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Wed, 4 Aug 2010 11:38:06 +0000 (14:38 +0300)]
KVM: x86 emulator: Add into, int, and int3 instructions (opcodes 0xcc-0xce)
This adds support for int instructions to the emulator.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Wed, 4 Aug 2010 02:44:24 +0000 (05:44 +0300)]
KVM: x86 emulator: Allow accessing IDT via emulator ops
The patch adds a new member get_idt() to x86_emulate_ops.
It also adds a function to get the idt in order to be used by the emulator.
This is needed for real mode interrupt injection and the emulation of int
instructions.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Thu, 5 Aug 2010 08:34:39 +0000 (16:34 +0800)]
KVM: x86 emulator: simplify two-byte opcode check
Two-byte opcode always start with 0x0F and the decode flags
of opcode 0xF0 is always 0, so remove dup check.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 13:04:22 +0000 (15:04 +0200)]
KVM: PPC: Move KVM trampolines before __end_interrupts
When using a relocatable kernel we need to make sure that the trampline code
and the interrupt handlers are both copied to low memory. The only way to do
this reliably is to put them in the copied section.
This patch should make relocated kernels work with KVM.
KVM-Stable-Tag
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 13:04:21 +0000 (15:04 +0200)]
KVM: PPC: Make long relocations be ulong
On Book3S KVM we directly expose some asm pointers to C code as
variables. These need to be relocated and thus break on relocatable
kernels.
To make sure we can at least build, let's mark them as long instead
of u32 where 64bit relocations don't work.
This fixes the following build error:
WARNING: 2 bad relocations^M
>
c000000000008590 R_PPC64_ADDR32 .text+0x4000000000008460^M
>
c000000000008594 R_PPC64_ADDR32 .text+0x4000000000008598^M
Please keep in mind that actually using KVM on a relocated kernel
might still break. This only fixes the compile problem.
Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 13:04:20 +0000 (15:04 +0200)]
KVM: PPC: Use MSR_DR for external load_up
Book3S_32 requires MSR_DR to be disabled during load_up_xxx while on Book3S_64
it's supposed to be enabled. I misread the code and disabled it in both cases,
potentially breaking the PS3 which has a really small RMA.
This patch makes KVM work on the PS3 again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 13:04:19 +0000 (15:04 +0200)]
KVM: PPC: Add book3s_32 tlbie flush acceleration
On Book3s_32 the tlbie instruction flushed effective addresses by the mask
0x0ffff000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
to speed up that target we should also keep a special hash around for it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 29 Jul 2010 13:04:18 +0000 (15:04 +0200)]
KVM: PPC: correctly check gfn_to_pfn() return value
On failure gfn_to_pfn returns bad_page so use correct function to check
for that.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 13:04:17 +0000 (15:04 +0200)]
KVM: PPC: RCU'ify the Book3s MMU
So far we've been running all code without locking of any sort. This wasn't
really an issue because I didn't see any parallel access to the shadow MMU
code coming.
But then I started to implement dirty bitmapping to MOL which has the video
code in its own thread, so suddenly we had the dirty bitmap code run in
parallel to the shadow mmu code. And with that came trouble.
So I went ahead and made the MMU modifying functions as parallelizable as
I could think of. I hope I didn't screw up too much RCU logic :-). If you
know your way around RCU and locking and what needs to be done when, please
take a look at this patch.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 13:04:16 +0000 (15:04 +0200)]
KVM: PPC: Book3S_32 MMU debug compile fixes
Due to previous changes, the Book3S_32 guest MMU code didn't compile properly
when enabling debugging.
This patch repairs the broken code paths, making it possible to define DEBUG_MMU
and friends again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:08 +0000 (14:48 +0200)]
KVM: PPC: Add get_pvinfo interface to query hypercall instructions
We need to tell the guest the opcodes that make up a hypercall through
interfaces that are controlled by userspace. So we need to add a call
for userspace to allow it to query those opcodes so it can pass them
on.
This is required because the hypercall opcodes can change based on
the hypervisor conditions. If we're running in hardware accelerated
hypervisor mode, a hypercall looks different from when we're running
without hardware acceleration.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:07 +0000 (14:48 +0200)]
KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here
it is - a shiny new and awesome text file describing the internal works of
the PPC KVM paravirtual interface.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:06 +0000 (14:48 +0200)]
KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It
already encodes the EE bit in the instruction.
So in order to get BookE some speedups as well, let's also PV'nize thati
instruction.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:05 +0000 (14:48 +0200)]
KVM: PPC: PV mtmsrd L=0 and mtmsr
There is also a form of mtmsr where all bits need to be addressed. While the
PPC64 Linux kernel behaves resonably well here, on PPC32 we do not have an
L=1 form. It does mtmsr even for simple things like only changing EE.
So we need to hook into that one as well and check for a mask of bits that we
deem safe to change from within guest context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:04 +0000 (14:48 +0200)]
KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE
and RI bits, namely the L=1 form.
Since that one is reasonably often occuring and simple to implement, let's
go with this first. Writing EE=0 is always just a store. Doing EE=1 also
requires us to check for pending interrupts and if necessary exit back to the
hypervisor.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:03 +0000 (14:48 +0200)]
KVM: PPC: PV assembler helpers
When we hook an instruction we need to make sure we don't clobber any of
the registers at that point. So we write them out to scratch space in the
magic page. To make sure we don't fall into a race with another piece of
hooked code, we need to disable interrupts.
To make the later patches and code in general easier readable, let's introduce
a set of defines that save and restore r30, r31 and cr. Let's also define some
helpers to read the lower 32 bits of a 64 bit field on 32 bit systems.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:02 +0000 (14:48 +0200)]
KVM: PPC: Introduce branch patching helper
We will need to patch several instruction streams over to a different
code path, so we need a way to patch a single instruction with a branch
somewhere else.
This patch adds a helper to facilitate this patching.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:01 +0000 (14:48 +0200)]
KVM: PPC: Introduce kvm_tmp framework
We will soon require more sophisticated methods to replace single instructions
with multiple instructions. We do that by branching to a memory region where we
write replacement code for the instruction to.
This region needs to be within 32 MB of the patched instruction though, because
that's the furthest we can jump with immediate branches.
So we keep 1MB of free space around in bss. After we're done initing we can just
tell the mm system that the unused pages are free, but until then we have enough
space to fit all our code in.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:48:00 +0000 (14:48 +0200)]
KVM: PPC: PV tlbsync to nop
With our current MMU scheme we don't need to know about the tlbsync instruction.
So we can just nop it out.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:59 +0000 (14:47 +0200)]
KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to
or from the magic page.
This patch replaces often called instructions that fall into the above category.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:58 +0000 (14:47 +0200)]
KVM: PPC: KVM PV guest stubs
We will soon start and replace instructions from the text section with
other, paravirtualized versions. To ease the readability of those patches
I split out the generic looping and magic page mapping code out.
This patch still only contains stubs. But at least it loops through the
text section :).
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:57 +0000 (14:47 +0200)]
KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still
missing.
This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:56 +0000 (14:47 +0200)]
KVM: Move kvm_guest_init out of generic code
Currently x86 is the only architecture that uses kvm_guest_init(). With
PowerPC we're getting a second user, but the signature is different there
and we don't need to export it, as it uses the normal kernel init framework.
So let's move the x86 specific definition of that function over to the x86
specfic header file.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:55 +0000 (14:47 +0200)]
KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about
the magic page, we can expose that capability to the guest!
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:54 +0000 (14:47 +0200)]
KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest
tells us to project it, the magic page overrides any guest mappings.
In order to reflect that, we need to hook into all the MMU layers of KVM to
force map the magic page if necessary.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:53 +0000 (14:47 +0200)]
KVM: PPC: First magic page steps
We will be introducing a method to project the shared page in guest context.
As soon as we're talking about this coupling, the shared page is colled magic
page.
This patch introduces simple defines, so the follow-up patches are easier to
read.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:52 +0000 (14:47 +0200)]
KVM: PPC: Make PAM a define
On PowerPC it's very normal to not support all of the physical RAM in real mode.
To check if we're matching on the shared page or not, we need to know the limits
so we can restrain ourselves to that range.
So let's make it a define instead of open-coding it. And while at it, let's also
increase it.
Signed-off-by: Alexander Graf <agraf@suse.de>
v2 -> v3:
- RMO -> PAM (non-magic page)
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:51 +0000 (14:47 +0200)]
KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an
interrupt pending for it. Because if so, it should rather get out of guest
context and get the interrupt.
So we introduce a new field in the shared page that we use to tell the guest
that there's a pending interrupt lying around.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:50 +0000 (14:47 +0200)]
KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to store register contents out because
we must not clobber any registers.
So let's add some fields to the shared page we can just happily write to.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:49 +0000 (14:47 +0200)]
KVM: PPC: Add PV guest critical sections
When running in hooked code we need a way to disable interrupts without
clobbering any interrupts or exiting out to the hypervisor.
To achieve this, we have an additional critical field in the shared page. If
that field is equal to the r1 register of the guest, it tells the hypervisor
that we're in such a critical section and thus may not receive any interrupts.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:48 +0000 (14:47 +0200)]
KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface
between the guest and KVM. Usually those interfaces use hypercalls.
This hypercall implementation is described in the last patch of the series
in a special documentation file. Please read that for further information.
This patch implements stubs to handle KVM PPC hypercalls on the host and
guest side alike.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:47 +0000 (14:47 +0200)]
KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are
simple data storage. Instead of exiting to the hypervisor to read and
write those, we can just share them with the guest using the page.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:46 +0000 (14:47 +0200)]
KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR
respectively. They get written to by the hypervisor when an interrupt
occurs or directly by the kernel. They are also used to tell the rfi(d)
instruction where to jump to.
Because it only gets touched on defined events that, it's very simple to
share with the guest. Hypervisor and guest both have full r/w access.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:45 +0000 (14:47 +0200)]
KVM: PPC: Convert DAR to shared page.
The DAR register contains the address a data page fault occured at. This
register behaves pretty much like a simple data storage register that gets
written to on data faults. There is no hypervisor interaction required on
read or write.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:44 +0000 (14:47 +0200)]
KVM: PPC: Convert DSISR to shared page
The DSISR register contains information about a data page fault. It is fully
read/write from inside the guest context and we don't need to worry about
interacting based on writes of this register.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:43 +0000 (14:47 +0200)]
KVM: PPC: Convert MSR to shared page
One of the most obvious registers to share with the guest directly is the
MSR. The MSR contains the "interrupts enabled" flag which the guest has to
toggle in critical sections.
So in order to bring the overhead of interrupt en- and disabling down, let's
put msr into the shared page. Keep in mind that even though you can fully read
its contents, writing to it doesn't always update all state. There are a few
safe fields that don't require hypervisor interaction. See the documentation
for a list of MSR bits that are safe to be set from inside the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Thu, 29 Jul 2010 12:47:42 +0000 (14:47 +0200)]
KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce
a shared page. This shared page will contain all the registers the guest can
read and write safely without exiting guest context.
This patch only implements the stubs required for the basic structure of the
shared page. The actual register moving follows.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Wed, 4 Aug 2010 11:41:04 +0000 (14:41 +0300)]
KVM: x86 emulator: Fix nop emulation
If a nop instruction is encountered, we jump directly to the done label.
This skip updating rip. Break from the switch case instead
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 1 Aug 2010 12:40:19 +0000 (15:40 +0300)]
KVM: x86 emulator: Decode memory operands directly into a 'struct operand'
Since modrm operand can be either register or memory, decoding it into
a 'struct operand', which can represent both, is simpler.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 1 Aug 2010 12:19:22 +0000 (15:19 +0300)]
KVM: x86 emulator: change invlpg emulation to use src.mem.addr
Instead of using modrm_ea, which will soon be gone.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 1 Aug 2010 12:13:22 +0000 (15:13 +0300)]
KVM: x86 emulator: switch LEA to use SrcMem decoding
The NoAccess flag will prevent memory from being accessed.
Signed-off-by: Avi Kivity <avi@redhat.com>