Zhang Xiantao [Sun, 18 Nov 2007 10:43:45 +0000 (18:43 +0800)]
KVM: Portability: Add two hooks to handle kvm_create and destroy vm
Add two arch hooks to handle kvm_create_vm and kvm destroy_vm. Now, just
put io_bus init and destory in common.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 16 Nov 2007 06:38:21 +0000 (14:38 +0800)]
KVM: Remove __init attributes for kvm_init_debug and kvm_init_msr_list
Since their callers are not declared with __init.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joe Perches [Tue, 13 Nov 2007 04:06:51 +0000 (20:06 -0800)]
KVM: Remove ptr comparisons to 0
Fix sparse warnings "Using plain integer as NULL pointer"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Fri, 16 Nov 2007 05:05:55 +0000 (13:05 +0800)]
KVM: Portability: Make kvm_vcpu_ioctl_translate arch dependent
Move kvm_vcpu_ioctl_translate to arch, since mmu would be put under arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 15 Nov 2007 16:06:18 +0000 (18:06 +0200)]
KVM: VMX: Consolidate register usage in vmx_vcpu_run()
We pass vcpu, vmx->fail, and vmx->launched to assembly code, but all three
are fields within vmx. Consolidate by only passing in vmx and offsets for
the rest.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Thu, 15 Nov 2007 15:07:47 +0000 (23:07 +0800)]
KVM: Portability: move KVM_CHECK_EXTENSION
Make KVM_CHECK_EXTENSION code into a function, all archs can define its
capability independently.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Sheng Yang [Thu, 15 Nov 2007 06:52:28 +0000 (14:52 +0800)]
KVM: x86 emulator: modify 'lods', and 'stos' not to depend on CR2
The current 'lods' and 'stos' is depending on incoming CR2 rather than decode
memory address from registers.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Wed, 14 Nov 2007 12:40:21 +0000 (20:40 +0800)]
KVM: Portability: Move x86 specific code from kvm_init() to kvm_arch()
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Wed, 14 Nov 2007 12:39:31 +0000 (20:39 +0800)]
KVM: Portability: Combine kvm_init and kvm_init_x86
Will be called once arch module registers itself.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Wed, 14 Nov 2007 12:38:21 +0000 (20:38 +0800)]
KVM: Portability: Add vcpu and hardware management arch hooks
Add the following hooks:
void decache_vcpus_on_cpu(int cpu);
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id);
void kvm_arch_vcpu_destory(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu);
void kvm_arch_hardware_enable(void *garbage);
void kvm_arch_hardware_disable(void *garbage);
int kvm_arch_hardware_setup(void);
void kvm_arch_hardware_unsetup(void);
void kvm_arch_check_processor_compat(void *rtn);
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Wed, 14 Nov 2007 12:09:30 +0000 (20:09 +0800)]
KVM: Portability: Move kvm_x86_ops to x86.c
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Wed, 14 Nov 2007 12:08:51 +0000 (20:08 +0800)]
KVM: Portability: Move some includes to x86.c
Move some includes to x86.c from kvm_main.c, since the related functions
have been moved to x86.c
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Sun, 11 Nov 2007 20:10:22 +0000 (22:10 +0200)]
KVM: Change kvm_{read,write}_guest() to use copy_{from,to}_user()
This changes kvm_write_guest_page/kvm_read_guest_page to use
copy_to_user/read_from_user, as a result we get better speed
and better dirty bit tracking.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Sun, 11 Nov 2007 20:05:04 +0000 (22:05 +0200)]
KVM: introduce gfn_to_hva()
Convert a guest frame number to the corresponding host virtual address.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Sun, 11 Nov 2007 20:02:22 +0000 (22:02 +0200)]
KVM: add kvm_is_error_hva()
Check for the "error hva", an address outside the user address space that
signals a bad gfn.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 11 Nov 2007 16:37:32 +0000 (18:37 +0200)]
KVM: Simplify CPU_TASKS_FROZEN cpu notifier handling
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Sun, 11 Nov 2007 12:48:17 +0000 (14:48 +0200)]
KVM: x86 emulator: remove 8 bytes operands emulator for call near instruction
it is removed beacuse it isnt supported on a real host
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Sun, 11 Nov 2007 10:28:35 +0000 (12:28 +0200)]
KVM: VMX: wbinvd exiting
Add wbinvd VM Exit support to prepare for pass-through
device cache emulation and also enhance real time
responsiveness.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Sun, 11 Nov 2007 10:27:20 +0000 (12:27 +0200)]
KVM: VMX: Comment VMX primary/secondary exec ctl definitions
Add comments for secondary/primary Processor-Based VM-execution controls.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 22 Nov 2007 09:42:59 +0000 (11:42 +0200)]
KVM: Fix faults during injection of real-mode interrupts
If vmx fails to inject a real-mode interrupt while fetching the interrupt
redirection table, it fails to record this in the vectoring information
field. So we detect this condition and do it ourselves.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 22 Nov 2007 09:30:47 +0000 (11:30 +0200)]
KVM: VMX: Read & store IDT_VECTORING_INFO_FIELD
We'll want to write to it in order to fix real-mode irq injection problems,
but it is a read-only field. Storing it in a variable solves that issue.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 8 Nov 2007 16:19:20 +0000 (18:19 +0200)]
KVM: VMX: Use vmx to inject real-mode interrupts
Instead of injecting real-mode interrupts by writing the interrupt frame into
guest memory, abuse vmx by injecting a software interrupt. We need to
pretend the software interrupt instruction had a length > 0, so we have to
adjust rip backward.
This lets us not to mess with writing guest memory, which is complex and also
sleeps.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Dor Laor [Wed, 7 Nov 2007 14:20:06 +0000 (16:20 +0200)]
KVM: Add make_page_dirty() to kvm_clear_guest_page()
Every write access to guest pages should be tracked.
Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hollis Blanchard [Thu, 1 Nov 2007 19:16:10 +0000 (14:16 -0500)]
KVM: Portability: Move x86 vcpu ioctl handlers to x86.c
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hollis Blanchard [Wed, 31 Oct 2007 22:24:25 +0000 (17:24 -0500)]
KVM: Portability: Move x86 FPU handling to x86.c
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hollis Blanchard [Wed, 31 Oct 2007 22:24:24 +0000 (17:24 -0500)]
KVM: Portability: Move x86 instruction emulation code to x86.c
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hollis Blanchard [Wed, 31 Oct 2007 22:24:23 +0000 (17:24 -0500)]
KVM: Portability: Make exported debugfs data architecture-specific
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 1 Nov 2007 04:31:28 +0000 (06:31 +0200)]
KVM: x86 emulator: Hoist modrm and abs decoding into separate functions
Signed-off-by: Avi Kivity <avi@qumranet.com>
Uri Lublin [Tue, 30 Oct 2007 08:42:09 +0000 (10:42 +0200)]
KVM: Make mark_page_dirty() work for aliased pages too.
Recommended by Izik Eidus.
Signed-off-by: Uri Lublin <uril@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 31 Oct 2007 09:21:06 +0000 (11:21 +0200)]
KVM: Simplify decode_register_operand() calling convention
Now that rex_prefix is part of the decode cache, there is no need to pass
it along.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 31 Oct 2007 09:15:56 +0000 (11:15 +0200)]
KVM: x86 emulator: centralize decoding of one-byte register access insns
Instructions like 'inc reg' that have the register operand encoded
in the opcode are currently specially decoded. Extend
decode_register_operand() to handle that case, indicated by having
DstReg or SrcReg without ModRM.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 31 Oct 2007 08:27:04 +0000 (10:27 +0200)]
KVM: x86 emulator: Extract the common code of SrcReg and DstReg
Share the common parts of SrcReg and DstReg decoding.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Tue, 30 Oct 2007 17:44:25 +0000 (18:44 +0100)]
KVM: Portability: Move pio emulation functions to x86.c
This patch moves implementation of the following functions from
kvm_main.c to x86.c:
free_pio_guest_pages, vcpu_find_pio_dev, pio_copy_data, complete_pio,
kernel_pio, pio_string_write, kvm_emulate_pio, kvm_emulate_pio_string
The function inject_gp, which was duplicated by yesterday's patch
series, is removed from kvm_main.c now because it is not needed anymore.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Tue, 30 Oct 2007 17:44:21 +0000 (18:44 +0100)]
KVM: Portability: Move x86 emulation and mmio device hook to x86.c
This patch moves the following functions to from kvm_main.c to x86.c:
emulator_read/write_std, vcpu_find_pervcpu_dev, vcpu_find_mmio_dev,
emulator_read/write_emulated, emulator_write_phys,
emulator_write_emulated_onepage, emulator_cmpxchg_emulated,
get_setment_base, emulate_invlpg, emulate_clts, emulator_get/set_dr,
kvm_report_emulation_failure, emulate_instruction
The following data type is moved to x86.c:
struct x86_emulate_ops emulate_ops
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Tue, 30 Oct 2007 17:44:17 +0000 (18:44 +0100)]
KVM: Portability: Move kvm_get/set_msr[_common] to x86.c
This patch moves the implementation of the functions of kvm_get/set_msr,
kvm_get/set_msr_common, and set_efer from kvm_main.c to x86.c. The
definition of EFER_RESERVED_BITS is moved too.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Mon, 29 Oct 2007 20:15:20 +0000 (15:15 -0500)]
KVM: Fix gfn_to_page() acquiring mmap_sem twice
KVM's nopage handler calls gfn_to_page() which acquires the mmap_sem when
calling out to get_user_pages(). nopage handlers are already invoked with the
mmap_sem held though. Introduce a __gfn_to_page() for use by the nopage
handler which requires the lock to already be held.
This was noticed by tglx.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Sheng Yang [Mon, 29 Oct 2007 01:40:42 +0000 (09:40 +0800)]
KVM: VMX: Enable memory mapped TPR shadow (FlexPriority)
This patch based on CR8/TPR patch, and enable the TPR shadow (FlexPriority)
for 32bit Windows. Since TPR is accessed very frequently by 32bit
Windows, especially SMP guest, with FlexPriority enabled, we saw significant
performance gain.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Mon, 29 Oct 2007 15:09:35 +0000 (16:09 +0100)]
KVM: Portability: Move control register helper functions to x86.c
This patch moves the definitions of CR0_RESERVED_BITS,
CR4_RESERVED_BITS, and CR8_RESERVED_BITS along with the following
functions from kvm_main.c to x86.c:
set_cr0(), set_cr3(), set_cr4(), set_cr8(), get_cr8(), lmsw(),
load_pdptrs()
The static function wrapper inject_gp is duplicated in kvm_main.c and
x86.c for now, the version in kvm_main.c should disappear once the last
user of it is gone too.
The function load_pdptrs is no longer static, and now defined in x86.h
for the time being, until the last user of it is gone from kvm_main.c.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Mon, 29 Oct 2007 15:09:10 +0000 (16:09 +0100)]
KVM: Portability: move get/set_apic_base to x86.c
This patch moves the implementation of get_apic_base and set_apic_base
from kvm_main.c to x86.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Mon, 29 Oct 2007 15:08:51 +0000 (16:08 +0100)]
KVM: Portability: Move memory segmentation to x86.c
This patch moves the definition of segment_descriptor_64 for AMD64 and
EM64T from kvm_main.c to segment_descriptor.h. It also adds a proper
#ifndef...#define...#endif around that header file.
The implementation of segment_base is moved from kvm_main.c to x86.c.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Mon, 29 Oct 2007 15:08:35 +0000 (16:08 +0100)]
KVM: Portability: Split kvm_vm_ioctl v3
This patch splits kvm_vm_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
The patch is unchanged since last submission.
Common ioctls for all architectures are:
KVM_CREATE_VCPU, KVM_GET_DIRTY_LOG, KVM_SET_USER_MEMORY_REGION
x86 specific ioctls are:
KVM_SET_MEMORY_REGION,
KVM_GET/SET_NR_MMU_PAGES, KVM_SET_MEMORY_ALIAS, KVM_CREATE_IRQCHIP,
KVM_CREATE_IRQ_LINE, KVM_GET/SET_IRQCHIP
KVM_SET_TSS_ADDR
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 28 Oct 2007 16:52:05 +0000 (18:52 +0200)]
KVM: MMU: Topup the mmu memory preallocation caches before emulating an insn
Emulation may cause a shadow pte to be instantiated, which requires
memory resources. Make sure the caches are filled to avoid an oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 28 Oct 2007 16:48:59 +0000 (18:48 +0200)]
KVM: Move page fault processing to common code
The code that dispatches the page fault and emulates if we failed to map
is duplicated across vmx and svm. Merge it to simplify further bugfixing.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 28 Oct 2007 14:34:25 +0000 (16:34 +0200)]
KVM: x86 emulator: don't depend on cr2 for mov abs emulation
The 'mov abs' instruction family (opcodes 0xa0 - 0xa3) still depends on cr2
provided by the page fault handler. This is wrong for several reasons:
- if an instruction accessed misaligned data that crosses a page boundary,
and if the fault happened on the second page, cr2 will point at the
second page, not the data itself.
- if we're emulating in real mode, or due to a FlexPriority exit, there
is no cr2 generated.
So, this change adds decoding for this instruction form and drops reliance
on cr2.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Thu, 25 Oct 2007 12:18:54 +0000 (14:18 +0200)]
KVM: SVM: Let gcc to choose which registers to save (i386)
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of AMD i386
* Original code saves following registers:
ebx, ecx, edx, esi, edi, ebp
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
ebx, ecx, edx, esi, edi
- rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
description.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Thu, 25 Oct 2007 12:18:53 +0000 (14:18 +0200)]
KVM: SVM: Let gcc to choose which registers to save (x86_64)
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of AMD x86_64.
* Original code saves following registers:
rbx, rcx, rdx, rsi, rdi, rbp,
r8, r9, r10, r11, r12, r13, r14, r15
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
rbx, rcx, rdx, rsi, rdi
r8, r9, r10, r11, r12, r13, r14, r15
- rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
description.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Thu, 25 Oct 2007 12:18:55 +0000 (14:18 +0200)]
KVM: VMX: Let gcc to choose which registers to save (i386)
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of intel i386.
* Original code saves following registers:
eax, ebx, ecx, edx, edi, esi, ebp (using popa)
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
ebx, edi, rsi
- doesn't save eax because it is an output operand (vmx->fail)
- cannot put ecx in clobber description because it is an input operand,
but as we modify it and we want to keep its value (vcpu), we must
save it (pop/push)
- ebp is saved (pop/push) because GCC seems to ignore its use the clobber
description.
- edx is saved (pop/push) because it is reserved by GCC (REGPARM) and
cannot be put in the clobber description.
- line "mov (%%esp), %3 \n\t" has been removed because %3
is ecx and ecx is restored just after.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Thu, 25 Oct 2007 12:18:52 +0000 (14:18 +0200)]
KVM: VMX: Let gcc to choose which registers to save (x86_64)
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of intel x86_64.
* Original code saves following registers:
rax, rbx, rcx, rdx, rsi, rdi, rbp,
r8, r9, r10, r11, r12, r13, r14, r15
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
rbx, rdi, rsi,
r8, r9, r10, r11, r12, r13, r14, r15
- doesn't save rax because it is an output operand (vmx->fail)
- cannot put rcx in clobber description because it is an input operand,
but as we modify it and we want to keep its value (vcpu), we must
save it (pop/push)
- rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
description.
- rdx is saved (pop/push) because it is reserved by GCC (REGPARM) and
cannot be put in the clobber description.
- line "mov (%%rsp), %3 \n\t" has been removed because %3
is rcx and rcx is restored just after.
- line ASM_VMX_VMWRITE_RSP_RDX() is moved out of the ifdef/else/endif
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Wed, 24 Oct 2007 22:29:55 +0000 (00:29 +0200)]
KVM: Add ioctl to tss address from userspace,
Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel. This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Wed, 24 Oct 2007 21:57:46 +0000 (23:57 +0200)]
KVM: Add kernel-internal memory slots
Reserve a few memory slots for kernel internal use. This is good for case
you have to register memory region and you want to be sure it was not
registered from userspace, and for case you want to register a memory region
that won't be seen from userspace.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Wed, 24 Oct 2007 21:52:57 +0000 (23:52 +0200)]
KVM: Export memory slot allocation mechanism
Remove kvm memory slot allocation mechanism from the ioctl
and put it to exported function.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Thu, 25 Oct 2007 09:54:04 +0000 (11:54 +0200)]
KVM: Unmap kernel-allocated memory on slot destruction
kvm_vm_ioctl_set_memory_region() is able to remove memory in addition to
adding it. Therefore when using kernel swapping support for old userspaces,
we need to munmap the memory if the user request to remove it
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Christian Borntraeger [Thu, 11 Oct 2007 13:34:17 +0000 (15:34 +0200)]
KVM: Per-architecture hypercall definitions
Currently kvm provides hypercalls only for x86* architectures. To
provide hypercall infrastructure for other kvm architectures I split
kvm_para.h into a generic header file and architecture specific
definitions.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Wed, 10 Oct 2007 10:15:54 +0000 (12:15 +0200)]
KVM: Split IOAPIC reset function and export for kernel RESET
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Eddie Dong [Wed, 10 Oct 2007 10:14:25 +0000 (12:14 +0200)]
KVM: Export PIC reset for kernel device reset
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 21 Oct 2007 09:03:36 +0000 (11:03 +0200)]
KVM: Add a might_sleep() annotation to gfn_to_page()
This will help trap accesses to guest memory in atomic context.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 21 Oct 2007 09:00:39 +0000 (11:00 +0200)]
KVM: Move vmx_vcpu_reset() out of vmx_vcpu_setup()
Split guest reset code out of vmx_vcpu_setup(). Besides being cleaner, this
moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup()
(which is executed with preemption enabled).
[izik: remove unused variable]
Signed-off-by: Avi Kivity <avi@qumranet.com>
Zhang Xiantao [Sat, 20 Oct 2007 07:34:38 +0000 (15:34 +0800)]
KVM: Portability: Split kvm_vcpu into arch dependent and independent parts (part 1)
First step to split kvm_vcpu. Currently, we just use an macro to define
the common fields in kvm_vcpu for all archs, and all archs need to define
its own kvm_vcpu struct.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Thu, 18 Oct 2007 14:59:34 +0000 (09:59 -0500)]
KVM: Allocate userspace memory for older userspace
Allocate a userspace buffer for older userspaces. Also eliminate phys_mem
buffer. The memset() in kvmctl really kills initial memory usage but swapping
works even with old userspaces.
A side effect is that maximum guest side is reduced for older userspace on
i386.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Christian Borntraeger [Thu, 18 Oct 2007 12:39:10 +0000 (14:39 +0200)]
KVM: Use virtual cpu accounting if available for guest times.
ppc and s390 offer the possibility to track process times precisely
by looking at cpu timer on every context switch, irq, softirq etc.
We can use that infrastructure as well for guest time accounting.
We need to account the used time before we change the state.
This patch adds a call to account_system_vtime to kvm_guest_enter
and kvm_guest exit. If CONFIG_VIRT_CPU_ACCOUNTING is not set,
account_system_vtime is defined in hardirq.h as an empty function,
which means this patch does not change the behaviour on other
platforms.
I compile tested this patch on x86 and function tested the patch on
s390.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Thu, 18 Oct 2007 09:09:33 +0000 (11:09 +0200)]
KVM: MMU: Partial swapping of guest memory
This allows guest memory to be swapped. Pages which are currently mapped
via shadow page tables are pinned into memory, but all other pages can
be freely swapped.
The patch makes gfn_to_page() elevate the page's reference count, and
introduces kvm_release_page() that pairs with it.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Wed, 17 Oct 2007 17:17:48 +0000 (19:17 +0200)]
KVM: MMU: Make gfn_to_page() always safe
In case the page is not present in the guest memory map, return a dummy
page the guest can scribble on.
This simplifies error checking in its users.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Tue, 16 Oct 2007 12:43:46 +0000 (14:43 +0200)]
KVM: MMU: Keep a reverse mapping of non-writable translations
The current kvm mmu only reverse maps writable translation. This is used
to write-protect a page in case it becomes a pagetable.
But with swapping support, we need a reverse mapping of read-only pages as
well: when we evict a page, we need to remove any mapping to it, whether
writable or not.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Tue, 16 Oct 2007 12:42:30 +0000 (14:42 +0200)]
KVM: MMU: Add rmap_next(), a helper for walking kvm rmaps
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Nitin A Kamble [Wed, 17 Oct 2007 01:23:27 +0000 (18:23 -0700)]
KVM: x86 emulator: cmc, clc, cli, sti
Instruction: cmc, clc, cli, sti
opcodes: 0xf5, 0xf8, 0xfa, 0xfb respectively.
[avi: fix reference to EFLG_IF which is not defined anywhere]
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Wed, 17 Oct 2007 10:18:47 +0000 (12:18 +0200)]
KVM: MMU: Simplify page table walker
Simplify the walker level loop not to carry so much information from one
loop to the next. In addition to being complex, this made kmap_atomic()
critical sections difficult to manage.
As a result of this change, kmap_atomic() sections are limited to actually
touching the guest pte, which allows the other functions called from the
walker to do sleepy operations. This will happen when we enable swapping.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Nitin A Kamble [Sat, 13 Oct 2007 00:40:33 +0000 (17:40 -0700)]
KVM: x86 emulator: Implement emulation of instruction: inc & dec
Instructions:
inc r16/r32 (opcode 0x40-0x47)
dec r16/r32 (opcode 0x48-0x4f)
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 16 Oct 2007 15:22:08 +0000 (17:22 +0200)]
KVM: Rename KVM_TLB_FLUSH to KVM_REQ_TLB_FLUSH
We now have a new namespace, KVM_REQ_*, for bits in vcpu->requests.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 16 Oct 2007 14:23:22 +0000 (16:23 +0200)]
KVM: Move apic timer interrupt backlog processing to common code
Beside the obvious goodness of making code more common, this prevents
a livelock with the next patch which moves interrupt injection out of the
critical section.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Fri, 12 Oct 2007 09:01:59 +0000 (11:01 +0200)]
KVM: Add some \n in ioapic_debug()
Add new-line at end of debug strings.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Qing He [Mon, 24 Sep 2007 09:39:41 +0000 (17:39 +0800)]
KVM: apic round robin cleanup
If no apic is enabled in the bitmap of an interrupt delivery with delivery
mode of lowest priority, a warning should be reported rather than select
a fallback vcpu
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Thu, 11 Oct 2007 17:16:52 +0000 (19:16 +0200)]
KVM: Portability: split kvm_vcpu_ioctl
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 11 Oct 2007 13:30:21 +0000 (15:30 +0200)]
KVM: MMU: When updating the dirty bit, inform the mmu about it
Since the mmu uses different shadow pages for dirty large pages and clean
large pages, this allows the mmu to drop ptes that are now invalid.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 11 Oct 2007 13:22:59 +0000 (15:22 +0200)]
KVM: MMU: Move dirty bit updates to a separate function
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 11 Oct 2007 13:13:49 +0000 (15:13 +0200)]
KVM: MMU: Instantiate real-mode shadows as user writable shadows
This is consistent with real-mode permissions.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 11 Oct 2007 13:12:24 +0000 (15:12 +0200)]
KVM: MMU: Disable write access on clean large pages
By forcing clean huge pages to be read-only, we have separate roles
for the shadow of a clean large page and the shadow of a dirty large
page. This is necessary because different ptes will be instantiated
for the two cases, even for read faults.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 11 Oct 2007 13:08:41 +0000 (15:08 +0200)]
KVM: MMU: Fix nx access bit for huge pages
We must set the bit before the shift, otherwise the wrong bit gets set.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 11 Oct 2007 10:32:30 +0000 (12:32 +0200)]
KVM: Move guest pte dirty bit management to the guest pagetable walker
This is more consistent with the accessed bit management, and makes the dirty
bit available earlier for other purposes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Thu, 11 Oct 2007 01:08:41 +0000 (20:08 -0500)]
KVM: MMU: More struct kvm_vcpu -> struct kvm cleanups
This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does
not depend on the VCPU state unlike GVA to GPA so there's no need to pass in
the kvm_vcpu.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Thu, 11 Oct 2007 00:25:50 +0000 (19:25 -0500)]
KVM: MMU: Clean up MMU functions to take struct kvm when appropriate
Some of the MMU functions take a struct kvm_vcpu even though they affect all
VCPUs. This patch cleans up some of them to instead take a struct kvm. This
makes things a bit more clear.
The main thing that was confusing me was whether certain functions need to be
called on all VCPUs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Carsten Otte [Wed, 10 Oct 2007 15:16:19 +0000 (17:16 +0200)]
KVM: Move x86 msr handling to new files x86.[ch]
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Tue, 9 Oct 2007 17:20:39 +0000 (19:20 +0200)]
KVM: Support assigning userspace memory to the guest
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.
This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Mike Day [Mon, 8 Oct 2007 13:02:08 +0000 (09:02 -0400)]
KVM: CodingStyle cleanup
Signed-off-by: Mike D. Day <ncmike@ncultra.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Rusty Russell [Mon, 8 Oct 2007 00:55:29 +0000 (10:55 +1000)]
KVM: Remove gratuitous casts from lapic.c
Since vcpu->apic is of the correct type, there's not need to cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Rusty Russell [Mon, 8 Oct 2007 00:50:48 +0000 (10:50 +1000)]
KVM: Hoist kvm_create_lapic() into kvm_vcpu_init()
Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm
and vmx do it. And make it return the error rather than a fairly
random -ENOMEM.
This also solves the problem that neither svm.c nor vmx.c actually
handles the error path properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Rusty Russell [Mon, 8 Oct 2007 00:48:30 +0000 (10:48 +1000)]
KVM: Add kvm_free_lapic() to pair with kvm_create_lapic()
Instead of the asymetry of kvm_free_apic, implement kvm_free_lapic().
And guess what? I found a minor bug: we don't need to hrtimer_cancel()
from kvm_main.c, because we do that in kvm_free_apic().
Also:
1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init.
2) Don't set apic->regs_page to zero before freeing apic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Tue, 2 Oct 2007 16:52:55 +0000 (18:52 +0200)]
KVM: Allow dynamic allocation of the mmu shadow cache size
The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Mon, 1 Oct 2007 20:14:18 +0000 (22:14 +0200)]
KVM: Add general accessors to read and write guest memory
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Thu, 27 Sep 2007 12:11:22 +0000 (14:11 +0200)]
KVM: Remove the usage of page->private field by rmap
When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->rmap is reserved for
the filesystem. So we move the rmap base pointers to the memory slot.
A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.
Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 30 Sep 2007 09:02:53 +0000 (11:02 +0200)]
KVM: VMX: Simplify vcpu_clear()
Now that smp_call_function_single() knows how to call a function on the
current cpu, there's no need to check explicitly.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 30 Sep 2007 08:50:12 +0000 (10:50 +0200)]
KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processor
Noted by Eddie Dong.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Tue, 25 Sep 2007 11:36:40 +0000 (13:36 +0200)]
KVM: x86 emulator: Any legacy prefix after a REX prefix nullifies its effect
This patch modifies the management of REX prefix according behavior
I saw in Xen 3.1. In Xen, this modification has been introduced by
Jan Beulich.
http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Mon, 24 Sep 2007 15:00:58 +0000 (17:00 +0200)]
KVM: Purify x86_decode_insn() error case management
The only valid case is on protected page access, other cases are errors.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Qing He [Mon, 24 Sep 2007 09:22:13 +0000 (17:22 +0800)]
KVM: x86_emulator: no writeback for bt
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Mon, 24 Sep 2007 09:10:56 +0000 (11:10 +0200)]
KVM: x86 emulator: Remove no_wb, use dst.type = OP_NONE instead
Remove no_wb, use dst.type = OP_NONE instead, idea stollen from xen-3.1
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Mon, 24 Sep 2007 09:10:55 +0000 (11:10 +0200)]
KVM: x86 emulator: remove _eflags and use directly ctxt->eflags.
Remove _eflags and use directly ctxt->eflags. Caching eflags is not needed as
it is restored to vcpu by kvm_main.c:emulate_instruction() from ctxt->eflags
only if emulation doesn't fail.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Mon, 24 Sep 2007 09:10:54 +0000 (11:10 +0200)]
KVM: x86 emulator: split some decoding into functions for readability
To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation
parts into functions.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Ryan Harper [Tue, 18 Sep 2007 19:05:16 +0000 (14:05 -0500)]
KVM: MMU: Ignore reserved bits in cr3 in non-pae mode
This patch removes the fault injected when the guest attempts to set reserved
bits in cr3. X86 hardware doesn't generate a fault when setting reserved bits.
The result of this patch is that vmware-server, running within a kvm guest,
boots and runs memtest from an iso.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 23 Sep 2007 12:10:49 +0000 (14:10 +0200)]
KVM: MMU: Make flooding detection work when guest page faults are bypassed
When we allow guest page faults to reach the guests directly, we lose
the fault tracking which allows us to detect demand paging. So we provide
an alternate mechnism by clearing the accessed bit when we set a pte, and
checking it later to see if the guest actually used it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 16 Sep 2007 16:58:32 +0000 (18:58 +0200)]
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>