Takuya Yoshikawa [Tue, 16 Nov 2010 08:35:02 +0000 (17:35 +0900)]
KVM: take kvm_lock for hardware_disable() during cpu hotplug
In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
This patch adds missing protection for CPU_DYING case.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Wed, 17 Nov 2010 04:11:41 +0000 (12:11 +0800)]
KVM: MMU: don't mark spte notrap if reserved bit set
If reserved bit is set, we need inject the #PF with PFEC.RSVD=1,
but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Tue, 16 Nov 2010 21:30:07 +0000 (22:30 +0100)]
KVM: Document device assigment API
Adds API documentation for KVM_[DE]ASSIGN_PCI_DEVICE,
KVM_[DE]ASSIGN_DEV_IRQ, KVM_SET_GSI_ROUTING, KVM_ASSIGN_SET_MSIX_NR, and
KVM_ASSIGN_SET_MSIX_ENTRY.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Tue, 16 Nov 2010 21:30:06 +0000 (22:30 +0100)]
KVM: Clean up kvm_vm_ioctl_assigned_device
Any arch not supporting device assigment will also not build
assigned-dev.c. So testing for KVM_CAP_DEVICE_DEASSIGNMENT is pointless.
KVM_CAP_ASSIGN_DEV_IRQ is unconditinally set. Moreover, add a default
case for dispatching the ioctl.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Tue, 16 Nov 2010 21:30:05 +0000 (22:30 +0100)]
KVM: Save/restore state of assigned PCI device
The guest may change states that pci_reset_function does not touch. So
we better save/restore the assigned device across guest usage.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Tue, 16 Nov 2010 21:30:04 +0000 (22:30 +0100)]
KVM: Refactor IRQ names of assigned devices
Cosmetic change, but it helps to correlate IRQs with PCI devices.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Tue, 16 Nov 2010 21:30:03 +0000 (22:30 +0100)]
KVM: Switch assigned device IRQ forwarding to threaded handler
This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.
Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lenghty operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, thus is renames to intx_lock.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Tue, 16 Nov 2010 21:30:02 +0000 (22:30 +0100)]
KVM: Clear assigned guest IRQ on release
When we deassign a guest IRQ, clear the potentially asserted guest line.
There might be no chance for the guest to do this, specifically if we
switch from INTx to MSI mode.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Tue, 9 Nov 2010 14:15:43 +0000 (16:15 +0200)]
KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid info
This allows Linux to mask cpuid bits if, for example, nx is enabled on only
some cpus.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Tue, 9 Nov 2010 14:15:42 +0000 (16:15 +0200)]
KVM: SVM: Replace svm_has() by standard Linux cpuid accessors
Instead of querying cpuid directly, use the Linux accessors (boot_cpu_has,
etc.). This allows the things like the clearcpuid kernel command line to
work (when it's fixed wrt scattered cpuid bits).
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 12 Nov 2010 06:49:55 +0000 (14:49 +0800)]
KVM: MMU: fix apf prefault if nested guest is enabled
If apf is generated in L2 guest and is completed in L1 guest, it will
prefault this apf in L1 guest's mmu context.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 12 Nov 2010 06:49:11 +0000 (14:49 +0800)]
KVM: MMU: support apf for nonpaing guest
Let's support apf for nonpaing guest
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 12 Nov 2010 06:47:01 +0000 (14:47 +0800)]
KVM: MMU: clear apfs if page state is changed
If CR0.PG is changed, the page fault cann't be avoid when the prefault address
is accessed later
And it also fix a bug: it can retry a page enabled #PF in page disabled context
if mmu is shadow page
This idear is from Gleb Natapov
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 12 Nov 2010 06:46:08 +0000 (14:46 +0800)]
KVM: MMU: fix missing post sync audit
Add AUDIT_POST_SYNC audit for long mode shadow page
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 9 Nov 2010 16:02:49 +0000 (17:02 +0100)]
KVM: Clean up vm creation and release
IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows to move
generic destruction calls into generic code.
It also fixes error clean-up on failures of kvm_create_vm for IA64.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Tracey Dent [Sat, 6 Nov 2010 18:52:58 +0000 (14:52 -0400)]
KVM: x86: Makefile clean up
Changed makefile to use the ccflags-y option instead of EXTRA_CFLAGS.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Thu, 4 Nov 2010 10:29:42 +0000 (18:29 +0800)]
KVM: remove unused function declaration
Remove the declaration of kvm_mmu_set_base_ptes()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 9 Nov 2010 11:42:12 +0000 (12:42 +0100)]
KVM: Refactor srcu struct release on early errors
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 1 Nov 2010 21:20:48 +0000 (23:20 +0200)]
KVM: VMX: Disallow NMI while blocked by STI
While not mandated by the spec, Linux relies on NMI being blocked by an
IF-enabling STI. VMX also refuses to enter a guest in this state, at
least on some implementations.
Disallow NMI while blocked by STI by checking for the condition, and
requesting an interrupt window exit if it occurs.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Mon, 1 Nov 2010 09:03:44 +0000 (17:03 +0800)]
KVM: fix the race while wakeup all pv guest
In kvm_async_pf_wakeup_all(), we add a dummy apf to vcpu->async_pf.done
without holding vcpu->async_pf.lock, it will break if we are handling apfs
at this time.
Also use 'list_empty_careful()' instead of 'list_empty()'
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Tue, 2 Nov 2010 09:35:35 +0000 (17:35 +0800)]
KVM: handle more completed apfs if possible
If it's no need to inject async #PF to PV guest we can handle
more completed apfs at one time, so we can retry guest #PF
as early as possible
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Mon, 1 Nov 2010 09:01:28 +0000 (17:01 +0800)]
KVM: avoid unnecessary wait for a async pf
In current code, it checks async pf completion out of the wait context,
like this:
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
!vcpu->arch.apf.halted)
r = vcpu_enter_guest(vcpu);
else {
......
kvm_vcpu_block(vcpu)
^- waiting until 'async_pf.done' is not empty
}
kvm_check_async_pf_completion(vcpu)
^- delete list from async_pf.done
So, if we check aysnc pf completion first, it can be blocked at
kvm_vcpu_block
Fixed by mark the vcpu is unhalted in kvm_check_async_pf_completion()
path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Mon, 1 Nov 2010 09:00:30 +0000 (17:00 +0800)]
KVM: fix searching async gfn in kvm_async_pf_gfn_slot
Don't search later slots if the slot is empty
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Mon, 1 Nov 2010 08:59:39 +0000 (16:59 +0800)]
KVM: cleanup async_pf tracepoints
Use 'DECLARE_EVENT_CLASS' to cleanup async_pf tracepoints
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Mon, 1 Nov 2010 08:58:43 +0000 (16:58 +0800)]
KVM: fix tracing kvm_try_async_get_page
Tracing 'async' and *pfn is useless, since 'async' is always true,
and '*pfn' is always "fault_pfn'
We can trace 'gva' and 'gfn' instead, it can help us to see the
life-cycle of an async_pf
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Tue, 2 Nov 2010 01:49:34 +0000 (10:49 +0900)]
KVM: replace vmalloc and memset with vzalloc
Let's use newly introduced vzalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 1 Nov 2010 13:35:01 +0000 (15:35 +0200)]
KVM: handle exit due to INVD in VMX
Currently the exit is unhandled, so guest halts with error if it tries
to execute INVD instruction. Call into emulator when INVD instruction
is executed by a guest instead. This instruction is not needed by ordinary
guests, but firmware (like OpenBIOS) use it and fail.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Mon, 1 Nov 2010 13:01:29 +0000 (14:01 +0100)]
KVM: x86: Avoid issuing wbinvd twice
Micro optimization to avoid calling wbinvd twice on the CPU that has to
emulate it. As we might be preempted between smp_call_function_many and
the local wbinvd, the cache might be filled again so that real work
could be done uselessly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Heiko Carstens [Wed, 27 Oct 2010 15:22:10 +0000 (17:22 +0200)]
KVM: get rid of warning within kvm_dev_ioctl_create_vm
Fixes this:
CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_dev_ioctl_create_vm':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1828:10: warning: unused variable 'r'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Heiko Carstens [Wed, 27 Oct 2010 15:21:21 +0000 (17:21 +0200)]
KVM: add cast within kvm_clear_guest_page to fix warning
Fixes this:
CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Mon, 1 Nov 2010 05:36:09 +0000 (14:36 +0900)]
KVM: use kmalloc() for small dirty bitmaps
Currently we are using vmalloc() for all dirty bitmaps even if
they are small enough, say less than K bytes.
We use kmalloc() if dirty bitmap size is less than or equal to
PAGE_SIZE so that we can avoid vmalloc area usage for VGA.
This will also make the logging start/stop faster.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Wed, 27 Oct 2010 09:23:54 +0000 (18:23 +0900)]
KVM: pre-allocate one more dirty bitmap to avoid vmalloc()
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.
This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.
Performance improvement:
I measured performance for the case of VGA update by trace-cmd.
The result was 1.5 times faster than the original one.
In the case of live migration, the improvement ratio depends on the workload
and the guest memory size. In general, the larger the memory size is the more
benefits we get.
Note:
This does not change other architectures's logic but the allocation size
becomes twice. This will increase the actual memory consumption only when
the new size changes the number of pages allocated by vmalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Wed, 27 Oct 2010 09:22:19 +0000 (18:22 +0900)]
KVM: introduce wrapper functions for creating/destroying dirty bitmaps
This makes it easy to change the way of allocating/freeing dirty bitmaps.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Sun, 24 Oct 2010 14:49:08 +0000 (16:49 +0200)]
KVM: x86: trace "exit to userspace" event
Add tracepoint for userspace exit.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Fri, 22 Oct 2010 16:18:18 +0000 (14:18 -0200)]
KVM: propagate fault r/w information to gup(), allow read-only memory
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Fri, 22 Oct 2010 16:18:17 +0000 (14:18 -0200)]
KVM: MMU: flush TLBs on writable -> read-only spte overwrite
This can happen in the following scenario:
vcpu0 vcpu1
read fault
gup(.write=0)
gup(.write=1)
reuse swap cache, no COW
set writable spte
use writable spte
set read-only spte
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Fri, 22 Oct 2010 16:18:16 +0000 (14:18 -0200)]
KVM: MMU: remove kvm_mmu_set_base_ptes
Unused.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Fri, 22 Oct 2010 16:18:15 +0000 (14:18 -0200)]
KVM: VMX: remove setting of shadow_base_ptes for EPT
The EPT present/writable bits use the same position as normal
pagetable bits.
Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.
Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 25 Oct 2010 13:23:55 +0000 (15:23 +0200)]
KVM: Avoid double interrupt injection with vapic
After an interrupt injection, the PPR changes, and we have to reflect that
into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the
whole interrupt injection routine to be run again (harmlessly).
Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise
there is no chance that a new injection is needed.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 21 Oct 2010 10:20:34 +0000 (12:20 +0200)]
KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers
This abstraction only serves to obfuscate. Remove.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Oct 2010 10:20:33 +0000 (12:20 +0200)]
KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path
ldt is never used in the kernel context; same goes for fs (x86_64) and gs
(i386). So save/restore them in the heavyweight exit path instead
of the lightweight path.
By itself, this doesn't buy us much, but it paves the way for moving vmload
and vmsave to the heavyweight exit path, since they modify the same registers.
[jan: fix copy/pase mistake on i386]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Oct 2010 10:20:32 +0000 (12:20 +0200)]
KVM: SVM: Move svm->host_gs_base into a separate structure
More members will join it soon.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Oct 2010 10:20:31 +0000 (12:20 +0200)]
KVM: SVM: Move guest register save out of interrupts disabled section
Saving guest registers is just a memory copy, and does not need to be in the
critical section. Move outside the critical section to improve latency a
bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Oct 2010 16:34:54 +0000 (18:34 +0200)]
KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason
May otherwise generates build warnings about unused
kvm_read_and_reset_pf_reason if included without CONFIG_KVM_GUEST
enabled.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andi Kleen [Wed, 20 Oct 2010 15:56:17 +0000 (17:56 +0200)]
KVM: Move KVM context switch into own function
gcc 4.5 with some special options is able to duplicate the VMX
context switch asm in vmx_vcpu_run(). This results in a compile error
because the inline asm sequence uses an on local label. The non local
label is needed because other code wants to set up the return address.
This patch moves the asm code into an own function and marks
that explicitely noinline to avoid this problem.
Better would be probably to just move it into an .S file.
The diff looks worse than the change really is, it's all just
code movement and no logic change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Oct 2010 13:18:02 +0000 (15:18 +0200)]
KVM: x86: Mark kvm_arch_setup_async_pf static
It has no user outside mmu.c and also no prototype.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Tue, 19 Oct 2010 16:13:41 +0000 (18:13 +0200)]
KVM: improve hva_to_pfn() readability
Improve vma handling code readability in hva_to_pfn() and fix
async pf handling code to properly check vma returned by find_vma().
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:56 +0000 (11:22 +0200)]
KVM: Send async PF when guest is not in userspace too.
If guest indicates that it can handle async pf in kernel mode too send
it, but only if interrupts are enabled.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:55 +0000 (11:22 +0200)]
KVM: Let host know whether the guest can handle async PF in non-userspace context.
If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:54 +0000 (11:22 +0200)]
KVM paravirt: Handle async PF in non preemptable context
If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:53 +0000 (11:22 +0200)]
KVM: Inject asynchronous page fault into a PV guest if page is swapped out.
Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.
Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.
Vcpu will be halted if guest will fault on the same page again or if
vcpu executes kernel code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:52 +0000 (11:22 +0200)]
KVM: Handle async PF in a guest.
When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:51 +0000 (11:22 +0200)]
KVM paravirt: Add async PF initialization to PV guest.
Enable async PF in a guest if async PF capability is discovered.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:50 +0000 (11:22 +0200)]
KVM: Add PV MSR to enable asynchronous page faults delivery.
Guest enables async PF vcpu functionality using this MSR.
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:49 +0000 (11:22 +0200)]
KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 18 Oct 2010 13:22:23 +0000 (15:22 +0200)]
KVM: Add memory slot versioning and use it to provide fast guest write interface
Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Sun, 17 Oct 2010 16:13:42 +0000 (18:13 +0200)]
KVM: Retry fault before vmentry
When page is swapped in it is mapped into guest memory only after guest
tries to access it again and generate another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page. Do it only when tdp is enabled for now. Shadow paging case is
more complicated. CR[034] and EFER registers should be switched before
doing mapping and then switched back.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 14 Oct 2010 09:22:46 +0000 (11:22 +0200)]
KVM: Halt vcpu if page it tries to access is swapped out
If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.
Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.
[avi: remove call to get_user_pages_noio(), nacked by Linus; this
makes everything synchrnous again]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 11 Oct 2010 12:23:39 +0000 (14:23 +0200)]
KVM: Don't reset mmu context unnecessarily when updating EFER
The only bit of EFER that affects the mmu is NX, and this is already
accounted for (LME only takes effect when changing cr0).
Based on a patch by Hillf Danton.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Fri, 31 Dec 2010 08:52:15 +0000 (10:52 +0200)]
KVM: i8259: initialize isr_ack
isr_ack is never initialized. So, until the first PIC reset, interrupts
may fail to be injected. This can cause Windows XP to fail to boot, as
reported in the fallout from the fix to
https://bugzilla.kernel.org/show_bug.cgi?id=21962.
Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 28 Dec 2010 10:09:07 +0000 (12:09 +0200)]
KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow
We use the physical address instead of the base gfn for the four
PAE page directories we use in unpaged mode. When the guest accesses
an address above 1GB that is backed by a large host page, a BUG_ON()
in kvm_mmu_set_gfn() triggers.
Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962
Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com>
KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Linus Torvalds [Sat, 18 Dec 2010 18:28:54 +0000 (10:28 -0800)]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: handle rt_sigreturn() more cleanly
arch/tile: handle CLONE_SETTLS in copy_thread(), not user space
Linus Torvalds [Sat, 18 Dec 2010 18:23:29 +0000 (10:23 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
MIPS: Fix build errors in sc-mips.c
Linus Torvalds [Sat, 18 Dec 2010 18:13:24 +0000 (10:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86: avoid high BIOS area when allocating address space
x86: avoid E820 regions when allocating address space
x86: avoid low BIOS area when allocating address space
resources: add arch hook for preventing allocation in reserved areas
Revert "resources: support allocating space within a region from the top down"
Revert "PCI: allocate bus resources from the top down"
Revert "x86/PCI: allocate space from the end of a region, not the beginning"
Revert "x86: allocate space within a region top-down"
Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
PCI: Update MCP55 quirk to not affect non HyperTransport variants
Chris Metcalf [Tue, 14 Dec 2010 21:07:25 +0000 (16:07 -0500)]
arch/tile: handle rt_sigreturn() more cleanly
The current tile rt_sigreturn() syscall pattern uses the common idiom
of loading up pt_regs with all the saved registers from the time of
the signal, then anticipating the fact that we will clobber the ABI
"return value" register (r0) as we return from the syscall by setting
the rt_sigreturn return value to whatever random value was in the pt_regs
for r0.
However, this breaks in our 64-bit kernel when running "compat" tasks,
since we always sign-extend the "return value" register to properly
handle returned pointers that are in the upper 2GB of the 32-bit compat
address space. Doing this to the sigreturn path then causes occasional
random corruption of the 64-bit r0 register.
Instead, we stop doing the crazy "load the return-value register"
hack in sigreturn. We already have some sigreturn-specific assembly
code that we use to pass the pt_regs pointer to C code. We extend that
code to also set the link register to point to a spot a few instructions
after the usual syscall return address so we don't clobber the saved r0.
Now it no longer matters what the rt_sigreturn syscall returns, and the
pt_regs structure can be cleanly and completely reloaded.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Chris Metcalf [Tue, 14 Dec 2010 20:57:49 +0000 (15:57 -0500)]
arch/tile: handle CLONE_SETTLS in copy_thread(), not user space
Previously we were just setting up the "tp" register in the
new task as started by clone() in libc. However, this is not
quite right, since in principle a signal might be delivered to
the new task before it had its TLS set up. (Of course, this race
window still exists for resetting the libc getpid() cached value
in the new task, in principle. But in any case, we are now doing
this exactly the way all other architectures do it.)
This change is important for 2.6.37 since the tile glibc we will
be submitting upstream will not set TLS in user space any more,
so it will only work on a kernel that has this fix. It should
also be taken for 2.6.36.x in the stable tree if possible.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: stable <stable@kernel.org>
Kevin Cernekee [Wed, 3 Nov 2010 05:28:01 +0000 (22:28 -0700)]
MIPS: Fix build errors in sc-mips.c
Seen with malta_defconfig on Linus' tree:
CC arch/mips/mm/sc-mips.o
arch/mips/mm/sc-mips.c: In function 'mips_sc_is_activated':
arch/mips/mm/sc-mips.c:77: error: 'config2' undeclared (first use in this function)
arch/mips/mm/sc-mips.c:77: error: (Each undeclared identifier is reported only once
arch/mips/mm/sc-mips.c:77: error: for each function it appears in.)
arch/mips/mm/sc-mips.c:81: error: 'tmp' undeclared (first use in this function)
make[2]: *** [arch/mips/mm/sc-mips.o] Error 1
make[1]: *** [arch/mips/mm] Error 2
make: *** [arch/mips] Error 2
[Ralf: Cosmetic changes to minimize the number of arguments passed to
mips_sc_is_activated]
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1752/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:39:02 +0000 (10:39 -0700)]
x86: avoid high BIOS area when allocating address space
This prevents allocation of the last 2MB before 4GB.
The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27
This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:56 +0000 (10:38 -0700)]
x86: avoid E820 regions when allocating address space
When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.
On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS. On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,
BIOS-e820:
00000000bfe4dc00 -
00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory. This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.
I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below). That means we're vulnerable to BIOS defects that Windows would not
trip over. For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:51 +0000 (10:38 -0700)]
x86: avoid low BIOS area when allocating address space
This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas. This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.
We previously avoided this area in pcibios_align_resource(). This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:46 +0000 (10:38 -0700)]
resources: add arch hook for preventing allocation in reserved areas
This adds arch_remove_reservations(), which an arch can implement if it
needs to protect part of the address space from allocation.
Sometimes that can be done by just putting a region in the resource tree,
but there are cases where that doesn't work well. For example, x86 BIOS
E820 reservations are not related to devices, so they may overlap part of,
all of, or more than a device resource, so they may not end up at the
correct spot in the resource tree.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:41 +0000 (10:38 -0700)]
Revert "resources: support allocating space within a region from the top down"
This reverts commit
e7f8567db9a7f6b3151b0b275e245c1cef0d9c70.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:36 +0000 (10:38 -0700)]
Revert "PCI: allocate bus resources from the top down"
This reverts commit
b126b4703afa4010b161784a43650337676dd03b.
We're going back to the old behavior of allocating from bus resources
in _CRS order.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:31 +0000 (10:38 -0700)]
Revert "x86/PCI: allocate space from the end of a region, not the beginning"
This reverts commit
dc9887dc02e37bcf83f4e792aa14b07782ef54cf.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:25 +0000 (10:38 -0700)]
Revert "x86: allocate space within a region top-down"
This reverts commit
1af3c2e45e7a641e774bbb84fa428f2f0bf2d9c9.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Thu, 16 Dec 2010 17:38:20 +0000 (10:38 -0700)]
Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
This reverts commit
82e3e767c21fef2b1b38868e20eb4e470a1e38e3.
We're going back to considering bus resources in the order we found
them (in _CRS order, when we're using _CRS), so we don't need to
define any ordering.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Linus Torvalds [Fri, 17 Dec 2010 17:45:25 +0000 (09:45 -0800)]
Merge branch 'for_linus' of git://github.com/at91linux/linux-2.6-at91
* 'for_linus' of git://github.com/at91linux/linux-2.6-at91:
at91: Refactor Stamp9G20 and PControl G20 board file
at91: Fix uhpck clock rate in upll case
Linus Torvalds [Fri, 17 Dec 2010 17:32:39 +0000 (09:32 -0800)]
Merge branch 'kvm-updates/2.6.37' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix preemption counter leak in kvm_timer_init()
KVM: enlarge number of possible CPUID leaves
KVM: SVM: Do not report xsave in supported cpuid
KVM: Fix OSXSAVE after migration
Linus Torvalds [Fri, 17 Dec 2010 17:31:59 +0000 (09:31 -0800)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Runtime: Fix pm_runtime_suspended()
PM / Hibernate: Restore old swap signature to avoid user space breakage
PM / Hibernate: Fix PM_POST_* notification with user-space suspend
Linus Torvalds [Fri, 17 Dec 2010 17:28:17 +0000 (09:28 -0800)]
Merge branch 'bkl_removal' of git://git./linux/kernel/git/mchehab/linux-2.6
* 'bkl_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
[media] uvcvideo: Convert to unlocked_ioctl
[media] uvcvideo: Lock stream mutex when accessing format-related information
[media] uvcvideo: Move mmap() handler to uvc_queue.c
[media] uvcvideo: Move mutex lock/unlock inside uvc_free_buffers
[media] uvcvideo: Lock controls mutex when querying menus
[media] v4l2-dev: fix race condition
[media] V4L: improve the BKL replacement heuristic
[media] v4l2-dev: use mutex_lock_interruptible instead of plain mutex_lock
[media] cx18: convert to unlocked_ioctl
[media] radio-timb: convert to unlocked_ioctl
[media] sh_vou: convert to unlocked_ioctl
[media] cafe_ccic: replace ioctl by unlocked_ioctl
[media] et61x251_core: trivial conversion to unlocked_ioctl
[media] sn9c102: convert to unlocked_ioctl
[media] BKL: trivial ioctl -> unlocked_ioctl video driver conversions
[media] typhoon: convert to unlocked_ioctl
[media] si4713: convert to unlocked_ioctl
[media] tea5764: convert to unlocked_ioctl
[media] cadet: use unlocked_ioctl
[media] BKL: trivial BKL removal from V4L2 radio drivers
Linus Torvalds [Fri, 17 Dec 2010 17:27:30 +0000 (09:27 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Fix conflict of Mic Boot controls
ALSA: HDA: Enable subwoofer on Asus G73Jw
ALSA: HDA: Fix auto-mute on Lenovo Edge 14
ASoC: Fix bias power down of non-DAPM codec
ASoC: WM8580: Fix R8 initial value
ASoC: fix deemphasis control in wm8904/55/60 codecs
Takashi Iwai [Fri, 17 Dec 2010 14:28:37 +0000 (15:28 +0100)]
Merge branch 'fix/asoc' into for-linus
Takashi Iwai [Fri, 17 Dec 2010 14:28:33 +0000 (15:28 +0100)]
Merge branch 'fix/hda' into for-linus
Takashi Iwai [Fri, 17 Dec 2010 14:23:41 +0000 (15:23 +0100)]
ALSA: hda - Fix conflict of Mic Boot controls
Due to the recent change for multiple mics assignment, we need to handle
the index of each Mic Boost control respectively. Otherwise the driver
gets the control element conflicts, and gives the unsable state.
Reference: kernel bug 25002
https://bugzilla.kernel.org/show_bug.cgi?id=25002
Reported-and-tested-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Christian Glindkamp [Thu, 9 Dec 2010 10:15:59 +0000 (11:15 +0100)]
at91: Refactor Stamp9G20 and PControl G20 board file
As PControl G20 is a carrier board for the Stamp9G20 SoM, some code can
be shared. Therefore board-stamp9g20.c is refactored to allow reusing the
SoM initialization and board-pcontrol-g20.c is modified to use it.
Signed-off-by: Christian Glindkamp <christian.glindkamp@taskit.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Ryan Mallon [Wed, 2 Jun 2010 00:55:36 +0000 (12:55 +1200)]
at91: Fix uhpck clock rate in upll case
The uhpck clock should be divided from the utmi clock, not its parent
(main). This change is mostly cosmetic as the uhpck rate value is not
used anywhere except for the debugfs clock output.
Signed-off-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Linus Torvalds [Thu, 16 Dec 2010 23:45:49 +0000 (15:45 -0800)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
fanotify: fill in the metadata_len field on struct fanotify_event_metadata
fanotify: split version into version and metadata_len
fanotify: Dont try to open a file descriptor for the overflow event
fanotify: Introduce FAN_NOFD
fanotify: do not leak user reference on allocation failure
inotify: stop kernel memory leak on file creation failure
fanotify: on group destroy allow all waiters to bypass permission check
fanotify: Dont allow a mask of 0 if setting or removing a mark
fanotify: correct broken ref counting in case adding a mark failed
fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called
fanotify: remove packed from access response message
fanotify: deny permissions when no event was sent
Linus Torvalds [Thu, 16 Dec 2010 23:45:25 +0000 (15:45 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (28 commits)
MIPS: Add a CONFIG_FORCE_MAX_ZONEORDER Kconfig option.
MIPS: LD/SD o32 macro GAS fix update
MIPS: Alchemy: fix build with SERIAL_8250=n
MIPS: Rename mips_dma_cache_sync back to dma_cache_sync
MIPS: MT: Fix typo in comment.
SSB: Fix nvram_get on BCM47xx platform
MIPS: BCM47xx: Swap serial console if ttyS1 was specified.
MIPS: BCM47xx: Use sscanf for parsing mac address
MIPS: BCM47xx: Fill values for b43 into SSB sprom
MIPS: BCM47xx: Do not read config from CFE
MIPS: FDT size is a be32
MIPS: Fix CP0 COUNTER clockevent race
MIPS: Fix regression on BCM4710 processor detection
MIPS: JZ4740: Fix pcm device name
MIPS: Separate two consecutive loads in memset.S
MIPS: Send proper signal and siginfo on FP emulator faults.
MIPS: AR7: Fix loops per jiffies on TNETD7200 devices
MIPS: AR7: Fix double ar7_gpio_init declaration
MIPS: Rework GENERIC_HARDIRQS Kconfig.
MIPS: Alchemy: Add return value check for strict_strtoul()
...
Neil Horman [Wed, 8 Dec 2010 14:47:48 +0000 (09:47 -0500)]
PCI: Update MCP55 quirk to not affect non HyperTransport variants
I wrote this quirk awhile ago to properly setup MCP55 chips on hypertransport
busses so that interrupts reached whatever cpu happend to boot the kdump kernel.
while that works well, it was recently shown to me that a a non-hypertransport
variant of the MCP55 exists, and on those system the register that this quirk
manipulates causes hangs if you write to it. Since the quirk was only meant to
handle errors found on MCP55 chips that have a HT interface, this patch adds a
filter to make sure the chip is an HT capable before making the needed register
adjustment. This lets the broken MCP55s work with kdump while not breaking the
non-HT variants.
Resolves https://bugzilla.kernel.org/show_bug.cgi?id=23952
Tested successfully by the reporter and myself.
Cc: stable@kernel.org
Reported-by: Mathieu BĂ©rard <mathieu@mberard.eu>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
David Daney [Mon, 11 Oct 2010 21:52:45 +0000 (14:52 -0700)]
MIPS: Add a CONFIG_FORCE_MAX_ZONEORDER Kconfig option.
For huge page support with base page size of 16K or 32K, we have to
increase the MAX_ORDER so that huge pages can be allocated.
[Ralf: I don't think a user should have to configure obscure constants like
this but for the time being this will have to suffice.]
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1685/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Maciej W. Rozycki [Sun, 10 Oct 2010 09:42:12 +0000 (10:42 +0100)]
MIPS: LD/SD o32 macro GAS fix update
I am about to commit:
http://sourceware.org/ml/binutils/2010-10/msg00033.html
that fixes a problem with the LD/SD macro currently implemented by GAS for
the o32 ABI in an inconsistent way. This is best illustrated with a
simple program, which I'm copying here from the message above for easier
reference:
$ cat ld.s
ld $5,32767($4)
ld $5,32768($4)
This gets assebled into the following output:
$ mips-linux-as -32 -mips3 -o ld.o ld.s
$ mips-linux-objdump -d ld.o
ld.o: file format elf32-tradbigmips
Disassembly of section .text:
00000000 <.text>:
0:
dc857fff ld a1,32767(a0)
4:
3c010001 lui at,0x1
8:
00810821 addu at,a0,at
c:
8c258000 lw a1,-32768(at)
10:
8c268004 lw a2,-32764(at)
...
Oops!
The GAS fix makes the macro behave in a consistent way and pairs of LW/SW
instructions to be output as appropriate regardless of the size of the
offset associated with the address used. The machine instruction is still
available, but to reach it macros have to be disabled first. This has a
side effect of requiring the use of a machine-addressable memory operand.
As some platforms require 64-bit operations for accesses to some I/O
registers LD/SD instructions are used in a couple of places in Linux
regardless of the ABI selected. Here's a fix for some pieces of code
affected I've been able to track down. The fix should be backwards
compatible with all supported binutils releases in existence and can be
used as a reference for any other places or off-tree code. The use of the
"R" constraint guarantees a machine-addressable operand.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1680/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Manuel Lauss [Mon, 25 Oct 2010 16:44:11 +0000 (18:44 +0200)]
MIPS: Alchemy: fix build with SERIAL_8250=n
In commit
7d172bfe ("Alchemy: Add UART PM methods") I introduced
platform PM methods which call a function of the 8250 driver;
this patch works around link failures when the kernel is built
without 8250 support.
Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
To: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/1737/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 9 Dec 2010 19:14:09 +0000 (19:14 +0000)]
MIPS: Rename mips_dma_cache_sync back to dma_cache_sync
This fixes IP22 and IP28 build errors.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 1 Dec 2010 17:33:17 +0000 (17:33 +0000)]
MIPS: MT: Fix typo in comment.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hauke Mehrtens [Sat, 27 Nov 2010 18:26:32 +0000 (19:26 +0100)]
SSB: Fix nvram_get on BCM47xx platform
The nvram_get function was never in the mainline kernel, it only existed in
an external OpenWrt patch. Use nvram_getenv function, which is in mainline
and use an include instead of an extra function declaration. et0macaddr
contains the mac address in text from like 00:11:22:33:44:55. We have to
parse it before adding it into macaddr.
nvram_parse_macaddr will be merged into asm/mach-bcm47xx/nvram.h through
the MIPS git tree and will be available soon. It will not build now without
nvram_parse_macaddr, but it hasn't before either.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: mb@bu3sch.de
Cc: netdev@vger.kernel.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Michael Buesch <mb@bu3sch.de>
Patchwork: https://patchwork.linux-mips.org/patch/1849/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hauke Mehrtens [Sat, 27 Nov 2010 16:46:01 +0000 (17:46 +0100)]
MIPS: BCM47xx: Swap serial console if ttyS1 was specified.
Some devices like the Netgear WGT634U are using ttyS1 for default console
output. We should switch to that console if it was given in the kernel_args
parameters.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hauke Mehrtens [Sat, 27 Nov 2010 16:46:00 +0000 (17:46 +0100)]
MIPS: BCM47xx: Use sscanf for parsing mac address
Instead of writing own function for parsing the mac address we now
use sscanf.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1847/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hauke Mehrtens [Sat, 27 Nov 2010 16:45:59 +0000 (17:45 +0100)]
MIPS: BCM47xx: Fill values for b43 into SSB sprom
Fill the sprom with all available values from the nvram. Most of these
new values are needed for the b43 or b43legacy driver.
Parts of this patch have been in OpenWRT for a long time and were written
by Michael Buesch.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1846/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hauke Mehrtens [Sat, 27 Nov 2010 16:45:58 +0000 (17:45 +0100)]
MIPS: BCM47xx: Do not read config from CFE
The config options read out here are not stored in CFE but only in NVRAM on
the devices. Remove reading from CFE and only access the NVRAM. Reading out
CFE does not harm but is useless here.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1845/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Thomas Chou [Wed, 24 Nov 2010 07:35:48 +0000 (15:35 +0800)]
MIPS: FDT size is a be32
The totalsize field was be32. And the reserve bootmem would cause failure.
Signed-off-by: Thomas Chou <thomas@wytron.com.tw>
To: devicetree-discuss@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: grant.likely@secretlab.ca
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Dezhong Diao <dediao@cisco.com>
Patchwork: https://patchwork.linux-mips.org/patch/1838/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>