openwrt/staging/blogic.git
18 years ago[PATCH] sched: filter affine wakeups
akpm@osdl.org [Thu, 12 Jan 2006 09:05:32 +0000 (01:05 -0800)]
[PATCH] sched: filter affine wakeups


From: Nick Piggin <nickpiggin@yahoo.com.au>

Track the last waker CPU, and only consider wakeup-balancing if there's a
match between current waker CPU and the previous waker CPU.  This ensures
that there is some correlation between two subsequent wakeup events before
we move the task.  Should help random-wakeup workloads on large SMP
systems, by reducing the migration attempts by a factor of nr_cpus.
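
The filtering idea can be sketched roughly as follows. This is an illustration in Python, not the kernel code: the `Task` class, its `last_waker_cpu` field and `should_consider_affine()` are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical field: the CPU that issued the previous wakeup (-1 = none yet).
    last_waker_cpu: int = -1


def should_consider_affine(task: Task, waker_cpu: int) -> bool:
    """Return True only if the same CPU also issued the previous wakeup."""
    match = task.last_waker_cpu == waker_cpu
    task.last_waker_cpu = waker_cpu  # remember the waker for the next wakeup
    return match
```

If wakers are spread uniformly over nr_cpus CPUs, two consecutive wakeups come from the same CPU with probability 1/nr_cpus, which is where the reduction factor comes from.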

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] scheduler cache-hot-autodetect
akpm@osdl.org [Thu, 12 Jan 2006 09:05:30 +0000 (01:05 -0800)]
[PATCH] scheduler cache-hot-autodetect


From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache
  measurement results. Each distance is only measured once. This means
  that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
  distances 0 and 1, and on SMP distance 0 is measured. The code walks
  the domain tree to determine the distance, so it automatically follows
  whatever hierarchy an architecture sets up. This cuts down on the boot
  time significantly and removes the O(N^2) limit. The only assumption
  is that migration costs can be expressed as a function of domain
  distance - this covers the overwhelming majority of existing systems,
  and is a good guess even for more asymmetric systems.

  [ People hacking systems that have asymmetries that break this
    assumption (e.g. different CPU speeds) should experiment a bit with
    the cpu_distance() function. Adding a ->migration_distance factor to
    the domain structure would be one possible solution - but let's first
    see the problem systems, if they exist at all. Let's not overdesign. ]
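
A minimal sketch of the distance-caching idea, in Python rather than kernel C. The hierarchy is modelled here as a list of levels, each a list of CPU sets; that representation, `build_cost_table()` and the `measure` callback are assumptions made for illustration. Only the name cpu_distance() comes from the text above.

```python
def cpu_distance(levels, cpu1, cpu2):
    """Walk the hierarchy outward; the first level where both CPUs
    share a group gives the distance."""
    for distance, groups in enumerate(levels):
        if any(cpu1 in g and cpu2 in g for g in groups):
            return distance
    raise ValueError("CPUs do not share any domain")


def build_cost_table(levels, cpus, measure):
    """Measure each distinct distance once and cache the result."""
    cache = {}
    for a in cpus:
        for b in cpus:
            if a == b:
                continue
            d = cpu_distance(levels, a, b)
            if d not in cache:
                cache[d] = measure(a, b)  # expensive step, run once per distance
    return cache
```

With a two-package HT layout such as `[[{0, 2}, {1, 3}], [{0, 1, 2, 3}]]`, only two measurements are taken no matter how many CPU pairs exist.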

Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didn't set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and
  sticking to it, the code now auto-detects the 'effective migration
  cost' between two measured CPUs, via iterating through a wide range of
  cachesizes. The code searches for the maximum migration cost, which
  occurs when the working set of the test-workload falls just below the
  'effective cache size'. I.e. real-life optimized search is done for
  the maximum migration cost, between two real CPUs.

  This, amongst other things, has the positive effect that if e.g. two
  CPUs share an L2/L3 cache, a different (and accurate) migration cost
  will be found than between two CPUs on the same system that don't share
  any caches.

(The reliable measurement of migration costs is tricky - see the source
for details.)
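
As a rough illustration of searching for the maximum cost over a range of working-set sizes, here is a Python sketch. It is not the kernel's search strategy: the doubling step, `max_migration_cost()` and the `measure_cost` callback are assumptions for the example.

```python
def max_migration_cost(measure_cost, min_size, max_size):
    """Try working-set sizes from min_size up to max_size (doubling each
    step) and return (cost, size) for the most expensive one."""
    best_cost, best_size = 0, min_size
    size = min_size
    while size <= max_size:
        cost = measure_cost(size)
        if cost > best_cost:
            best_cost, best_size = cost, size
        size *= 2
    return best_cost, best_size
```

Any callable that maps a working-set size in bytes to a measured cost can be passed as `measure_cost`; on real hardware that callable would be the tricky part.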

Furthermore, I've added various boot-time options to override/tune
migration behavior.

Firstly, there's a blanket override for autodetection:

migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:

migration_factor=120

will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
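
The arithmetic behind the two options can be sketched as below. This is Python written for illustration, assuming the migration_cost values are in microseconds (as the 1000 -> 1msec example implies) and that the factor is an integer percentage. The function names are hypothetical and this is not the kernel's option parser.

```python
def parse_migration_cost(arg):
    """Split 'migration_cost=' values into per-distance costs (usecs)."""
    return [int(value) for value in arg.split(",")]


def apply_migration_factor(costs_usec, factor_percent):
    """Scale each cost by factor_percent / 100 using integer arithmetic."""
    return [cost * factor_percent // 100 for cost in costs_usec]
```

For example, a factor of 120 turns 1000 usecs into 1200 usecs, and a factor of 0 makes every migration look free.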

I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
 ---------------------
           [00]    [01]
 [00]:     -     1.7(1)
 [01]:   1.7(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 1.7 (1784008)
 ---------------------

Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
 ---------------------
           [00]    [01]    [02]    [03]
 [00]:     -     0.4(1)  0.0(0)  0.4(1)
 [01]:   0.4(1)    -     0.4(1)  0.0(0)
 [02]:   0.0(0)  0.4(1)    -     0.4(1)
 [03]:   0.4(1)  0.0(0)  0.4(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (33900) 0.4 (448514)
 ---------------------

Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

 ---------------------
 migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
 ---------------------
           [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
 [00]:     -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [01]:  19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [02]:  19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [03]:  19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [04]:  19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1)
 [05]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1)
 [06]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1)
 [07]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 19.2 (19281756)
 ---------------------

This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] sched: add cacheflush() asm
Ingo Molnar [Thu, 12 Jan 2006 09:05:27 +0000 (01:05 -0800)]
[PATCH] sched: add cacheflush() asm

Add per-arch sched_cacheflush() which is a write-back cacheflush used by
the migration-cost calibration code at bootup time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Implement ioctl emulation for the parport character device
Andi Kleen [Thu, 12 Jan 2006 09:05:26 +0000 (01:05 -0800)]
[PATCH] Implement ioctl emulation for the parport character device

Fixes bugzilla.kernel.org bug 2903.

Cc: <tim@cyberelk.net>
Cc: <andrea@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] memmap_init_zone(): remove uneccesary page++
Greg Ungerer [Thu, 12 Jan 2006 09:05:24 +0000 (01:05 -0800)]
[PATCH] memmap_init_zone(): remove uneccesary page++

Remove unnecessary page++ from memmap_init_zone loop.

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: remove slashes from disk names when creation dev names in sysfs
Neil Brown [Thu, 12 Jan 2006 09:05:23 +0000 (01:05 -0800)]
[PATCH] md: remove slashes from disk names when creation dev names in sysfs

e.g. The sx8 driver uses names like sx8/0.

This would make a md component dev name like

   /sys/block/md0/md/dev-sx8/0

which is not allowed.  So we change the '/' to '!' just like
fs/partitions/check.c(register_disk) does.
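
The name mangling amounts to a character substitution. A small Python sketch follows; the helper name `sysfs_dev_name()` is hypothetical, and the "dev-" prefix is taken from the path shown above.

```python
def sysfs_dev_name(disk_name):
    """Build an md component name that is a single valid sysfs path element."""
    return "dev-" + disk_name.replace("/", "!")
```

With this, the sx8 example becomes `dev-sx8!0` instead of introducing an extra directory level.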

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] do_truncate() call fix in tiny-shmem.c
Catalin Marinas [Thu, 12 Jan 2006 09:05:21 +0000 (01:05 -0800)]
[PATCH] do_truncate() call fix in tiny-shmem.c

Adapt tiny-shmem.c to the new do_truncate() prototype.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] migration: make sure there is no attempt to migrate reserved pages.
Christoph Lameter [Thu, 12 Jan 2006 09:05:20 +0000 (01:05 -0800)]
[PATCH] migration: make sure there is no attempt to migrate reserved pages.

This ensures that reserved pages are not migrated.  Reserved pages
currently cause the WARN_ON to trigger in migrate_page_add()

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix queue stalling while barrier sequencing
Tejun Heo [Thu, 12 Jan 2006 14:39:26 +0000 (15:39 +0100)]
[PATCH] fix queue stalling while barrier sequencing

If ordered tag isn't supported, request ordering for barrier
sequencing is performed by queue draining, which basically hangs the
request queue until elv_completed_request() reports completion of all
previous fs requests.

The condition check in elv_completed_request() was only performed for
fs requests.  If a special request is queued between the last
to-be-drained request and the barrier sequence, draining is never
completed and the queue is stalled forever.

This patch moves the end-of-draining condition check such that it's
performed for all requests.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
Linus Torvalds [Thu, 12 Jan 2006 03:36:32 +0000 (19:36 -0800)]
Merge /pub/scm/linux/kernel/git/mchehab/v4l-dvb

18 years ago[PATCH] x86_64: Fix SMP bootup with CONFIG_KDUMP enabled
Vivek Goyal [Thu, 12 Jan 2006 02:35:20 +0000 (03:35 +0100)]
[PATCH] x86_64: Fix SMP bootup with CONFIG_KDUMP enabled

o This fix was posted for i386 long back. Posting it for x86_64.

  http://marc.theaimsgroup.com/?l=linux-kernel&m=110380103229830&w=2

o This patch fixes the problem of secondary cpus boot up. This situation
  is faced when kernel is built for default locations like 16MB and
  onwards. In this configuration, only primary cpu (BP) comes up and
  secondary cpus don't boot.

o Problem occurs because in trampoline code, lgdt is not able to load the
  GDT as it happens to be situated beyond 16MB. This is due to the fact
  that cpu is still in real mode and default operand size is 16bit.

o This patch uses lgdtl instead of lgdt to force operand size to 32
  instead of 16.
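
  The 16MB boundary follows from how lgdt behaves with a 16-bit operand
  size: per the x86 architecture manuals only 24 bits of the base
  address are loaded, and 2^24 bytes is 16MB. A Python sketch of that
  truncation (the function name and example address are made up for
  illustration):

```python
def gdt_base_loaded(base, operand_size_bits):
    """Return the GDT base the CPU ends up with for a given operand size."""
    if operand_size_bits == 16:
        return base & 0x00FFFFFF  # only the low 24 bits of the base are used
    return base & 0xFFFFFFFF      # a 32-bit operand size keeps the full base


SIXTEEN_MB = 1 << 24
```

  A GDT placed at 0x01001000 (just above 16MB) would be seen at 0x001000
  with the 16-bit form, which is why the secondary CPUs fail to boot.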

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't confuse noapic with noapictimer
Andi Kleen [Wed, 11 Jan 2006 21:47:10 +0000 (22:47 +0100)]
[PATCH] x86_64: Don't confuse noapic with noapictimer

Handling common prefixes is tricky.
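
The general pitfall can be shown with a short Python sketch. This is not the kernel's command-line parser, and the exact form of the original bug is not described here; both functions below are hypothetical and only illustrate why a prefix test on "noapic" also fires for "noapictimer".

```python
def naive_flags(cmdline):
    """Buggy: a prefix test lets 'noapictimer' also set 'noapic'."""
    flags = {"noapic": False, "noapictimer": False}
    for token in cmdline.split():
        for name in flags:
            if token.startswith(name):
                flags[name] = True
    return flags


def exact_flags(cmdline):
    """Match whole tokens only, so the two options stay independent."""
    flags = {"noapic": False, "noapictimer": False}
    for token in cmdline.split():
        if token in flags:
            flags[token] = True
    return flags
```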

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: don't copy command line twice
Jan Beulich [Wed, 11 Jan 2006 21:47:07 +0000 (22:47 +0100)]
[PATCH] x86_64: don't copy command line twice

... reducing the amount of changes Xen has to do.

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386/x86-64: make setup_early_printk() usage consistent
Jan Beulich [Wed, 11 Jan 2006 21:47:03 +0000 (22:47 +0100)]
[PATCH] i386/x86-64: make setup_early_printk() usage consistent

The explicit and implicit calls to setup_early_printk() were passing
inconsistent arguments.

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: Move DOUBLEFAULT config to arch/i386/Kconfig
Andi Kleen [Wed, 11 Jan 2006 21:47:00 +0000 (22:47 +0100)]
[PATCH] i386: Move DOUBLEFAULT config to arch/i386/Kconfig

It has no business being elsewhere and x86-64 doesn't need/want it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Allow kernel page tables upto the end of memory
Andi Kleen [Wed, 11 Jan 2006 21:46:57 +0000 (22:46 +0100)]
[PATCH] x86_64: Allow kernel page tables upto the end of memory

Previously they would be only allocated before the kernel text at
1MB.  This limited the maximum supported memory to 128GB.
Now allow the e820 allocator to put them anywhere. Try
to put them beyond any DMA zones to avoid filling them up.
This should free some GFP_DMA memory compared to earlier kernels.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Use safe_smp_processor_id in MCE handler
Andi Kleen [Wed, 11 Jan 2006 21:46:54 +0000 (22:46 +0100)]
[PATCH] x86_64: Use safe_smp_processor_id in MCE handler

hard_smp_processor_id would return the local APIC id instead
of the Linux processor id. On big systems they are often
not identical. safe_smp_processor_id is just a wrapper
around it that does the necessary conversions.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Some housekeeping in local APIC code
Andi Kleen [Wed, 11 Jan 2006 21:46:51 +0000 (22:46 +0100)]
[PATCH] x86_64: Some housekeeping in local APIC code

Remove support for obsolete hardware and cleanup.

- Remove checks for non integrated APICs
- Replace apic_write_around with apic_write.
- Remove apic_read_around
- Remove APIC version reads used by old workarounds
- Remove old workaround for Simics
- Fix indentation

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Display meaningful part of filename during BUG()
Jan Beulich [Wed, 11 Jan 2006 21:46:48 +0000 (22:46 +0100)]
[PATCH] x86_64: Display meaningful part of filename during BUG()

When building in a separate objtree, file names produced by BUG() & Co. can
get fairly long; printing only the first 50 characters may thus result in
(almost) no useful information. The following change prints the last 50
characters of the filename instead.
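
The difference between the two truncations, sketched in Python (the helper names are hypothetical; the width of 50 is from the text above):

```python
def first_chars(path, width=50):
    """Old behaviour: keep the start of the path."""
    return path[:width]


def last_chars(path, width=50):
    """New behaviour: keep the tail, where the file name itself is."""
    return path[-width:]
```

For a long objtree path the tail still contains the file name, while the head usually holds only build directory components.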

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Reduce screen space needed by stack trace
Jan Beulich [Wed, 11 Jan 2006 21:46:45 +0000 (22:46 +0100)]
[PATCH] x86_64: Reduce screen space needed by stack trace

Especially under Xen, where the console cannot be adjusted to more than 25
lines, it is fairly important that the information displayed during a panic
is as compact as possible. The adjustments below work towards that.

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix get_cmos_time()
Jan Beulich [Wed, 11 Jan 2006 21:46:42 +0000 (22:46 +0100)]
[PATCH] x86_64: Fix get_cmos_time()

Due to a broken condition, the body of the loop that is intended to wait for
the Update-In-Progress bit to get set and then cleared again was never
entered; in fact, the entire loop was optimized out by the compiler. Here is
a change to fix the condition (and to also move the initialization of locals
out of the spin lock protected region).
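
The intended two-phase wait can be sketched as follows. This is Python for illustration only: `read_uip` stands in for reading the Update-In-Progress bit from the RTC, and the `max_polls` bound is an assumption of the sketch rather than something taken from the patch.

```python
def wait_for_update_edge(read_uip, max_polls=1000000):
    """Wait for UIP to become set, then for it to clear again.

    Returns True once the falling edge is seen, False if the poll
    budget runs out first.
    """
    polls = 0
    while not read_uip():      # phase 1: wait for an update to start
        polls += 1
        if polls >= max_polls:
            return False
    while read_uip():          # phase 2: wait for that update to finish
        polls += 1
        if polls >= max_polls:
            return False
    return True
```

If the first condition is written so that it is false on entry, neither loop body runs, which matches the symptom described above.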

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: No need to export get_cmos_time anymore
Andi Kleen [Wed, 11 Jan 2006 21:46:39 +0000 (22:46 +0100)]
[PATCH] x86_64: No need to export get_cmos_time anymore

It was only needed for APM

Pointed out by Jan Beulich

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove unused AMD K8 C stepping flag
Andi Kleen [Wed, 11 Jan 2006 21:46:36 +0000 (22:46 +0100)]
[PATCH] x86_64: Remove unused AMD K8 C stepping flag

X86_FEATURE_K8_C was a synthetic Linux CPUID flag that was used for some
code optimizations in Opteron C stepping or later. But support for pre C
stepping optimizations has been removed, so this isn't needed anymore.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: Move phys_proc_id/early intel workaround to correct function.
Andi Kleen [Wed, 11 Jan 2006 21:46:33 +0000 (22:46 +0100)]
[PATCH] i386: Move phys_proc_id/early intel workaround to correct function.

early_cpu_detect only runs on the BP, but this code needs to run
on all CPUs.

Looks like a mismerge somewhere.  Also add a warning comment.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: sparse warning cleanups
Stephen Hemminger [Wed, 11 Jan 2006 21:46:30 +0000 (22:46 +0100)]
[PATCH] x86_64: sparse warning cleanups

Fix some trivial sparse warnings in x86_64 code.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Move NUMA page_to_pfn/pfn_to_page functions out of line
Andi Kleen [Wed, 11 Jan 2006 21:46:27 +0000 (22:46 +0100)]
[PATCH] x86_64: Move NUMA page_to_pfn/pfn_to_page functions out of line

Saves about 18K .text in defconfig.

There would be more optimization potential, but that's for later.

Suggestion originally from Bill Irwin.
Fix from Andy Whitcroft.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove unused segments
Andi Kleen [Wed, 11 Jan 2006 21:46:24 +0000 (22:46 +0100)]
[PATCH] x86_64: Remove unused segments

They used to be used by the reboot code, but not anymore.

Noticed by Jan Beulich

Cc: JBeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: ioapic virtual wire mode fix
Vivek Goyal [Wed, 11 Jan 2006 21:46:21 +0000 (22:46 +0100)]
[PATCH] x86_64: ioapic virtual wire mode fix

o Currently, during kexec reboot, IOAPIC is re-programmed back to virtual
  wire mode if there was an i8259 connected to it. This enables getting
  timer interrupts in second kernel in legacy mode.

o After putting into virtual wire mode, IOAPIC delivers the i8259 interrupts
  to CPU0. This works well for kexec but not for kdump as we might crash
  on a different CPU and second kernel will not see timer interrupts.

o This patch modifies the redirection table entry to deliver the timer
  interrupts to the cpu we are rebooting (instead of hardcoding to zero).
  This ensures that second kernel receives timer interrupts even on a
  non-boot cpu.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_arch
Ravikiran G Thirumalai [Wed, 11 Jan 2006 21:46:18 +0000 (22:46 +0100)]
[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_arch

Introduce vSMP arch to the kernel.

This patch:
1. Adds CONFIG_X86_VSMP
2. Adds machine specific macros for local_irq_disabled, local_irq_enabled
   and irqs_disabled
3. Writes to the vSMP CTL device to indicate kernel compiled with CONFIG_VSMP

Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Signed-off-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_align
Ravikiran G Thirumalai [Wed, 11 Jan 2006 21:46:15 +0000 (22:46 +0100)]
[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_align

vSMP specific alignment patch to
1. Define INTERNODE_CACHE_SHIFT for vSMP
2. Use this for alignment of critical structures
3. Use INTERNODE_CACHE_SHIFT for ARCH_MIN_TASKALIGN,
   and let the slab align task_struct allocations to the internode cacheline size
4. Introduce and use ARCH_MIN_MMSTRUCT_ALIGN for mm_struct slab allocations.

Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Signed-off-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Make sure BITS_PER_ATOMIC is defined in asm-generic/atomic.h
Andi Kleen [Wed, 11 Jan 2006 21:46:12 +0000 (22:46 +0100)]
[PATCH] x86_64: Make sure BITS_PER_ATOMIC is defined in asm-generic/atomic.h

Fixes

  CC      fs/nfsctl.o
In file included from include2/asm/atomic.h:427,
                 from /home/lsrc/quilt/linux/include/linux/file.h:8,
                 from /home/lsrc/quilt/linux/fs/nfsctl.c:8:
/home/lsrc/quilt/linux/include/asm-generic/atomic.h:20:5: warning: "BITS_PER_LONG" is not defined

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: cleanup enter_lazy_tlb()
Brian Gerst [Wed, 11 Jan 2006 21:46:09 +0000 (22:46 +0100)]
[PATCH] x86_64: cleanup enter_lazy_tlb()

Move the #ifdef into the function body.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Memorize location of i8259 for reboots.
Eric W. Biederman [Wed, 11 Jan 2006 21:46:06 +0000 (22:46 +0100)]
[PATCH] x86_64: Memorize location of i8259 for reboots.

Currently we attempt to restore virtual wire mode on reboot, which only
works if we can figure out where the i8259 is connected.  This is very
useful when we kexec another kernel and is likely helpful to a peculiar
BIOS that makes assumptions about how the system is set up.

Since the acpi MADT table does not provide the location where the i8259 is
connected we have to look at the hardware to figure it out.

Most systems have the i8259 connected to the local apic of the cpu so won't be
affected but people running Opteron and some serverworks chipsets should be
able to use kexec now.

In addition this patch removes the hard coded assumption that the io_apic
that delivers isa interrupts is always known to the kernel as io_apic 0.
There does not appear to be anything to guarantee that assumption is true.

And From: Vivek Goyal <vgoyal@in.ibm.com>

  A minor fix to the patch which remembers the location of where i8259 is
  connected.  Now counter i has been replaced by apic.  Counter i held a
  junk value, which led to non-detection of an i8259 connected to the
  IOAPIC.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: allow setting RF in EFLAGS
Chuck Ebbert [Wed, 11 Jan 2006 21:46:03 +0000 (22:46 +0100)]
[PATCH] x86_64: allow setting RF in EFLAGS

Setting RF (resume flag) allows a debugger to resume execution after a code
breakpoint without tripping the breakpoint again.  It is reset by the CPU
after executing one instruction.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: "invalid operand" -> "invalid opcode"
Chuck Ebbert [Wed, 11 Jan 2006 21:46:00 +0000 (22:46 +0100)]
[PATCH] x86_64: "invalid operand" -> "invalid opcode"

The manual says Int 6 is "invalid opcode", not "invalid operand".

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Sparse warnings fix.
Luiz Fernando Capitulino [Wed, 11 Jan 2006 21:45:57 +0000 (22:45 +0100)]
[PATCH] x86_64: Sparse warnings fix.

 Fixes the following sparse warnings:

arch/x86_64/kernel/mce_amd.c:321:29: warning: Using plain integer as NULL pointer
arch/x86_64/kernel/mce_amd.c:410:41: warning: Using plain integer as NULL pointer

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove useless KDB vector
Andi Kleen [Wed, 11 Jan 2006 21:45:54 +0000 (22:45 +0100)]
[PATCH] x86_64: Remove useless KDB vector

It was set as an NMI, but the NMI bit always forces an interrupt
to end up at vector 2. So it was never used. Remove.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't claim too many vectors for TLB flushing
Jason Uhlenkott [Wed, 11 Jan 2006 21:45:51 +0000 (22:45 +0100)]
[PATCH] x86_64: Don't claim too many vectors for TLB flushing

It looks like the new scalable TLB flush code for x86_64 is claiming
one more IRQ vector than it actually uses.

Signed-off-by: Jason Uhlenkott <jasonuhl@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Tell user to enable GART_IOMMU when needed
Andi Kleen [Wed, 11 Jan 2006 21:45:48 +0000 (22:45 +0100)]
[PATCH] x86_64: Tell user to enable GART_IOMMU when needed

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix warning in nmi.c on uniprocessor kernels
Andi Kleen [Wed, 11 Jan 2006 21:45:45 +0000 (22:45 +0100)]
[PATCH] x86_64: Fix warning in nmi.c on uniprocessor kernels

Fix

  CC      arch/x86_64/kernel/nmi.o
linux/arch/x86_64/kernel/nmi.c: In function 'check_nmi_watchdog':
linux/arch/x86_64/kernel/nmi.c:155: warning: statement with no effect

on Uniprocessor builds.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Allocate PDAs in the local node
Ravikiran G Thirumalai [Wed, 11 Jan 2006 21:45:42 +0000 (22:45 +0100)]
[PATCH] x86_64: Allocate PDAs in the local node

Patch uses a static PDA array early at boot and reallocates processor PDA
with node local memory when kmalloc is ready, just before pda_init.
The boot_cpu_pda is needed since the cpu_pda is used even before pda_init for
that cpu is called (to set the static per-cpu areas offset table etc)

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Node local pda take 2 -- cpu_pda preparation
Ravikiran G Thirumalai [Wed, 11 Jan 2006 21:45:39 +0000 (22:45 +0100)]
[PATCH] x86_64: Node local pda take 2 -- cpu_pda preparation

Helper patch to change cpu_pda users to use macros to access cpu_pda
instead of the cpu_pda[] array.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Early initialization of cpu_to_node
Ravikiran Thirumalai [Wed, 11 Jan 2006 21:45:36 +0000 (22:45 +0100)]
[PATCH] x86_64: Early initialization of cpu_to_node

Patch enables early initialization of cpu_to_node.
apicid_to_node is built by reading the SRAT table, from acpi_numa_init with
ACPI_NUMA and k8_scan_nodes with K8_NUMA.
x86_cpu_to_apicid is built by parsing the ACPI MADT table, from acpi_boot_init.
We combine these two tables and setup cpu_to_node.
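
Combining the two tables is a lookup through the APIC id. A Python sketch under stated assumptions: the list/dict representation, the function name `build_cpu_to_node()` and the `default_node` fallback are made up for illustration and are not the kernel's data structures.

```python
def build_cpu_to_node(cpu_to_apicid, apicid_to_node, default_node=0):
    """cpu_to_apicid: list indexed by Linux CPU number.
    apicid_to_node: dict mapping APIC id -> NUMA node.
    Returns a list indexed by CPU number giving its node."""
    return [apicid_to_node.get(apicid, default_node)
            for apicid in cpu_to_apicid]
```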

Early initialization helps the static per_cpu_areas in getting pages from
correct node.

Change since last release:
Do not initialize early init_cpu_to_node for faking node cases.

Patch tested on TYAN dual core 4P board with K8 only, ACPI_NUMA.
Tested on EM64T NUMA. Also tested with numa=off, numa=fake, and running
a kernel compiled with NUMA on a regular EM64 2 way SMP.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix up white space in time.c
Andi Kleen [Wed, 11 Jan 2006 21:45:33 +0000 (22:45 +0100)]
[PATCH] x86_64: Fix up white space in time.c

No functional changes.

And remove one redundant prototype.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Use standard __always_inline in vsyscall.c
Andi Kleen [Wed, 11 Jan 2006 21:45:30 +0000 (22:45 +0100)]
[PATCH] x86_64: Use standard __always_inline in vsyscall.c

Replacing the old home-brewed __force_inline.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: Replace broken serialize_cpu in microcode driver with correct sync_core
Andi Kleen [Wed, 11 Jan 2006 21:45:27 +0000 (22:45 +0100)]
[PATCH] i386: Replace broken serialize_cpu in microcode driver with correct sync_core

Passing random input values in eax to cpuid is not a good idea
because the CPU will GPF for unknown ones.
Use the correct version instead, which x86-64 has had for longer.
This also adds a memory barrier to prevent the optimizer from
reordering.

Cc: tigran@veritas.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: On Intel CPUs don't do an additional CPU sync before RDTSC
Andi Kleen [Wed, 11 Jan 2006 21:45:24 +0000 (22:45 +0100)]
[PATCH] x86_64: On Intel CPUs don't do an additional CPU sync before RDTSC

RDTSC serialization using cpuid is not needed for Intel platforms.
This increases gettimeofday performance.

Cc: vojtech@suse.cz
Cc: rohit.seth@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Support alternative() in vsyscalls
Andi Kleen [Wed, 11 Jan 2006 21:45:21 +0000 (22:45 +0100)]
[PATCH] x86_64: Support alternative() in vsyscalls

The real vsyscall .text addresses are not mapped when the alternative()
replacement runs early, so use some black magic to access them using
the direct mapping.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Support alternative() with a output argument.
Andi Kleen [Wed, 11 Jan 2006 21:45:18 +0000 (22:45 +0100)]
[PATCH] x86_64: Support alternative() with a output argument.

Needed for follow on patches

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't try to synchronize the TSC over CPUs on Intel CPUs at boot.
Andi Kleen [Wed, 11 Jan 2006 21:45:15 +0000 (22:45 +0100)]
[PATCH] x86_64: Don't try to synchronize the TSC over CPUs on Intel CPUs at boot.

They already do this in hardware and the Linux algorithm
actually adds errors.

Cc: mingo@elte.hu
Cc: rohit.seth@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix compile error with !CONFIG_COMPAT
Andi Kleen [Wed, 11 Jan 2006 21:45:12 +0000 (22:45 +0100)]
[PATCH] x86_64: Fix compile error with !CONFIG_COMPAT

cpumask.h wasn't included implicitly into proto.h in this case.
Just move it over to smp.h.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: x86_64 write apic id fix
Vivek Goyal [Wed, 11 Jan 2006 21:45:09 +0000 (22:45 +0100)]
[PATCH] x86_64: x86_64 write apic id fix

o Apic id is in most significant 8 bits of APIC_ID register. Current code
  is trying to write apic id to least significant 8 bits. This patch fixes
  it.

o This fix enables booting uni kdump capture kernel on a cpu with non-zero
  apic id.
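
A minimal sketch of the register layout described above, assuming the
8-bit APIC ID occupies bits 31..24 of the 32-bit APIC_ID register. The
helper names and APIC_ID_SHIFT are hypothetical and are not taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the APIC ID lives in the most significant 8 bits. */
#define APIC_ID_SHIFT 24

/* Build the register value to write for a given APIC ID. */
static inline uint32_t apic_id_to_reg(uint8_t id)
{
	return (uint32_t)id << APIC_ID_SHIFT;
}

/* Extract the APIC ID from a register value that was read back. */
static inline uint8_t apic_id_from_reg(uint32_t reg)
{
	return (uint8_t)(reg >> APIC_ID_SHIFT);
}
```

Writing the ID unshifted puts it in the low byte, which reads back as
ID 0. That would go unnoticed on a CPU whose APIC ID is zero.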

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove duplicate exports
Brian Gerst [Wed, 11 Jan 2006 21:45:06 +0000 (22:45 +0100)]
[PATCH] x86_64: Remove duplicate exports

Remove exports that are already exported from the object's source file.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: unexport pci_*_consistent
Brian Gerst [Wed, 11 Jan 2006 21:45:03 +0000 (22:45 +0100)]
[PATCH] x86_64: unexport pci_*_consistent

These functions are inlines and shouldn't be exported.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove unused apic_write_atomic
Andi Kleen [Wed, 11 Jan 2006 21:45:00 +0000 (22:45 +0100)]
[PATCH] x86_64: Remove unused apic_write_atomic

This function is never used for x86_64.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Make the cpu_*_maps in kernel/sched.c read mostly
Andi Kleen [Wed, 11 Jan 2006 21:44:57 +0000 (22:44 +0100)]
[PATCH] x86_64: Make the cpu_*_maps in kernel/sched.c read mostly

They are referred to often so avoid potential false sharing for them.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: make pci_map_single/pci_map_sg warn for zero length.
Andi Kleen [Wed, 11 Jan 2006 21:44:54 +0000 (22:44 +0100)]
[PATCH] i386: make pci_map_single/pci_map_sg warn for zero length.

As suggested by Linus.

This catches driver bugs that could cause corruption on IOMMU architectures.

Also I converted the BUGs to out_of_line_bug()s to save a bit
of text space.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Enable sound in old style OSS driver for NForce4 CK804
Andi Kleen [Wed, 11 Jan 2006 21:44:51 +0000 (22:44 +0100)]
[PATCH] x86_64: Enable sound in old style OSS driver for NForce4 CK804

Just add the missing PCI ID.

Cc: perex@suse.cz
Cc: tiwai@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Make it clear in machine checks that it's a hardware problem
Andi Kleen [Wed, 11 Jan 2006 21:44:48 +0000 (22:44 +0100)]
[PATCH] x86_64: Make it clear in machine checks that it's a hardware problem

Hopefully the users will take the hint.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Clean up copy_*_user
Andi Kleen [Wed, 11 Jan 2006 21:44:45 +0000 (22:44 +0100)]
[PATCH] x86_64: Clean up copy_*_user

- Remove optimization for old B stepping Opteron
- Make the fast path for copies with a multiple of eight length faster.
- Minor instruction rearrangement to hopefully avoid a pipeline
  stall or two.
- Add comment about errata to consider.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Use function pointers to call DMA mapping functions
Muli Ben-Yehuda [Wed, 11 Jan 2006 21:44:42 +0000 (22:44 +0100)]
[PATCH] x86_64: Use function pointers to call DMA mapping functions

AK: I hacked Muli's original patch a lot and there were a lot
of changes - all bugs are probably to blame on me now.
There were also some changes in the fall back behaviour
for swiotlb - in particular it doesn't try to use GFP_DMA
now anymore. Also all DMA mapping operations use the
same core dma_alloc_coherent code with proper fallbacks now.
And various other changes and cleanups.

Known problems: iommu=force and swiotlb=force together break;
                needs more testing.

This patch cleans up x86_64's DMA mapping dispatching code. Right now
we have three possible IOMMU types: AGP GART, swiotlb and nommu, and
in the future we will also have Xen's x86_64 swiotlb and other HW
IOMMUs for x86_64. In order to support all of them cleanly, this
patch:

- introduces a struct dma_mapping_ops with function pointers for each
  of the DMA mapping operations of gart (AMD HW IOMMU), swiotlb
  (software IOMMU) and nommu (no IOMMU).

- gets rid of:

  if (swiotlb)
      return swiotlb_xxx();

- PCI_DMA_BUS_IS_PHYS is now checked against the dma_ops being set.
  This makes swiotlb faster by avoiding double copying in some cases.
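
The dispatch scheme above could look roughly like the following. This
is a simplified user-space sketch, not the patch itself: the struct has
a single hook, the two backends are toys, and every name other than
dma_mapping_ops and dma_ops is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Simplified stand-in for the ops table; a real one needs more hooks. */
struct dma_mapping_ops {
	const char *name;
	dma_addr_t (*map_single)(void *ptr, size_t size);
};

/* Toy nommu backend: the bus address is just the pointer value. */
static dma_addr_t nommu_map_single(void *ptr, size_t size)
{
	(void)size;
	return (dma_addr_t)(uintptr_t)ptr;
}

/* Toy swiotlb backend: pretend everything bounces to a fixed buffer. */
static dma_addr_t swiotlb_map_single(void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	return 0x1000;
}

static const struct dma_mapping_ops nommu_ops = { "nommu", nommu_map_single };
static const struct dma_mapping_ops swiotlb_ops = { "swiotlb", swiotlb_map_single };

/* Selected once at boot; callers no longer test "if (swiotlb)". */
static const struct dma_mapping_ops *dma_ops = &nommu_ops;

static dma_addr_t dma_map_single(void *ptr, size_t size)
{
	return dma_ops->map_single(ptr, size);
}
```

Adding another IOMMU type then means adding one more ops table rather
than one more branch in every mapping function.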

Signed-Off-By: Muli Ben-Yehuda <mulix@mulix.org>
Signed-Off-By: Jon D. Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Reject SRAT tables that don't cover all memory
Andi Kleen [Wed, 11 Jan 2006 21:44:39 +0000 (22:44 +0100)]
[PATCH] x86_64: Reject SRAT tables that don't cover all memory

Broken BIOS on Iwill 8way systems reports these and it causes the bootmem
allocator to crash. Add a sanity check if all the PXMs in the
SRAT table cover all memory as reported by e820. If the sanity
check fails the SRAT is rejected and the code will fall back
to discover the NUMA topology using the K8 northbridge registers
when applicable.
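
A rough sketch of such a coverage check, assuming the node ranges do
not overlap and contain no holes. The struct, the function name and the
page-count interface are hypothetical; the real check may account for
memory holes differently.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open page frame range [start_pfn, end_pfn) owned by one node. */
struct range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Return 1 if the SRAT node ranges account for all RAM pages in e820. */
static int nodes_cover_memory(const struct range *nodes, size_t nr_nodes,
			      unsigned long e820_ram_pages)
{
	unsigned long covered = 0;
	size_t i;

	for (i = 0; i < nr_nodes; i++)
		covered += nodes[i].end_pfn - nodes[i].start_pfn;

	return covered >= e820_ram_pages;
}
```

When the check returns 0 the SRAT would be discarded and the caller
would try the next NUMA discovery method.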

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Add idle notifiers
Andi Kleen [Wed, 11 Jan 2006 21:44:36 +0000 (22:44 +0100)]
[PATCH] x86_64: Add idle notifiers

This adds a new notifier chain that is called with IDLE_START
when a CPU goes idle and IDLE_END when it goes out of idle.
The context can be idle thread or interrupt context.

Since we cannot rely on MONITOR/MWAIT existing the idle
end check currently has to be done in all interrupt
handlers.
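
For illustration, a notifier chain of this kind can be modelled in
plain C as below. Only IDLE_START and IDLE_END come from the text
above; the struct layout, function names and the example subscriber
are assumptions, and the sketch ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>

enum idle_event { IDLE_START, IDLE_END };

/* One subscriber on the chain (simplified, hypothetical layout). */
struct idle_notifier {
	void (*call)(enum idle_event ev, void *data);
	void *data;
	struct idle_notifier *next;
};

static struct idle_notifier *idle_chain;

/* Add a subscriber at the head of the chain. */
static void idle_notifier_register(struct idle_notifier *n)
{
	n->next = idle_chain;
	idle_chain = n;
}

/* Called on entry to idle and on the first interrupt that ends it. */
static void idle_notify(enum idle_event ev)
{
	struct idle_notifier *n;

	for (n = idle_chain; n; n = n->next)
		n->call(ev, n->data);
}

/* Example subscriber: counts transitions into and out of idle. */
struct idle_stats {
	unsigned long starts;
	unsigned long ends;
};

static void count_idle(enum idle_event ev, void *data)
{
	struct idle_stats *s = data;

	if (ev == IDLE_START)
		s->starts++;
	else
		s->ends++;
}
```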

They were originally inspired by the similar s390 implementation.

They have a variety of applications:
- They will be needed for CONFIG_NO_IDLE_HZ
- They can be used for oprofile to fix up the missing time
  in idle when performance counters don't tick.
- They can be used for better C state management in ACPI
- They could be used for microstate accounting.

This is just infrastructure so far, no users.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Clean up some printks in NUMA code
Andi Kleen [Wed, 11 Jan 2006 21:44:33 +0000 (22:44 +0100)]
[PATCH] x86_64: Clean up some printks in NUMA code

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix up coding style in numa.c
Andi Kleen [Wed, 11 Jan 2006 21:44:30 +0000 (22:44 +0100)]
[PATCH] x86_64: Fix up coding style in numa.c

No functional changes

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix off by one in IOMMU check
Andi Kleen [Wed, 11 Jan 2006 21:44:27 +0000 (22:44 +0100)]
[PATCH] x86_64: Fix off by one in IOMMU check

Fix off by one when checking if the machine has enough memory to need IOMMU.
This caused the IOMMUs to be needlessly enabled for mem=4G
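
The shape of such an off-by-one can be sketched as follows, assuming
4K pages and an exclusive end_pfn (one past the last usable page). The
function names and the exact comparison are illustrative and may not
match the code this patch touches.

```c
#include <assert.h>

#define PAGE_SHIFT 12
/* First page frame number that is NOT reachable with 32-bit DMA. */
#define MAX_DMA32_PFN (1UL << (32 - PAGE_SHIFT))

/* Off by one: reports that an IOMMU is needed at exactly 4G. */
static int need_iommu_buggy(unsigned long end_pfn)
{
	return end_pfn >= MAX_DMA32_PFN;
}

/* Only memory that really extends beyond 4G needs the IOMMU. */
static int need_iommu_fixed(unsigned long end_pfn)
{
	return end_pfn > MAX_DMA32_PFN;
}
```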

Based on a patch from Jon Mason

Signed-off-by: jdmason@us.ibm.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Handle missing local APIC timer interrupts on C3 state
Venkatesh Pallipadi [Wed, 11 Jan 2006 21:44:24 +0000 (22:44 +0100)]
[PATCH] x86_64: Handle missing local APIC timer interrupts on C3 state

Whenever we see that a CPU is capable of C3 (during ACPI cstate init), we
disable local APIC timer and switch to using a broadcast from external timer
interrupt (IRQ 0).

Patch below adds the code for x86_64.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386: Handle missing local APIC timer interrupts on C3 state
Venkatesh Pallipadi [Wed, 11 Jan 2006 21:44:21 +0000 (22:44 +0100)]
[PATCH] i386: Handle missing local APIC timer interrupts on C3 state

Whenever we see that a CPU is capable of C3 (during ACPI cstate init), we
disable local APIC timer and switch to using a broadcast from external timer
interrupt (IRQ 0). This is needed because Intel CPUs stop the local
APIC timer in C3.  This is currently only enabled for Intel CPUs.

Patch below adds the code for i386 and also the ACPI hunk.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386/x86-64: Remove sub jiffy profile timer support
Venkatesh Pallipadi [Wed, 11 Jan 2006 21:44:18 +0000 (22:44 +0100)]
[PATCH] i386/x86-64: Remove sub jiffy profile timer support

Remove the finer control of local APIC timer. We cannot provide a sub-jiffy
control like this when we use broadcast from external timer in place of
local APIC. Instead of removing this only on systems that may end up using
broadcast from external timer (due to C3), I am going the
"I'm feeling lucky" way to remove this fully. Basically, I am not sure about
usefulness of this code today. Few other architectures also don't seem to
support this today.

If you are using profiling and fine grained control and don't like this going
away in normal case, yell at me right now.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Report hardware breakpoints in user space when triggered by the kernel
John Blackwood [Wed, 11 Jan 2006 21:44:15 +0000 (22:44 +0100)]
[PATCH] x86_64: Report hardware breakpoints in user space when triggered by the kernel

I would like to throw out a suggestion for a possible change in the way that
the debug register traps are handled in do_debug() when the trap occurs
in kernel-mode.

In the x86_64 version of do_debug(), the code will skip around sending
a SIGTRAP to the current task if the trap occurred while in kernel mode.

On the i386-side of things, if the access happens to occur in kernel mode
(say during a read(2) of user's buffer that matches the address of a
debug register trap), then the do_debug() routine for i386 will go ahead
and call send_sigtrap() and send the SIGTRAP signal.  The send_sigtrap()
code will also set the info.si_addr to NULL in this case (even though I
don't understand why, since the SIGTRAP siginfo processing doesn't use
the si_addr field...).

So I would like to suggest that the x86_64 do_debug() routine also
follow this type of behavior and have it go ahead and send the
SIGTRAP signal to the current task, even if the debug register trap
happens to have occurred in kernel mode.  I have taken a stab at
a patch for this change below.  (It includes the i386-ish change
for setting si_addr to NULL when the trap occurred in kernel mode.)

It seems like a useful feature to be able to 'watch' a user location that
might also be modified in the kernel via a system service call, and have the
debugger report that information back to the user, rather than to just
silently ignore the trap.

Additionally, I realize that users that pull in a kernel debugger such as
KGDB into their kernel might want to remove this change below when they add
in KGDB support.  However, they could alternatively look at the current
task's thread.debugreg[] values to see if the trap occurred due to KGDB
or instead because of a user-space debugger trap, and still honor the
user SIGTRAP processing (instead of the KGDB breakpoint processing)
if the trap matches up with the thread.debugreg[] registers.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: "extern inline" -> "static inline" in pgtable.h
Adrian Bunk [Wed, 11 Jan 2006 21:44:12 +0000 (22:44 +0100)]
[PATCH] x86_64: "extern inline" -> "static inline" in pgtable.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Convert page fault error codes to symbolic constants.
Andi Kleen [Wed, 11 Jan 2006 21:44:09 +0000 (22:44 +0100)]
[PATCH] x86_64: Convert page fault error codes to symbolic constants.

Much better to deal with these than with the magic numbers.

And remove the comment describing the bits - kernel source
is no replacement for an architecture manual.
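
As an illustration of the idea, the x86 page fault error code bits can
be named as below. The bit meanings follow the architecture manual; the
constant names and the helper are assumptions and may differ from the
ones the patch introduces.

```c
#include <assert.h>

/* x86 page fault error code bits (names are illustrative). */
#define PF_PROT   (1 << 0)  /* 0: page not present, 1: protection fault */
#define PF_WRITE  (1 << 1)  /* 0: read access, 1: write access */
#define PF_USER   (1 << 2)  /* 0: kernel mode, 1: user mode */
#define PF_RSVD   (1 << 3)  /* reserved bit set in a page table entry */
#define PF_INSTR  (1 << 4)  /* fault was an instruction fetch */

/* Hypothetical helper: user-mode write to a page that is not present. */
static int is_user_write_to_missing_page(unsigned long error_code)
{
	return (error_code & (PF_PROT | PF_WRITE | PF_USER)) ==
	       (PF_WRITE | PF_USER);
}
```

Compare this with the equivalent test written as (error_code & 7) == 6.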

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Implement is_compat_task the right way
Andi Kleen [Wed, 11 Jan 2006 21:44:06 +0000 (22:44 +0100)]
[PATCH] x86_64: Implement is_compat_task the right way

By setting a flag during a 32bit system call only

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Implement compat code for sg driver SG_GET_REQUEST_TABLE ioctl
Andi Kleen [Wed, 11 Jan 2006 21:44:03 +0000 (22:44 +0100)]
[PATCH] x86_64: Implement compat code for sg driver SG_GET_REQUEST_TABLE ioctl

Apparently helps with some non SANE scanner drivers.

Cc: axboe@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove unnecessary case from the page fault handler
Andi Kleen [Wed, 11 Jan 2006 21:44:00 +0000 (22:44 +0100)]
[PATCH] x86_64: Remove unnecessary case from the page fault handler

Don't need to do the vmalloc check for the module range because its
PML4 is shared with the kernel text.

Also removed an unnecessary TLB flush.

Pointed out by Jan Beulich

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Align and pad x86_64 GDT on page boundary
Ravikiran G Thirumalai [Wed, 11 Jan 2006 21:43:57 +0000 (22:43 +0100)]
[PATCH] x86_64: Align and pad x86_64 GDT on page boundary

This patch is on the same lines as Zachary Amsden's i386 GDT page alignment
patch in -mm, but for x86_64.

Patch to align and pad x86_64 GDT on page boundaries.

[AK: some minor cleanups and fixed incorrect TLS initialization
in CPU init.]

Signed-off-by: Nippun Goel <nippung@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Allow compilation on a 32bit biarch toolchain
Andi Kleen [Wed, 11 Jan 2006 21:43:54 +0000 (22:43 +0100)]
[PATCH] x86_64: Allow compilation on a 32bit biarch toolchain

This might help on distributions that use a 32bit biarch compiler.

First pass -m64 by default.

Secondly add some more .code32s because at least the Ubuntu biarch
32bit as called by gcc doesn't seem to handle -m64 -m32 as generated
by the Makefile without such assistance.

And finally make sure the linker script can be preprocessed
with a 32bit cpp.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Make udelay more accurate
Ross Biro [Wed, 11 Jan 2006 21:43:51 +0000 (22:43 +0100)]
[PATCH] x86_64: Make udelay more accurate

The attempt to avoid overflow in __delay caused varying precision
on different CPUs depending on differences in the CPU speed.

We should be able to do this multiplication without overflowing,
provided the CPU is running at less than about 128 GHz:
xloops < 20000 * 0x10c6, and loops_per_jiffy * HZ <= cpu_clock_speed.
Since 20000 * 0x10c6 = 0x51E6CC0 < 2^27, we have
2^64 / 0x51E6CC0 > 2^64 / 2^27 = 2^37 = 128G, so the calculation
will not overflow as long as the CPU clock speed is below 2^37.
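
The bound can be checked numerically. The sketch below only reproduces
the arithmetic from this message; the macro and function names are
hypothetical, and 20000 is taken as the largest microsecond count as
assumed above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_USECS   20000ULL   /* largest delay assumed in the message */
#define USEC_SCALE  0x10c6ULL  /* 2^32 / 10^6, truncated */

/* Largest loops_per_jiffy * HZ whose product with xloops fits in 64 bits. */
static uint64_t max_safe_loops_per_sec(void)
{
	return UINT64_MAX / (MAX_USECS * USEC_SCALE);
}
```

The result is above 2^37, so any CPU slower than about 128 GHz stays
within 64 bits.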

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Return -1 for unknown PCI bus affinity
Andi Kleen [Wed, 11 Jan 2006 21:43:48 +0000 (22:43 +0100)]
[PATCH] x86_64: Return -1 for unknown PCI bus affinity

When we don't know the node a PCI bus is connected to, return -1.
This matches the generic code.

Noticed by Ravikiran G Thirumalai <kiran@scalex86.org>

Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Handle unknown node (-1) in alloc_pages_node
Andi Kleen [Wed, 11 Jan 2006 21:43:45 +0000 (22:43 +0100)]
[PATCH] x86_64: Handle unknown node (-1) in alloc_pages_node

Following kmalloc_node.

Needed for another patch to return -1 for unknown nodes in x86-64.
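
A minimal sketch of the -1 handling, assuming an unknown node should be
mapped to the node of the current CPU. numa_node_id() here is a local
stand-in for the kernel function of that name, and resolve_node() and
current_node are hypothetical.

```c
#include <assert.h>

/* Stand-in for the kernel's numa_node_id(): node of the running CPU. */
static int current_node = 1;

static int numa_node_id(void)
{
	return current_node;
}

/* Map "unknown node" (-1) to the local node before allocating. */
static int resolve_node(int nid)
{
	return nid < 0 ? numa_node_id() : nid;
}
```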

Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: kiran@scalex86.org
Signed-off-by: Andi Kleen <ak@suse.de>
[ Changed 0 to numa_node_id() on suggestion by Christoph Lameter ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Validate SLIT table
Andi Kleen [Wed, 11 Jan 2006 21:43:42 +0000 (22:43 +0100)]
[PATCH] x86_64: Validate SLIT table

A lot of Opteron BIOS just pass 10 in all SLIT entries (10 is the
normalized unit). This is actually worse than the default heuristic
because it leads to pci_distance not knowing the difference between
local and remote nodes anymore. This messes up some NUMA
heuristics in generic code.

In this case it's better to fall back to the default heuristic
which just does nodea == nodeb ? 10 : 20.

This patch does some basic sanity checking on the SLIT and only accepts
the SLIT when it passes.

Invariants enforced are:
- Node to itself shall be 10
- Any other distance shouldn't be 10
- Distances smaller than 10 are illegal
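
The three invariants can be expressed as one pass over the distance
matrix. This is a sketch under the assumption that the SLIT is an
n x n row-major array of bytes; slit_valid() and LOCAL_DISTANCE are
names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_DISTANCE 10

/* entries is an n x n row-major matrix of node distances. */
static int slit_valid(const unsigned char *entries, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < n; j++) {
			unsigned char d = entries[i * n + j];

			if (i == j) {
				/* Node to itself shall be 10. */
				if (d != LOCAL_DISTANCE)
					return 0;
			} else if (d <= LOCAL_DISTANCE) {
				/* Remote distance of 10 or less is rejected. */
				return 0;
			}
		}
	}
	return 1;
}
```

A table that passes 10 in every entry fails the second rule, so the
caller would use the default heuristic instead.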

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix off by one in acpi table mapping
Andi Kleen [Wed, 11 Jan 2006 21:43:39 +0000 (22:43 +0100)]
[PATCH] x86_64: Fix off by one in acpi table mapping

And fix the test to include the size

Noticed by Vivek Goyal

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Fix 64bit FXSAVE encoding
Jan Beulich [Wed, 11 Jan 2006 21:43:36 +0000 (22:43 +0100)]
[PATCH] x86_64: Fix 64bit FXSAVE encoding

The separation of the rex64 prefix (on fxsave/fxrstor) by way of using
a semicolon resulted in the prefix not always taking effect (because
when extended registers are needed for addressing, another rex prefix
would have been generated by the compiler), thus (depending on the
build) resulting in eventually getting 32-bit saves and/or restores.

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Generalize DMI and enable for x86-64
Andi Kleen [Wed, 11 Jan 2006 21:43:33 +0000 (22:43 +0100)]
[PATCH] x86_64: Generalize DMI and enable for x86-64

Some people need it now on 64bit so reuse the i386 code for
x86-64. This will be also useful for future bug workarounds.

It is a bit simplified there because there is no need
to do it very early on x86-64. This means it doesn't need
early ioremap et al. We run it as a core initcall right now.

I hope it's not needed for early setup.

I added a general CONFIG_DMI symbol in case IA64 or someone
else wants to reuse the code later too.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Remove bogus file in arch/x86_64/pci
Andi Kleen [Wed, 11 Jan 2006 21:43:30 +0000 (22:43 +0100)]
[PATCH] x86_64: Remove bogus file in arch/x86_64/pci

This was a backup file that somehow made it into the official
tree. Never used for anything. Remove.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Add missing newline in IOMMU error message
Andi Kleen [Wed, 11 Jan 2006 21:43:27 +0000 (22:43 +0100)]
[PATCH] x86_64: Add missing newline in IOMMU error message

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: fls in asm for x86_64
Stephen Hemminger [Wed, 11 Jan 2006 21:43:24 +0000 (22:43 +0100)]
[PATCH] x86_64: fls in asm for x86_64

Use single instruction for find largest set bit on x86_64.

[Updated by Jan Beulich to fix wrong asm constraints in original
patch -AK]
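
For reference, the semantics of fls() can be sketched portably as
below. This is not the inline assembly from the patch: it uses the
GCC/Clang builtin __builtin_clz, which on x86-64 typically compiles to
a single BSR or LZCNT instruction. The name fls_sketch is hypothetical.

```c
#include <assert.h>

/* 1-based index of the most significant set bit, 0 if no bit is set. */
static inline int fls_sketch(unsigned int x)
{
	if (x == 0)
		return 0;	/* __builtin_clz(0) is undefined */
	return (int)(sizeof(x) * 8) - __builtin_clz(x);
}
```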

Cc: jbeulich@novell.com
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: fix page fault from show_trace()
Jan Beulich [Wed, 11 Jan 2006 21:43:21 +0000 (22:43 +0100)]
[PATCH] x86_64: fix page fault from show_trace()

The introduction of call_softirq switching to the interrupt stack several
releases earlier resulted in a problem with the code in show_trace, which
assumes that it can pick the previous stack pointer from the end of the
interrupt stack.

Cc: Andi Kleen <ak@muc.de>
Cc: Arjan van de Ven <arjanv@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: fix single step handling for 32bit processes
Peter Beutner [Wed, 11 Jan 2006 21:43:18 +0000 (22:43 +0100)]
[PATCH] x86_64: fix single step handling for 32bit processes

Be more careful with TF handling to fix some copy protection codes in wine

patch originally for i386 by Linus, then ported to x86_64 by Andi Kleen
see: [PATCH] x86_64: Some fixes for single step handling
commit: be61bff789fe44bfb6d9282d8f7eccc860bdcfb6

But it was never applied to the ia32 emulation code which breaks some
copy-protection schemes under wine when running on x86_64.

Signed-off-by: Peter Beutner <p.beutner@gmx.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: don't save eflags in x86-64 switch_to()
Benjamin LaHaise [Wed, 11 Jan 2006 21:43:15 +0000 (22:43 +0100)]
[PATCH] x86_64: don't save eflags in x86-64 switch_to()

As discussed, the flags register on x86-64 is saved and restored by the
assembly code which sets up struct pt_regs, so we do not need to save
and restore it in the inline assembler which already informs gcc that
we're clobbering the flags.  This patch has been sanity booted and works
okay here.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386/x86-64: Don't IPI to offline cpus on shutdown
Eric W. Biederman [Wed, 11 Jan 2006 21:43:12 +0000 (22:43 +0100)]
[PATCH] i386/x86-64: Don't IPI to offline cpus on shutdown

So why are we calling smp_send_stop from machine_halt?

We don't.

Looking more closely at the bug report the problem here
is that halt -p is called which triggers not a halt but
an attempt to power off.

machine_power_off calls machine_shutdown which calls smp_send_stop.

If pm_power_off is set we should never make it out of machine_power_off
to the call of do_exit.  So pm_power_off must not be set in this case.
When pm_power_off is not set we expect machine_power_off to devolve
into machine_halt.

So how do we fix this?

Playing too much with smp_send_stop is dangerous because it
must also be safe to be called from panic.

It looks like the obviously correct fix is to only call
machine_shutdown when pm_power_off is defined.  Doing
that will make Andi's assumption about not scheduling
true and generally simplify what must be supported.

This turns machine_power_off into a noop like machine_halt
when pm_power_off is not defined.
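
The control flow described above can be modelled like this. It is a
user-space sketch only: machine_shutdown() is reduced to a counter,
and fake_power_off() and the counters are hypothetical helpers added
so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

static void (*pm_power_off)(void);	/* NULL when no method is set */
static int shutdown_calls;		/* machine_shutdown() invocations */
static int power_off_calls;		/* power-off hook invocations */

/* In the kernel this would stop the other CPUs via smp_send_stop(). */
static void machine_shutdown(void)
{
	shutdown_calls++;
}

static void fake_power_off(void)
{
	power_off_calls++;
}

static void machine_power_off(void)
{
	if (pm_power_off) {
		machine_shutdown();
		pm_power_off();
	}
	/* Otherwise do nothing, like machine_halt(). */
}
```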

If the expected behavior is that sys_reboot(LINUX_REBOOT_CMD_POWER_OFF)
becomes sys_reboot(LINUX_REBOOT_CMD_HALT) if pm_power_off is NULL
this is not quite a comprehensive fix as we pass a different parameter
to the reboot notifier and we set system_state to a different value
before calling device_shutdown().

Unfortunately any fix more comprehensive I can think of is not
obviously correct.  The core problem is that there is no architecture
independent way to detect if machine_power_off will become a noop, without
calling it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64/i386: Remove preempt disable calls in lowlevel IPI
Zwane Mwaikambo [Wed, 11 Jan 2006 21:43:09 +0000 (22:43 +0100)]
[PATCH] x86_64/i386: Remove preempt disable calls in lowlevel IPI

I noticed that some lowlevel send_IPI_mask helpers had a hotplug/preempt
race whereupon the cpu_online_map was read before disabling preemption;

...
cpumask_t mask = cpu_online_map;
int cpu = get_cpu();
cpu_clear(cpu, mask);
...

But then I realised that there is no need for these lowlevel functions to
be going through all this trouble when all the callers are already made
hotplug/preempt safe.

Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: increase MCE bank counts
Shaohua Li [Wed, 11 Jan 2006 21:43:06 +0000 (22:43 +0100)]
[PATCH] x86_64: increase MCE bank counts

There is one CPU here whose MCE bank count is 6. This patch increases
x86_64's MCE bank count.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: another mb() for smpboot.c
Benjamin LaHaise [Wed, 11 Jan 2006 21:43:03 +0000 (22:43 +0100)]
[PATCH] x86_64: another mb() for smpboot.c

The following is probably a good idea given that the atomic_set() isn't
a barrier here either.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Move int 3 handler to debug stack and allow to increase it.
Jan Beulich [Wed, 11 Jan 2006 21:43:00 +0000 (22:43 +0100)]
[PATCH] x86_64: Move int 3 handler to debug stack and allow to increase it.

This
- switches the INT3 handler to run on an IST stack (to cope with
  breakpoints set by a kernel debugger on places where the kernel's
  %gs base hasn't been set up, yet); the IST stack used is shared with
  the INT1 handler's
[AK: this also allows setting a kprobe on the interrupt/exception entry
points]
- allows nesting of INT1/INT3 handlers so that one can, with a kernel
  debugger, debug (at least) the user-mode portions of the INT1/INT3
  handling; the nesting isn't actively enabled here since a kernel-
  debugger-free kernel doesn't need it

Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't confuse apic=... command line option with apic
Andi Kleen [Wed, 11 Jan 2006 21:42:57 +0000 (22:42 +0100)]
[PATCH] x86_64: Don't confuse apic=... command line option with apic

Previously apic was forced when apic=logopt was specified.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Don't disable early PCI scan with apic
Andi Kleen [Wed, 11 Jan 2006 21:42:54 +0000 (22:42 +0100)]
[PATCH] x86_64: Don't disable early PCI scan with apic

It might be still needed for non APIC related issues.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386/x86-64: Update AMD CPUID flags
Andi Kleen [Wed, 11 Jan 2006 21:42:51 +0000 (22:42 +0100)]
[PATCH] i386/x86-64: Update AMD CPUID flags

Print bits for RDTSCP, SVM, CR8-LEGACY.

Also now print power flags on i386 like x86-64 always did.
This will add a new line in the 386 cpuinfo, but that shouldn't
be an issue - did that in the past too and I haven't heard
of any breakage.

I shrunk some of the fields in the i386 cpuinfo_x86 to chars
to make up for the new int "x86_power" field. Overall it's
smaller than before.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] x86_64: Use X86_FEATURE_CONSTANT_TSC now to clean up Intel speedstep drivers
Andi Kleen [Wed, 11 Jan 2006 21:42:48 +0000 (22:42 +0100)]
[PATCH] x86_64: Use X86_FEATURE_CONSTANT_TSC now to clean up Intel speedstep drivers

They previously tried to figure this out on their own.

Suggested by Venkatesh.

Cc: venkatesh.pallipadi@intel.com
Cc: davej@redhat.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] i386/x86-64: Generalize X86_FEATURE_CONSTANT_TSC flag
Andi Kleen [Wed, 11 Jan 2006 21:42:45 +0000 (22:42 +0100)]
[PATCH] i386/x86-64: Generalize X86_FEATURE_CONSTANT_TSC flag

Define it for i386 too.

This is a synthetic flag that signifies that the CPU's TSC runs
at a constant P state invariant frequency.

Fix up the logic on x86-64/i386 to set it on all known CPUs.
Use the AMD defined bit to set it on future AMD CPUs.

Cc: venkatesh.pallipadi@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>