git.openwrt.org Git - openwrt/staging/blogic.git/log

projects / openwrt / staging / blogic.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Vincent Hanquez [Thu, 23 Jun 2005 07:08:45 +0000 (00:08 -0700)]

[PATCH] xen: x86: Use more usermode macro

Use the user_mode macro where it's possible.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Vincent Hanquez [Thu, 23 Jun 2005 07:08:44 +0000 (00:08 -0700)]

[PATCH] xen: x86: Rename usermode macro

Rename user_mode to user_mode_vm and add a user_mode macro similar to the
x86-64 one.

This is useful for Xen because the linux xen kernel does not runs on the same
priviledge that a vanilla linux kernel, and with this we just need to redefine
user_mode().

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Vincent Hanquez [Thu, 23 Jun 2005 07:08:43 +0000 (00:08 -0700)]

[PATCH] xen: x86: Use new macro for debugreg

Make use of the 2 new macro set_debugreg and get_debugreg.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Vincent Hanquez [Thu, 23 Jun 2005 07:08:42 +0000 (00:08 -0700)]

[PATCH] xen: x86: add macro for debugreg

Add 2 macros to set and get debugreg on x86. This is useful for Xen because
it will need only to redefine each macro to a hypervisor call.

Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Natalie Protasevich [Thu, 23 Jun 2005 07:08:41 +0000 (00:08 -0700)]

[PATCH] x86_64: avoid wasting IRQs

I suggest to change the way IRQs are handed out to PCI devices.

Currently, each I/O APIC pin gets associated with an IRQ, no matter if the
pin is used or not.  It is expected that each pin can potentually be
engaged by a device inserted into the corresponding PCI slot.  However,
this imposes severe limitation on systems that have designs that employ
many I/O APICs, only utilizing couple lines of each, such as P64H2 chipset.

It is used in ES7000, and currently, there is no way to boot the system
with more that 9 I/O APICs.

The simple change below allows to boot a system with say 64 (or more) I/O
APICs, each providing 1 slot, which otherwise impossible because of the IRQ
gaps created for unused lines on each I/O APIC.  It does not resolve the
problem with number of devices that exceeds number of possible IRQs, but
eases up a tension for IRQs on any large system with potentually large
number of devices.

I only implemented this for the ACPI boot, since if the system is this big
and using newer chipsets it is probably (better be!) an ACPI based system
:).  The change is completely "mechanical" and does not alter any internal
structures or interrupt model/implementation.  The patch works for both
i386 and x86_64 archs.  It works with MSIs just fine, and should not
intervene with implementations like shared vectors, when they get worked
out and incorporated.

To illustrate, below is the interrupt distribution for 2-cell ES7000 with
20 I/O APICs, and an Ethernet card in the last slot, which should be eth1
and which was not configured because its IRQ exceeded allowable number (it
actially turned out huge - 480!):

zorro-tb2:~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:      65716      30012      30007      30002      30009      30010      30010      30010    IO-APIC-edge  timer
  4:        373          0        725        280          0          0          0          0    IO-APIC-edge  serial
  8:          0          0          0          0          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
14:         39          3          0          0          0          0          0          0    IO-APIC-edge  ide0
16:        108         13          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
18:          0          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb3
19:         15          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
23:          3          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb4
96:       4240        397         18          0          0          0          0          0   IO-APIC-level  aic7xxx
97:         15          0          0          0          0          0          0          0   IO-APIC-level  aic7xxx
192:        847          0          0          0          0          0          0          0   IO-APIC-level  eth0
NMI:          0          0          0          0          0          0          0          0
LOC:     273423     274528     272829     274228     274092     273761     273827     273694
ERR:          7
MIS:          0

Even though the system doesn't have that many devices, some don't get
enabled only because of IRQ numbering model.

This is the IRQ picture after the patch was applied:

zorro-tb2:~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:      44169      10004      10004      10001      10004      10003      10004       6135    IO-APIC-edge  timer
  4:        345          0          0          0          0        244          0          0    IO-APIC-edge  serial
  8:          0          0          0          0          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
14:         39          0          3          0          0          0          0          0    IO-APIC-edge  ide0
17:       4425          0          9          0          0          0          0          0   IO-APIC-level  aic7xxx
18:         15          0          0          0          0          0          0          0   IO-APIC-level  aic7xxx, uhci_hcd:usb3
21:        231          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
22:         26          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
23:          3          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb4
24:        348          0          0          0          0          0          0          0   IO-APIC-level  eth0
25:          6        192          0          0          0          0          0          0   IO-APIC-level  eth1
NMI:          0          0          0          0          0          0          0          0
LOC:     107981     107636     108899     108698     108489     108326     108331     108254
ERR:          7
MIS:          0

Not only we see the card in the last I/O APIC, but we are not even close to
using up available IRQs, since we didn't waste any.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Jan Beulich [Thu, 23 Jun 2005 07:08:38 +0000 (00:08 -0700)]

[PATCH] eliminate duplicate rdpmc definition

Eliminate duplicate definition of rdpmc in x86-64's mtrr.h.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Roland McGrath [Thu, 23 Jun 2005 07:08:37 +0000 (00:08 -0700)]

[PATCH] x86_64: never block forced SIGSEGV

This is the x86_64 version of the signal fix I just posted for i386.

This problem was first noticed on PPC and has already been fixed there.
But the exact same issue applies to other platforms in the same way.  The
signal blocking for sa_mask and the handled signal takes place after the
handler setup.  When the stack is bogus, the handler setup forces a
SIGSEGV.  But then this will be blocked, and returning to user mode will
fault again and iterate.  This patch fixes the problem by checking whether
signal handler setup failed, and not doing the signal-blocking if so.  This
copies what was done in the ppc code.  I think all architectures' signal
handler setup code follows this pattern and needs the change.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

john stultz [Thu, 23 Jun 2005 07:08:36 +0000 (00:08 -0700)]

[PATCH] x86_64: fix hpet for systems that don't support legacy replacement

Currently the x86-64 HPET code assumes the entire HPET implementation from
the spec is present.  This breaks on boxes that do not implement the
optional legacy timer replacement functionality portion of the spec.

This patch fixes this issue, allowing x86-64 systems that cannot use the
HPET for the timer interrupt and RTC to still use the HPET as a time
source.  I've tested this patch on a system systems without HPET, with HPET
but without legacy timer replacement, as well as HPET with legacy timer
replacement.

This version adds a minor check to cap the HPET counter value in
gettimeoffset_hpet to avoid possible time inconsistencies.  Please ignore
the A2 version I sent to you earlier.

Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Alexander Nyberg [Thu, 23 Jun 2005 07:08:35 +0000 (00:08 -0700)]

[PATCH] x86_64: i8259.c iso99 structure initialization

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andrew Morton [Thu, 23 Jun 2005 07:08:35 +0000 (00:08 -0700)]

[PATCH] mtrr size-and-base debugging

Consolidate the mtrr sanity checking, add a dump_stack().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andrew Morton [Thu, 23 Jun 2005 07:08:34 +0000 (00:08 -0700)]

[PATCH] x86: cpu_khz type fix

x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use
unsigned long.

So make cpu_khz unsigned int on x86 as well.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Alexey Dobriyan [Thu, 23 Jun 2005 07:08:33 +0000 (00:08 -0700)]

[PATCH] Remove i386_ksyms.c, almost.

* EXPORT_SYMBOL's moved to other files
* #include <linux/config.h>, <linux/module.h> where needed
* #include's in i386_ksyms.c cleaned up
* After copy-paste, redundant due to Makefiles rules preprocessor directives
removed:

#ifdef CONFIG_FOO
EXPORT_SYMBOL(foo);
#endif

obj-$(CONFIG_FOO) += foo.o

* Tiny reformat to fit in 80 columns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Alexey Dobriyan [Thu, 23 Jun 2005 07:08:32 +0000 (00:08 -0700)]

[PATCH] x86: #include asm/uaccess.h in asm/checksum.h

csum_and_copy_to_user is static inline and uses VERIFY_WRITE. Patch allows
to remove asm/uaccess.h from i386_ksyms.c without dependency surprises.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Aleksey Gorelov [Thu, 23 Jun 2005 07:08:29 +0000 (00:08 -0700)]

[PATCH] VIA 82C586B IRQ routing fix

According to the VIA 82C586B datasheet (still available from
http://gkernel.sourceforge.net/specs/via/586b.pdf.bz2) this chip need a
special PIRQ mapping.

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Natalie Protasevich [Thu, 23 Jun 2005 07:08:29 +0000 (00:08 -0700)]

[PATCH] x86: avoid wasting IRQs for PCI devices

I have submitted the patch for x86_64, this is submission for i386.

The patch changes the way IRQs are handed out to PCI devices.  Currently,
each I/O APIC pin gets associated with an IRQ, no matter if the pin is used
or not.  This imposes severe limitation on systems that have designs that
employ many I/O APICs, only utilizing couple lines of each, such as P64H2
chipset.  It is used in ES7000, and currently, there is no way to boot the
system with more that 9 I/O APICs.

The simple change below allows to boot a system with say 64 (or more) I/O
APICs, each providing 1 slot, which otherwise impossible because of the IRQ
gaps created for unused lines on each I/O APIC.  It does not resolve the
problem with number of devices that exceeds number of possible IRQs, but
eases up a tension for IRQs on any large system with potentually large
number of devices.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Christoph Lameter [Thu, 23 Jun 2005 07:08:27 +0000 (00:08 -0700)]

[PATCH] ia64: Selectable Timer Interrupt Frequency

It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ.
Reducing the timer frequency may have important performance benefits on
large systems.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Christoph Lameter [Thu, 23 Jun 2005 07:08:25 +0000 (00:08 -0700)]

[PATCH] i386: Selectable Frequency of the Timer Interrupt

Make the timer frequency selectable. The timer interrupt may cause bus
and memory contention in large NUMA systems since the interrupt occurs
on each processor HZ times per second.

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Jan Beulich [Thu, 23 Jun 2005 07:08:24 +0000 (00:08 -0700)]

[PATCH] allow early printk to use more than 25 lines

Allow early printk code to take advantage of the full size of the screen, not
just the first 25 lines.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Jan Beulich [Thu, 23 Jun 2005 07:08:23 +0000 (00:08 -0700)]

[PATCH] adjust i386 watchdog tick calculation

Get the i386 watchdog tick calculation into a state where it can also be used
on CPUs with frequencies beyond 4GHz, and it consolidates the calculation into
a single place (for potential furture adjustments).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Natalie Protasevich [Thu, 23 Jun 2005 07:08:22 +0000 (00:08 -0700)]

[PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386)

This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and
use heuristics to prevent unique I/O APIC ID check for systems that don't
need it. The patch disables unique I/O APIC ID check for Xeon-based and
other platforms that don't use serial APIC bus for interrupt delivery.
Andi stated that AMD systems don't need unique IO_APIC_IDs either.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Roland McGrath [Thu, 23 Jun 2005 07:08:21 +0000 (00:08 -0700)]

[PATCH] i386: never block forced SIGSEGV

This problem was first noticed on PPC and has already been fixed there.
But the exact same issue applies to other platforms in the same way.  The
signal blocking for sa_mask and the handled signal takes place after the
handler setup.  When the stack is bogus, the handler setup forces a
SIGSEGV.  But then this will be blocked, and returning to user mode will
fault again and iterate.  This patch fixes the problem by checking whether
signal handler setup failed, and not doing the signal-blocking if so.  This
copies what was done in the ppc code.  I think all architectures' signal
handler setup code follows this pattern and needs the change.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Christoph Lameter [Thu, 23 Jun 2005 07:08:19 +0000 (00:08 -0700)]

[PATCH] NUMA aware block device control structure allocation

Patch to allocate the control structures for for ide devices on the node of
the device itself (for NUMA systems). The patch depends on the Slab API
change patch by Manfred and me (in mm) and the pcidev_to_node patch that I
posted today.

Does some realignment too.

Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org>
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Pravin Shelar <pravin@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Christoph Lameter [Thu, 23 Jun 2005 07:08:18 +0000 (00:08 -0700)]

[PATCH] x86/x86_64: pcibus_to_node

Define pcibus_to_node to be able to figure out which NUMA node contains a
given PCI device. This defines pcibus_to_node(bus) in
include/linux/topology.h and adjusts the macros for i386 and x86_64 that
already provided a way to determine the cpumask of a pci device.

x86_64 was changed to not build an array of cpumasks anymore. Instead an
array of nodes is build which can be used to generate the cpumask via
node_to_cpumask.

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Christoph Lameter [Thu, 23 Jun 2005 07:08:17 +0000 (00:08 -0700)]

[PATCH] ppc64: pcibus_to_node fix

asm-generic/topology.h must also be included if CONFIG_NUMA is set in order to
provide the fall back pcibus_to_node function.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Hirokazu Takata [Thu, 23 Jun 2005 07:08:14 +0000 (00:08 -0700)]

[PATCH] m32r: build fix for asm-m32r/topology.h

Use asm-generic/topology.h to fix yet another pcibus_to_node() build error.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Venkatesh Pallipadi [Thu, 23 Jun 2005 07:08:13 +0000 (00:08 -0700)]

[PATCH] Platform SMIs and their interferance with tsc based delay calibration

Issue:
Current tsc based delay_calibration can result in significant errors in
loops_per_jiffy count when the platform events like SMIs
(System Management Interrupts that are non-maskable) are present. This could
lead to potential kernel panic(). This issue is becoming more visible with 2.6
kernel (as default HZ is 1000) and on platforms with higher SMI handling
latencies. During the boot time, SMIs are mostly used by BIOS (for things
like legacy keyboard emulation).

Description:
The psuedocode for current delay calibration with tsc based delay looks like
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine
loops_per_jiffy estimate

Consider the following cases
Case 1:
If SMIs happen between (2) and (3) above, we can end up with a
loops_per_jiffy value that is too low. This results in shorted delays and
kernel can panic () during boot (Mostly at IOAPIC timer initialization
timer_irq_works() as we don't have enough timer interrupts in a specified
interval).

Case 2:
If SMIs happen between (3) and (4) above, then we can end up with a
loops_per_jiffy value that is too high. And with current i386 code, too
high lpj value (greater than 17M) can result in a overflow in
delay.c:__const_udelay() again resulting in shorter delay and panic().

Solution:
The patch below makes the calibration routine aware of asynchronous events
like SMIs. We increase the delay calibration time and also identify any
significant errors (greater than 12.5%) in the calibration and notify it to
user.

Patch below changes both i386 and x86-64 architectures to use this
new and improved calibrate_delay_direct() routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Ian Campbell [Thu, 23 Jun 2005 07:08:10 +0000 (00:08 -0700)]

[PATCH] use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh

The attached patch causes the various arch specific install.sh scripts to
look for ${CROSS_COMPILE}installkernel rather than just installkernel (in
both /sbin/ and ~/bin/ where the script already did this).  This allows you
to have e.g.  arm-linux-installkernel as a handy way to install on your
cross target.  It also prevents the script picking up on the host
/sbin/installkernel which causes the script to fall through and do the
install itself (which is what I actually use myself, with $INSTALL_PATH
set).

I don't believe it causes back-compatibility problems since calling the
host installkernel was never likely to work or be what you wanted when
cross compiling anyway.  If $CROSS_COMPILE isn't set then nothing changes.

I only use ARM and i386 myself but I figured it couldn't hurt to do the
whole lot.  I've cc'd those who I hope are the arch maintainers for files
that I've touched.

Signed-off-by: Ian Campbell <icampbell@arcom.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

H. Peter Anvin [Thu, 23 Jun 2005 07:08:09 +0000 (00:08 -0700)]

[PATCH] biarch compiler support for i386

This allows the i386 architecture to be built on a system with a biarch
compiler that defaults to x86-64, merely by specifying ARCH=i386.

As previously discussed, this uses the equivalent logic to the ppc port.

Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Martin J. Bligh [Thu, 23 Jun 2005 07:08:08 +0000 (00:08 -0700)]

[PATCH] add page_state info to show_mem

This helps a lot when debugging out of memory stuff - useful especially to
see if all the memory is sucked into slab, etc.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:07 +0000 (00:08 -0700)]

[PATCH] add x86-64 specific support for sparsemem

This patch adds in the necessary support for sparsemem such that x86-64
kernels may use sparsemem as an alternative to discontigmem for NUMA
kernels. Note that this does no preclude one from continuing to build NUMA
kernels using discontigmem, but merely allows the option to build NUMA
kernels with sparsemem.

Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels
results in reduced text size for otherwise equivalent kernels as shown in
the example builds below:

text data bss dec hex filename
2371036 765884 1237108 4374028 42be0c vmlinux.discontig
2366549 776484 1302772 4445805 43d66d vmlinux.sparse

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:06 +0000 (00:08 -0700)]

[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options

In order to use the alternative sparsemem implmentation for NUMA kernels,
we need to reorganize the config options. This patch effectively abstracts
out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus,
the discontigmem implementation may be employed as always, but the
sparsemem implementation may be used alternatively.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:05 +0000 (00:08 -0700)]

[PATCH] add x86-64 Kconfig options for sparsemem

Add the requisite arch specific Kconfig options to enable the use of the
sparsemem implementation for NUMA kernels on x86-64.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Matt Tolentino [Thu, 23 Jun 2005 07:08:03 +0000 (00:08 -0700)]

[PATCH] remove direct ref to contig_page_data for x86-64

This patch pulls out all remaining direct references to contig_page_data
from arch/x86-64, thus saving an ifdef in one case.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:03 +0000 (00:08 -0700)]

[PATCH] ppc64: sparsemem memory model

Provide the architecture specific implementation for SPARSEMEM for PPC64
systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part)
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:02 +0000 (00:08 -0700)]

[PATCH] ppc64: add memory present

Provide hooks for PPC64 to allow memory models to be informed of installed
memory areas. This allows SPARSEMEM to instantiate mem_map for the populated
areas.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:01 +0000 (00:08 -0700)]

[PATCH] ppc64: add early_pfn_to_nid

Provide an implementation of early_pfn_to_nid for PPC64. This is used by
memory models to determine the node from which to take allocations before the
memory allocators are fully initialised.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:08:00 +0000 (00:08 -0700)]

[PATCH] sparsemem hotplug base

Make sparse's initalization be accessible at runtime.  This allows sparse
mappings to be created after boot in a hotplug situation.

This patch is separated from the previous one just to give an indication how
much of the sparse infrastructure is *just* for hotplug memory.

The section_mem_map doesn't really store a pointer.  It stores something that
is convenient to do some math against to get a pointer.  It isn't valid to
just do *section_mem_map, so I don't think it should be stored as a pointer.

There are a couple of things I'd like to store about a section.  First of all,
the fact that it is !NULL does not mean that it is present.  There could be
such a combination where section_mem_map *is* NULL, but the math gets you
properly to a real mem_map.  So, I don't think that check is safe.

Since we're storing 32-bit-aligned structures, we have a few bits in the
bottom of the pointer to play with.  Use one bit to encode whether there's
really a mem_map there, and the other one to tell whether there's a valid
section there.  We need to distinguish between the two because sometimes
there's a gap between when a section is discovered to be present and when we
can get the mem_map for it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:59 +0000 (00:07 -0700)]

[PATCH] sparsemem swiss cheese numa layouts

The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem.  It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn.  It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.

Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts.  The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:

Node 0: +++++     +++++
Node 1:      +++++

Previous behavior before Mike's patch was to assign nodes like this:

Node 0: 00000     XXXXX
Node 1:      11111

Where the 'X' areas were simply thrown away.  The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's: Node 0: 000000000000000 Node 1: 11111

This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory.  memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages.  However, only
calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
allocated for node0->node_mem_map because:

struct page *start = pfn_to_page(start_pfn);
// effectively start = &node->node_mem_map[0]
for (page = start; page < (start + size); page++) {
init_page_here();...
page++;
}

Slow, and wasteful, but generally harmless.

But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):

for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) {
page = pfn_to_page(pfn);
}

And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0.  This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:57 +0000 (00:07 -0700)]

[PATCH] sparsemem memory model for i386

Provide the architecture specific implementation for SPARSEMEM for i386 SMP
and NUMA systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:54 +0000 (00:07 -0700)]

[PATCH] sparsemem memory model

Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:53 +0000 (00:07 -0700)]

[PATCH] generify memory present

Allow architectures to indicate that they will be providing hooks to indice
installed memory areas, memory_present(). Provide prototypes for the i386
implementation.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Andy Whitcroft [Thu, 23 Jun 2005 07:07:52 +0000 (00:07 -0700)]

[PATCH] generify early_pfn_to_nid

Provide a default implementation for early_pfn_to_nid returning node 0. Allow
architectures to override this with their own implementation out of
asm/mmzone.h.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Mike Kravetz [Thu, 23 Jun 2005 07:07:51 +0000 (00:07 -0700)]

[PATCH] ppc64: Kconfig memory models

This patch changes some of the default behavior in the ppc64 Kconfig file
that was recently changed/added to 2.6.12-rc2-mm1 by Dave Hansen in
preparation for SPARSEMEM. Patch allows the display of both FLAT and
DISCONTIG models on pseries. As before, default is DISCONTIG for SMP and
PSERIES and FLAT for others.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:50 +0000 (00:07 -0700)]

[PATCH] mm/Kconfig: give DISCONTIG more help text

This gives DISCONTIGMEM a bit more help text to explain what it does, not just
when to choose it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:49 +0000 (00:07 -0700)]

[PATCH] mm/Kconfig: hide "Memory Model" selection menu

I got some feedback from users who think that the new "Memory Model" menu is a
little invasive. This patch will hide that menu, except when
CONFIG_EXPERIMENTAL is enabled *or* when an individual architecture wants it.

An individual arch may want to enable it because they've removed their
arch-specific DISCONTIG prompt in favor of the mm/Kconfig one.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:48 +0000 (00:07 -0700)]

[PATCH] mm/Kconfig: kill unused ARCH_FLATMEM_DISABLE

This used to be used to disable FLATMEM selection, but I decided to change it
to be done generically when DISCONTIG is enabled. The option is unused, so
this kills it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:47 +0000 (00:07 -0700)]

[PATCH] sparsemem: fix minor "defaults" issue in mm/Kconfig

The following patch applies on top of 2.6.12-rc2-mm1. It fixes a minor
user interaction issue, and an early reference to SPARSEMEM.

This "choice" menu would always default to FLATMEM, as it was listed first.
Move it to the end so that the other defaults have a chance first.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:47 +0000 (00:07 -0700)]

[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG

There is some confusion that arose when working on SPARSEMEM patch between
what is needed for DISCONTIG vs. NUMA.

Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
All of the current NUMA implementations require an implementation of
DISCONTIG.  Because of this, quite a lot of code which is really needed for
NUMA is actually under DISCONTIG #ifdefs.  For SPARSEMEM, we changed some
of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
case.

Introducing this new NEED_MULTIPLE_NODES config option allows code that is
needed for both NUMA or DISCONTIG to be separated out from code that is
specific to DISCONTIG.

One great advantage of this approach is that it doesn't require every
architecture to be converted over.  All of the current implementations
should "just work", only the ones implementing SPARSEMEM will have to be
fixed up.

The change to free_area_init() makes it work inside, or out of the new
config option.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:45 +0000 (00:07 -0700)]

[PATCH] update all defconfigs for ARCH_DISCONTIGMEM_ENABLE

This will at least suppress one prompt that users would have received the
first time they compile with the new DISCONTIG arch option. They'll still
get the "Memory Model" prompt, but 99% of them will have the default work
there.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:43 +0000 (00:07 -0700)]

[PATCH] make each arch use mm/Kconfig

For all architectures, this just means that you'll see a "Memory Model"
choice in your architecture menu. For those that implement DISCONTIGMEM,
you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
y" and make your users select DISCONTIGMEM right out of the new choice
menu. The only disadvantage might be if you have some specific things that
you need in your help option to explain something about DISCONTIGMEM.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:42 +0000 (00:07 -0700)]

[PATCH] create mm/Kconfig for arch-independent memory options

With sparsemem being introduced, we need a central place for new
memory-related .config options: mm/Kconfig.  This allows us to remove many
of the duplicated arch-specific options.

The new option, CONFIG_FLATMEM, is there to enable us to detangle NUMA and
DISCONTIGMEM.  This is a requirement for sparsemem because sparsemem uses
the NUMA code without the presence of DISCONTIGMEM.  The sparsemem patches
use CONFIG_FLATMEM in generic code, so this patch is a requirement before
applying them.

Almost all places that used to do '#ifndef CONFIG_DISCONTIGMEM' should use
'#ifdef CONFIG_FLATMEM' instead.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:41 +0000 (00:07 -0700)]

[PATCH] sparsemem base: teach discontig about sparse ranges

discontig.c has some assumptions that mem_map[]s inside of a node are
contiguous. Teach it to make sure that each region that it's bringing online
is actually made up of valid ranges of ram.

Written-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:40 +0000 (00:07 -0700)]

[PATCH] sparsemem base: reorganize page->flags bit operations

Generify the value fields in the page_flags.  The aim is to allow the location
and size of these fields to be varied.  Additionally we want to move away from
fixed allocations per field whilst still enforcing the overall bit utilisation
limits.  We rely on the compiler to spot and optimise the accessor functions.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:39 +0000 (00:07 -0700)]

[PATCH] sparsemem base: simple NUMA remap space allocator

Introduce a simple allocator for the NUMA remap space.  This space is very
scarce, used for structures which are best allocated node local.

This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.

Issues:
o alloc_remap takes a node_id where we might expect a pgdat which was intended
  to allow us to allocate the pgdat's using this mechanism; which we do not yet
  do.  Could have alloc_remap_node() and alloc_remap_nid() for this purpose.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:38 +0000 (00:07 -0700)]

[PATCH] sparsemem base: early_pfn_to_nid() (works before sparse is initialized)

The following four patches provide the last needed changes before the
introduction of sparsemem.  For a more complete description of what this
will do, please see this patch:

http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

or previous posts on the subject:
http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2

Three of these are i386-only, but one of them reorganizes the macros
used to manage the space in page->flags, and will affect all platforms.
There are analogous patches to the i386 ones for ppc64, ia64, and
x86_64, but those will be submitted by the normal arch maintainers.

The combination of the four patches has been test-booted on a variety of
i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17
different .configs.  It's also been runtime-tested on ia64 configs (with
more patches on top).

This patch:

We _know_ which node pages in general belong to, at least at a very gross
level in node_{start,end}_pfn[].  Use those to target the allocations of
pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Dave Hansen [Thu, 23 Jun 2005 07:07:37 +0000 (00:07 -0700)]

[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map

This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code.  On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.

There was also a node_mem_map(nid) macro on many architectures.  Its use
along with the use of ->node_mem_map itself was not consistent.  It has
been removed in favor of two new, more explicit, arch-independent macros:

pgdat_page_nr(pgdat, pagenr)
nid_page_nr(nid, pagenr)

I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways.  I
believe the newer names are much clearer.

These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead.  We could
make this the only behavior if people want, but I don't want to change too
much at once.  One thing at a time.

This patch removes more code than it adds.

Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64.  Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/

Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Linus Torvalds [Thu, 23 Jun 2005 16:25:04 +0000 (09:25 -0700)]

Merge 'misc-fixes' branch of /linux/kernel/git/jgarzik/netdev-2.6

commit | commitdiff | tree

Mitch Williams [Thu, 23 Jun 2005 07:41:00 +0000 (03:41 -0400)]

e1000: fix spinlock bug

This patch fixes an obvious and nasty bug where we could exit the transmit
routine while holding tx_lock.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 23 Jun 2005 06:18:10 +0000 (23:18 -0700)]

Merge /pub/scm/linux/kernel/git/gregkh/driver-2.6

commit | commitdiff | tree

Linus Torvalds [Thu, 23 Jun 2005 06:11:50 +0000 (23:11 -0700)]

Merge /pub/scm/linux/kernel/git/davem/net-2.6

commit | commitdiff | tree

Greg Kroah-Hartman [Wed, 22 Jun 2005 23:09:05 +0000 (16:09 -0700)]

[PATCH] driver core: Fix up the device_attach() error handling in bus_add_device()

Don't error out if something "bad" happens when trying to bind a driver to a
device. We want the sysfs attributes to be present for later when we try to
tear down the device.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit | commitdiff | tree

Stelian Pop [Wed, 22 Jun 2005 15:53:28 +0000 (17:53 +0200)]

[PATCH] USB: fix hid core to return proper error code from probe

Drivers need to return -ENODEV when they can't bind to a device.
Anything else stops the "bind a device to a driver" search.

From: Stelian Pop <stelian@popies.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit | commitdiff | tree

Nishanth Aravamudan [Thu, 23 Jun 2005 05:19:52 +0000 (22:19 -0700)]

[LTPC]: Replace schedule_timeout() with ssleep()/msleep()

Use ssleep() / msleep() [as appropriate]
instead of schedule_timeout() to guarantee the task delays as expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Shaun Pereira [Thu, 23 Jun 2005 05:16:17 +0000 (22:16 -0700)]

[X25]: Fast select with no restriction on response

This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data".  It allows use of the Fast-Select-Acceptance
optional user facility for X.25.

This patch just implements fast select with no restriction on response
(NRR).  What this means (according to ITU-T Recomendation 10/96 section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data.  This patch allows such a response.

The called DTE can also respond with a clear-request packet that contains
call-user-data.  However, this feature is currently not implemented by the
patch.

How is Fast Select Acceptance used?
By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance,
After a listen socket in created and bound as follows
socket(AF_X25, SOCK_SEQPACKET, 0);
bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));
but before a listen system call is made, the following ioctl should be used.
ioctl(call_soc,SIOCX25CALLACCPTAPPRV);
Now the listen system call can be made
listen(call_soc, 4);
After this, an incoming-call packet will be accepted, but no call-accepted
packet will be sent back until the following system call is made on the socket
that accepts the call
ioctl(vc_soc,SIOCX25SENDCALLACCPT);
The network (or cisco xot router used for testing here) will allow the
application server's call-user-data in the call-accepted packet,
provided the call-request was made with Fast-select NRR.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Shaun Pereira [Thu, 23 Jun 2005 05:15:01 +0000 (22:15 -0700)]

[X25]: Selective sub-address matching with call user data.

From: Shaun Pereira <spereira@tusc.com.au>

This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router).  Details
are as follows.

Current state of module:

A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/ incoming call packet at the listening x.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.

If the server needs to choose to accept a particular call request/ incoming
call packet arriving at its listening x25 address, then the kernel has to
allow a match of call user data present in the call request packet with its
own.  This is required when multiple servers listen at the same x25 address
and device interface.  The kernel currently matches ALL call user data, if
present.

Current Changes:

This patch is a follow up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet, that the kernel will match.  By
default no call user data is matched, even if call user data is present.
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel after which the passed number of octets will be matched.
Otherwise the kernel behavior is exactly as the original implementation.

This patch also ensures that as is normally the case, no call user data
will be present in the Call accepted / call connected packet sent back to
the caller

Future Changes on next patch:

There are cases however when call user data may be present in the call
accepted packet.  According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used.  My next patch will include this
fast select utility and the ability to send up to 128 octets call user data
in the call accepted packet provided the fast select facility is used.  I
am currently testing this, again with xot on linux and cisco.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

James Lamanna [Thu, 23 Jun 2005 05:12:57 +0000 (22:12 -0700)]

[EBTABLES]: vfree() checking cleanups

From: jlamanna@gmail.com

ebtables.c vfree() checking cleanups.

Signed-off by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nishanth Aravamudan [Thu, 23 Jun 2005 05:11:44 +0000 (22:11 -0700)]

[ATALK] aarp: replace schedule_timeout() with msleep()

From: Nishanth Aravamudan <nacc@us.ibm.com>

Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. The current code is not wrong, but it does not account for
early return due to signals, so I think msleep() should be appropriate.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Chuck Short [Thu, 23 Jun 2005 05:10:23 +0000 (22:10 -0700)]

[IPV4]: Fix route.c gcc4 warnings

Signed-off by: Chuck Short <zulcss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jeff Moyer [Thu, 23 Jun 2005 05:05:59 +0000 (22:05 -0700)]

[NETPOLL]: allow multiple netpoll_clients to register against one interface

This patch provides support for registering multiple netpoll clients to the
same network device.  Only one of these clients may register an rx_hook,
however.  In practice, this restriction has not been problematic.  It is
worth mentioning, though, that the current design can be easily extended to
allow for the registration of multiple rx_hooks.

The basic idea of the patch is that the rx_np pointer in the netpoll_info
structure points to the struct netpoll that has rx_hook filled in.  Aside
from this one case, there is no need for a pointer from the struct
net_device to an individual struct netpoll.

A lock is introduced to protect the setting and clearing of the np_rx
pointer.  The pointer will only be cleared upon netpoll client module
removal, and the lock should be uncontested.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jeff Moyer [Thu, 23 Jun 2005 05:05:31 +0000 (22:05 -0700)]

[NETPOLL]: Introduce a netpoll_info struct

This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll.  The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll;  and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.

The struct netpoll is now pointed to by the netpoll_info structure.  As
such, the previous behaviour of the code is preserved.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jeff Moyer [Thu, 23 Jun 2005 05:04:55 +0000 (22:04 -0700)]

[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock()

This trivial patch moves the assignment of poll_owner to -1 inside of
the lock. This fixes a potential SMP race in the code.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Christoph Lameter [Thu, 23 Jun 2005 03:26:07 +0000 (20:26 -0700)]

[PATCH] boot_pageset must not be freed.

The boot_pageset needs to be preserved for hotplugging and for off line
processors and nodes. Otherwise pointers will point into memory that has
now a different use. /proc/zoneinfo is currently showing strange results
if processors / nodes are not present.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit | commitdiff | tree

Linus Torvalds [Wed, 22 Jun 2005 21:51:06 +0000 (14:51 -0700)]

Merge master.kernel.org:/home/rmk/linux-2.6-arm

commit | commitdiff | tree

Eric Dumazet [Wed, 22 Jun 2005 21:32:51 +0000 (14:32 -0700)]

[NET]: dont use strlen() but the result from a prior sprintf()

Small patch to save an unecessary call to strlen() : sprintf() gave us
the length, just trust it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Wed, 22 Jun 2005 21:32:15 +0000 (14:32 -0700)]

Merge client.linux-nfs.org/pub/linux/nfs-2.6

commit | commitdiff | tree

Russell King [Wed, 22 Jun 2005 20:47:25 +0000 (21:47 +0100)]

[PATCH] ARM: Remove explicit page-alignments in memory init

Since meminfo.bank[] array contains page-aligned start/size, we
no longer need to explicitly round up/down the addresses when
converting to PFNs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit | commitdiff | tree

Russell King [Wed, 22 Jun 2005 20:43:10 +0000 (21:43 +0100)]

[PATCH] ARM: Ensure memory information is page aligned

Ensure that meminfo.bank[] array contains page-aligned start/size
information.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit | commitdiff | tree

Herbert Xu [Wed, 22 Jun 2005 20:29:03 +0000 (13:29 -0700)]

[CRYPTO]: Use CPU cycle counters in tcrypt

After using this facility for a while to test my changes to the
cipher crypt() layer, I realised that I should've listend to Dave
and made this thing use CPU cycle counters :) As it is it's too
jittery for me to feel safe about relying on the results.

So here is a patch to make it use CPU cycles by default but fall
back to jiffies if the user specifies a non-zero sec value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Herbert Xu [Wed, 22 Jun 2005 20:27:51 +0000 (13:27 -0700)]

[CRYPTO]: Use template keys for speed tests if possible

The existing keys used in the speed tests do not pass the 3DES quality check.
This patch makes it use the template keys instead.

Other algorithms can supply template keys through the same interface if needed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Harald Welte [Wed, 22 Jun 2005 20:27:23 +0000 (13:27 -0700)]

[CRYPTO]: Add cipher speed tests

From: Reyk Floeter <reyk@vantronix.net>

I recently had the requirement to do some benchmarking on cryptoapi, and
I found reyk's very useful performance test patch [1].

However, I could not find any discussion on why that extension (or
something providing a similar feature but different implementation) was
not merged into mainline. If there was such a discussion, can someone
please point me to the archive[s]?

I've now merged the old patch into 2.6.12-rc1, the result can be found
attached to this email.

[1] http://lists.logix.cz/pipermail/padlock/2004/000010.html

Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Herbert Xu [Wed, 22 Jun 2005 20:26:36 +0000 (13:26 -0700)]

[CRYPTO]: Kill unnecessary strncpy from tcrypt

It seems that bad code tends to get copied (see test_cipher_speed). So let's
kill this idiom before it spreads any further.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Herbert Xu [Wed, 22 Jun 2005 20:26:03 +0000 (13:26 -0700)]

[CRYPTO]: White space and coding style clean up in tcrypt

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Russell King [Wed, 22 Jun 2005 20:25:58 +0000 (21:25 +0100)]

[PATCH] ARM: Use list_for_each_entry() for dmabounce

Convert dmabounce.c to use list_for_each_entry() instead of
list_for_each() + list_entry().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit | commitdiff | tree

Kumar Gala [Wed, 22 Jun 2005 20:10:02 +0000 (15:10 -0500)]

[PATCH] ppc32: Fix building MPC8555 CDS

Adding support for MPC8548 w/o PCI support, broke building MPC8555 CDS
by trying to remove a loop variable that was used when PCI is enabled.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org)

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]

[PATCH] NFS: Add debugging code to NFSv4 readdir

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Manoj Naik [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]

[PATCH] NFSv4: Map a couple of NFSv4 errors to EINVAL.

This shows up on running tar over NFSv4.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Manoj Naik [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]

[PATCH] NFSv4: add support for rdattr_error in NFSv4 readdir requests.

Request RDATTR_ERROR as an attribute in readdir to distinguish between a
directory being within an absent filesystem or one (or more) of its entries.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:32 +0000 (17:16 +0000)]

[PATCH] NFSv4: Clean up nfs4 lock state accounting

Ensure that lock owner structures are not released prematurely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] NLM: fix a client-side race on blocking locks.

If the lock blocks, the server may send us a GRANTED message that
races with the reply to our LOCK request. Make sure that we catch
the GRANTED by queueing up our request on the nlm_blocked list
before we send off the first LOCK rpc call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] NLM: cleanup for blocked locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] VFS: Ensure that all the on-stack struct file_lock call fl_release_private

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]

[PATCH] NFS: Replace nfs_page insertion sort with a radix sort

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]

[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient.

Basically copies the VFS's method for tracking writebacks and applies
it to the struct nfs_page.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]

[PATCH] NFS: Write optimization for short files and small O_SYNC writes.

Use stable writes if we can see that we are only going to put a single
write on the wire.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]

[PATCH] NFS: Ensure that fstat() always returns the correct mtime

Even if the file is open for writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]

[PATCH] NFS: Cleanup of caching code, and slight optimization of writes.

Unless we're doing O_APPEND writes, we really don't care about revalidating
the file length. Just make sure that we catch any page cache invalidations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]

[PATCH] NFS: Fix the file size revalidation

Instead of looking at whether or not the file is open for writes before
we accept to update the length using the server value, we should rather
be looking at whether or not we are currently caching any writes.

Failure to do so means in particular that we're not updating the file
length correctly after obtaining a POSIX or BSD lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]

[PATCH] NFSv4: Fix up races in nfs4_proc_setattr()

If we do not hold a valid stateid that is open for writes, there is little
point in doing an extra open of the file, as the RFC does not appear to
mandate this...

Make setattr use the correct stateid if we're holding mandatory byte
range locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]

[PATCH] NFSv4: Ensure that propagate NFSv4 state errors to the reclaim code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

commit | commitdiff | tree

Trond Myklebust [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]

[PATCH] NFS: Clean up readdir changes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

John Crispins staging tree