Zachary Amsden [Sat, 3 Sep 2005 22:55:04 +0000 (15:55 -0700)]
[PATCH] x86: ptep_clear optimization
Add a new accessor for PTEs, which passes the full hint from the mmu_gather
struct; this allows architectures with hardware pagetables to optimize away
atomic PTE operations when destroying an address space. Removing the
locked operation should allow better pipelining of memory access in this
loop. I measured an average savings of 30-35 cycles per zap_pte_range on
the first 500 destructions on Pentium-M, but I believe the optimization
would win more on older processors which still assert the bus lock on xchg
for an exclusive cacheline.
Update: I made some new measurements, and this saves exactly 26 cycles over
ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180
cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
full address space destruction.
pte_clear_full is not yet used, but is provided for future optimizations
(in particular, when running inside of a hypervisor that queues page table
updates, the full hint allows us to avoid queueing unnecessary page table
update for an address space in the process of being destroyed.
This is not a huge win, but it does help a bit, and sets the stage for
further hypervisor optimization of the mm layer on all architectures.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kyle Moffett [Sat, 3 Sep 2005 22:55:03 +0000 (15:55 -0700)]
[PATCH] sab: consolidate kmem_bufctl_t
This is used only in slab.c and each architecture gets to define whcih
underlying type is to be used.
Seems a bit silly - move it to slab.c and use the same type for all
architectures: unsigned int.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chen, Kenneth W [Sat, 3 Sep 2005 22:55:02 +0000 (15:55 -0700)]
[PATCH] remove hugetlb_clean_stale_pgtable() and fix huge_pte_alloc()
I don't think we need to call hugetlb_clean_stale_pgtable() anymore
in 2.6.13 because of the rework with free_pgtables(). It now collect
all the pte page at the time of munmap. It used to only collect page
table pages when entire one pgd can be freed and left with staled pte
pages. Not anymore with 2.6.13. This function will never be called
and We should turn it into a BUG_ON.
I also spotted two problems here, not Adam's fault :-)
(1) in huge_pte_alloc(), it looks like a bug to me that pud is not
checked before calling pmd_alloc()
(2) in hugetlb_clean_stale_pgtable(), it also missed a call to
pmd_free_tlb. I think a tlb flush is required to flush the mapping
for the page table itself when we clear out the pmd pointing to a
pte page. However, since hugetlb_clean_stale_pgtable() is never
called, so it won't trigger the bug.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adam Litke [Sat, 3 Sep 2005 22:55:01 +0000 (15:55 -0700)]
[PATCH] hugetlb: check p?d_present in huge_pte_offset()
For demand faulting, we cannot assume that the page tables will be
populated. Do what the rest of the architectures do and test p?d_present()
while walking down the page table.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adam Litke [Sat, 3 Sep 2005 22:55:00 +0000 (15:55 -0700)]
[PATCH] hugetlb: move stale pte check into huge_pte_alloc()
Initial Post (Wed, 17 Aug 2005)
This patch moves the
if (! pte_none(*pte))
hugetlb_clean_stale_pgtable(pte);
logic into huge_pte_alloc() so all of its callers can be immune to the bug
described by Kenneth Chen at http://lkml.org/lkml/2004/6/16/246
> It turns out there is a bug in hugetlb_prefault(): with 3 level page table,
> huge_pte_alloc() might return a pmd that points to a PTE page. It happens
> if the virtual address for hugetlb mmap is recycled from previously used
> normal page mmap. free_pgtables() might not scrub the pmd entry on
> munmap and hugetlb_prefault skips on any pmd presence regardless what type
> it is.
Unless I am missing something, it seems more correct to place the check inside
huge_pte_alloc() to prevent a the same bug wherever a huge pte is allocated.
It also allows checking for this condition when lazily faulting huge pages
later in the series.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adam Litke [Sat, 3 Sep 2005 22:54:59 +0000 (15:54 -0700)]
[PATCH] hugetlb: add pte_huge() macro
This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a
patch later in the series. Instead of repeating (_PAGE_PRESENT |
_PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: <linux-mm@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Deepak Saxena [Sat, 3 Sep 2005 22:54:58 +0000 (15:54 -0700)]
[PATCH] arm: allow for arch-specific IOREMAP_MAX_ORDER
Version 6 of the ARM architecture introduces the concept of 16MB pages
(supersections) and 36-bit (40-bit actually, but nobody uses this) physical
addresses. 36-bit addressed memory and I/O and ARMv6 can only be mapped
using supersections and the requirement on these is that both virtual and
physical addresses be 16MB aligned. In trying to add support for ioremap()
of 36-bit I/O, we run into the issue that get_vm_area() allows for a
maximum of 512K alignment via the IOREMAP_MAX_ORDER constant. To work
around this, we can:
- Allocate a larger VM area than needed (size + (1ul << IOREMAP_MAX_ORDER))
and then align the pointer ourselves, but this ends up with 512K of
wasted VM per ioremap().
- Provide a new __get_vm_area_aligned() API and make __get_vm_area() sit
on top of this. I did this and it works but I don't like the idea
adding another VM API just for this one case.
- My preferred solution which is to allow the architecture to override
the IOREMAP_MAX_ORDER constant with it's own version.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paolo 'Blaisorblade' Giarrusso [Sat, 3 Sep 2005 22:54:57 +0000 (15:54 -0700)]
[PATCH] mm: correct _PAGE_FILE comment
_PAGE_FILE does not indicate whether a file is in page / swap cache, it is
set just for non-linear PTE's. Correct the comment for i386, x86_64, UML.
Also clearify _PAGE_NONE.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paolo 'Blaisorblade' Giarrusso [Sat, 3 Sep 2005 22:54:56 +0000 (15:54 -0700)]
[PATCH] mm: remove implied vm_ops check
If !vma->vm-ops we already BUG above, so retesting it is useless. The
compiler cannot optimize this because BUG is a macro and is not thus marked
noreturn; that should possibly be fixed.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paolo 'Blaisorblade' Giarrusso [Sat, 3 Sep 2005 22:54:55 +0000 (15:54 -0700)]
[PATCH] shmem_populate: avoid an useless check, and some comments
Either shmem_getpage returns a failure, or it found a page, or it was told
it couldn't do any I/O. So it's useless to check nonblock in the else
branch. We could add a BUG() there but I preferred to comment the
offending function.
This was taken out from one Ingo Molnar's old patch I'm resurrecting.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Martin Hicks [Sat, 3 Sep 2005 22:54:54 +0000 (15:54 -0700)]
[PATCH] vm: slab.c spelling correction
Fix a small spelling mistake. subtile->subtle
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paolo 'Blaisorblade' Giarrusso [Sat, 3 Sep 2005 22:54:53 +0000 (15:54 -0700)]
[PATCH] comment typo fix
smp_entry_t -> swap_entry_t
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:53 +0000 (15:54 -0700)]
[PATCH] mm: fix madvise vma merging
Better late than never, I've at last reviewed the madvise vma merging
going into 2.6.13. Remove a pointless check and fix two little bugs -
a simple test (with /proc/<pid>/maps hacked to show ReadHints) showed
both mismerges in practice: though being madvise, neither was disastrous.
1. Correct placement of the success label in madvise_behavior: as in
mprotect_fixup and mlock_fixup, it is necessary to update vm_flags
when vma_merge succeeds (to handle the exceptional Case 8 noted in
the comments above vma_merge itself).
2. Correct initial value of prev when starting part way into a vma: as
in sys_mprotect and do_mlock, it needs to be set to vma in this case
(vma_merge handles only that minimum of cases shown in its comments).
3. If find_vma_prev sets prev, then the vma it returns is prev->vm_next,
so it's pointless to make that same assignment again in sys_madvise.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Martin Hicks [Sat, 3 Sep 2005 22:54:51 +0000 (15:54 -0700)]
[PATCH] VM: zone reclaim atomic ops cleanup
Christoph Lameter and Marcelo Tosatti asked to get rid of the
atomic_inc_and_test() to cleanup the atomic ops in the zone reclaim code.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Martin Hicks [Sat, 3 Sep 2005 22:54:50 +0000 (15:54 -0700)]
[PATCH] VM: add capabilites check to set_zone_reclaim
Add a capability check to sys_set_zone_reclaim(). This syscall is not
something that should be available to a user.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sat, 3 Sep 2005 22:54:50 +0000 (15:54 -0700)]
[PATCH] mm: remove atomic
This bitop does not need to be atomic because it is performed when there will
be no references to the page (ie. the page is being freed).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sat, 3 Sep 2005 22:54:49 +0000 (15:54 -0700)]
[PATCH] mm: remap ZERO_PAGE mappings
filemap_xip's nopage routine maps the ZERO_PAGE into readonly mappings, if it
has no data page to map there: then if the hole in the file is later filled,
__xip_unmap uses an rmap technique to replace the ZERO_PAGEs mapped for that
offset by the newly allocated file page, so that established mappings will see
the newly written data.
However, on MIPS (alone) there's not one but as many as eight ZERO_PAGEs,
chosen for coloring by user virtual address; and if mremap has meanwhile been
used to move a mapping containing a ZERO_PAGE, it will generally not match the
ZERO_PAGE(address) __xip_unmap is looking for.
To maintain XIP's established mappings correctly on MIPS, we need Nick's fix
to mremap's move_one_page (originally presented as an optimization), to
replace the ZERO_PAGE appropriate to the old address by the ZERO_PAGE
appropriate to the new address.
(But when I first saw this, I was thinking the ZERO_PAGEs themselves would get
corrupted, very bad. Now I think it's the other way round, that the
established mappings will fail to see the newly written data: incorrect, but
not corrupting everything else. Whether filemap_xip's technique is generally
safe, I'd hesitate to say in a hurry: it's interesting, but we've never tried
to do that in tmpfs.)
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sat, 3 Sep 2005 22:54:48 +0000 (15:54 -0700)]
[PATCH] mm: cleanup rmap
Thanks to Bill Irwin for pointing this out.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sat, 3 Sep 2005 22:54:47 +0000 (15:54 -0700)]
[PATCH] mm: micro-optimise rmap
Microoptimise page_add_anon_rmap. Although these expressions are used only in
the taken branch of the if() statement, the compiler can't reorder them inside
because atomic_inc_and_test is a barrier.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sat, 3 Sep 2005 22:54:46 +0000 (15:54 -0700)]
[PATCH] mm: comment rmap
Just be clear that VM_RESERVED pages here are a bug, and the test is not there
because they are expected.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Sat, 3 Sep 2005 22:54:45 +0000 (15:54 -0700)]
[PATCH] /proc/<pid>/numa_maps to show on which nodes pages reside
This patch was recently discussed on linux-mm:
http://marc.theaimsgroup.com/?t=
112085728500002&r=1&w=2
I inherited a large code base from Ray for page migration. There was a
small patch in there that I find to be very useful since it allows the
display of the locality of the pages in use by a process. I reworked that
patch and came up with a /proc/<pid>/numa_maps that gives more information
about the vma's of a process. numa_maps is indexes by the start address
found in /proc/<pid>/maps. F.e. with this patch you can see the page use
of the "getty" process:
margin:/proc/12008 # cat maps
00000000-
00004000 r--p
00000000 00:00 0
2000000000000000-
200000000002c000 r-xp
00000000 08:04 516 /lib/ld-2.3.3.so
2000000000038000-
2000000000040000 rw-p
00028000 08:04 516 /lib/ld-2.3.3.so
2000000000040000-
2000000000044000 rw-p
2000000000040000 00:00 0
2000000000058000-
2000000000260000 r-xp
00000000 08:04
54707842 /lib/tls/libc.so.6.1
2000000000260000-
2000000000268000 ---p
00208000 08:04
54707842 /lib/tls/libc.so.6.1
2000000000268000-
2000000000274000 rw-p
00200000 08:04
54707842 /lib/tls/libc.so.6.1
2000000000274000-
2000000000280000 rw-p
2000000000274000 00:00 0
2000000000280000-
20000000002b4000 r--p
00000000 08:04
9126923 /usr/lib/locale/en_US.utf8/LC_CTYPE
2000000000300000-
2000000000308000 r--s
00000000 08:04
60071467 /usr/lib/gconv/gconv-modules.cache
2000000000318000-
2000000000328000 rw-p
2000000000318000 00:00 0
4000000000000000-
4000000000008000 r-xp
00000000 08:04
29576399 /sbin/mingetty
6000000000004000-
6000000000008000 rw-p
00004000 08:04
29576399 /sbin/mingetty
6000000000008000-
600000000002c000 rw-p
6000000000008000 00:00 0 [heap]
60000fff7fffc000-
60000fff80000000 rw-p
60000fff7fffc000 00:00 0
60000ffffff44000-
60000ffffff98000 rw-p
60000ffffff44000 00:00 0 [stack]
a000000000000000-
a000000000020000 ---p
00000000 00:00 0 [vdso]
cat numa_maps
2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
getty uses ld.so. The first vma is the code segment which is used by 43
other processes and the pages are evenly distributed over the 4 nodes.
The second vma is the process specific data portion for ld.so. This is
only one page.
The display format is:
<startaddress> Links to information in /proc/<pid>/map
<memory policy> This can be "default" "interleave={}", "prefer=<node>" or "bind={<zones>}"
MaxRef= <maximum reference to a page in this vma>
Pages= <Nr of pages in use>
Mapped= <Nr of pages with mapcount >
Anon= <nr of anonymous pages>
Nx= <Nr of pages on Node x>
The content of the proc-file is self-evident. If this would be tied into
the sparsemem system then the contents of this file would not be too
useful.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:43 +0000 (15:54 -0700)]
[PATCH] rmap: don't test rss
Remove the three get_mm_counter(mm, rss) tests from rmap.c: there was a
time when testing rss was important to avoid a particular race between
dup_mmap and the anonmm rmap; but now it's just a rather silly pseudo-
optimization, made even more obscure by the get_mm_counter macro.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:43 +0000 (15:54 -0700)]
[PATCH] delete from_swap_cache BUG_ONs
Three of the four BUG_ONs in delete_from_swap_cache are immediately
repeated in __delete_from_swap_cache: delete those and add the one. But
perhaps mm/ is altogether overprovisioned with historic BUGs?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:42 +0000 (15:54 -0700)]
[PATCH] swap: update swsusp use of swap_info
Aha, swsusp dips into swap_info[], better update it to swap_lock. It's
bitflipping flags with 0xFF, so get_swap_page will allocate from only the one
chosen device: let's change that to flip SWP_WRITEOK.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:41 +0000 (15:54 -0700)]
[PATCH] swap: swap_lock replace list+device
The idea of a swap_device_lock per device, and a swap_list_lock over them all,
is appealing; but in practice almost every holder of swap_device_lock must
already hold swap_list_lock, which defeats the purpose of the split.
The only exceptions have been swap_duplicate, valid_swaphandles and an
untrodden path in try_to_unuse (plus a few places added in this series).
valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
demand attention. However, with the hold time in get_swap_pages so much
reduced, I've not yet found a load and set of swap device priorities to show
even swap_duplicate benefitting from the split. Certainly the split is mere
overhead in the common case of a single swap device.
So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
(generally we seem to prefer an _ in the name, and not hide in a macro).
If someone can show a regression in swap_duplicate, then probably we should
add a hashlock for the swap_map entries alone (shorts being anatomic), so as
to help the case of the single swap device too.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:40 +0000 (15:54 -0700)]
[PATCH] swap: scan_swap_map latency breaks
The get_swap_page/scan_swap_map latency can be so bad that even those without
preemption configured deserve relief: periodically cond_resched.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:39 +0000 (15:54 -0700)]
[PATCH] swap: scan_swap_map drop swap_device_lock
get_swap_page has often shown up on latency traces, doing lengthy scans while
holding two spinlocks. swap_list_lock is already dropped, now scan_swap_map
drop swap_device_lock before scanning the swap_map.
While scanning for an empty cluster, don't worry that racing tasks may
allocate what was free and free what was allocated; but when allocating an
entry, check it's still free after retaking the lock. Avoid dropping the lock
in the expected common path. No barriers beyond the locks, just let the
cookie crumble; highest_bit limit is volatile, but benign.
Guard against swapoff: must check SWP_WRITEOK before allocating, must raise
SWP_SCANNING reference count while in scan_swap_map, swapoff wait for that to
fall - just use schedule_timeout, we don't want to burden scan_swap_map
itself, and it's very unlikely that anyone can really still be in
scan_swap_map once swapoff gets this far.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:38 +0000 (15:54 -0700)]
[PATCH] swap: scan_swap_map restyled
Rewrite scan_swap_map to allocate in just the same way as before (taking the
next free entry SWAPFILE_CLUSTER-1 times, then restarting at the lowest wholly
empty cluster, falling back to lowest entry if none), but with a view towards
dropping the lock in the next patch.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:37 +0000 (15:54 -0700)]
[PATCH] swap: get_swap_page drop swap_list_lock
Rewrite get_swap_page to allocate in just the same sequence as before, but
without holding swap_list_lock across its scan_swap_map. Decrement
nr_swap_pages and update swap_list.next in advance, while still holding
swap_list_lock. Skip full devices by testing highest_bit. Swapoff hold
swap_device_lock as well as swap_list_lock to clear SWP_WRITEOK. Reduces lock
contention when there are parallel swap devices of the same priority.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:36 +0000 (15:54 -0700)]
[PATCH] swap: freeing update swap_list.next
This makes negligible difference in practice: but swap_list.next should not be
updated to a higher prio in the general helper swap_info_get, but rather in
swap_entry_free; and then only in the case when entry is actually freed.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:35 +0000 (15:54 -0700)]
[PATCH] swap: swap unsigned int consistency
The swap header's unsigned int last_page determines the range of swap pages,
but swap_info has been using int or unsigned long in some cases: use unsigned
int throughout (except, in several places a local unsigned long is useful to
avoid overflows when adding).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:34 +0000 (15:54 -0700)]
[PATCH] swap: show span of swap extents
The "Adding %dk swap" message shows the number of swap extents, as a guide to
how fragmented the swapfile may be. But a useful further guide is what total
extent they span across (sometimes scarily large).
And there's no need to keep nr_extents in swap_info: it's unused after the
initial message, so save a little space by keeping it on stack.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:34 +0000 (15:54 -0700)]
[PATCH] swap: swap extent list is ordered
There are several comments that swap's extent_list.prev points to the lowest
extent: that's not so, it's extent_list.next which points to it, as you'd
expect. And a couple of loops in add_swap_extent which go all the way through
the list, when they should just add to the other end.
Fix those up, and let map_swap_page search the list forwards: profiles shows
it to be twice as quick that way - because prefetch works better on how the
structs are typically kmalloc'ed? or because usually more is written to than
read from swap, and swap is allocated ascendingly?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:33 +0000 (15:54 -0700)]
[PATCH] swap: move destroy_swap_extents calls
sys_swapon's call to destroy_swap_extents on failure is made after the final
swap_list_unlock, which is faintly unsafe: another sys_swapon might already be
setting up that swap_info_struct. Calling it earlier, before taking
swap_list_lock, is safe. sys_swapoff's call to destroy_swap_extents was safe,
but likewise move it earlier, before taking the locks (once try_to_unuse has
completed, nothing can be needing the swap extents).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:32 +0000 (15:54 -0700)]
[PATCH] swap: correct swapfile nr_good_pages
If a regular swapfile lies on a filesystem whose blocksize is less than
PAGE_SIZE, then setup_swap_extents may have to cut the number of usable swap
pages; but sys_swapon's nr_good_pages was not expecting that. Also,
setup_swap_extents takes no account of badpages listed in the swap header: not
worth doing so, but ensure nr_badpages is 0 for a regular swapfile.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Sat, 3 Sep 2005 22:54:31 +0000 (15:54 -0700)]
[PATCH] swap: update swapfile i_sem comment
Update swap extents comment: nowadays we guard with S_SWAPFILE not i_sem.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Rothwell [Sat, 3 Sep 2005 22:54:30 +0000 (15:54 -0700)]
[PATCH] mm: consolidate get_order
Someone mentioned that almost all the architectures used basically the same
implementation of get_order. This patch consolidates them into
asm-generic/page.h and includes that in the appropriate places. The
exceptions are ia64 and ppc which have their own (presumably optimised)
versions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Sat, 3 Sep 2005 22:54:29 +0000 (15:54 -0700)]
[PATCH] sparsemem extreme: hotplug preparation
This splits up sparse_index_alloc() into two pieces. This is needed
because we'll allocate the memory for the second level in a different place
from where we actually consume it to keep the allocation from happening
underneath a lock
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bob Picco [Sat, 3 Sep 2005 22:54:28 +0000 (15:54 -0700)]
[PATCH] sparsemem extreme implementation
With cleanups from Dave Hansen <haveblue@us.ibm.com>
SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
mem_sections. This two level layout scheme is able to achieve smaller
memory requirements for SPARSEMEM with the tradeoff of an additional shift
and load when fetching the memory section. The current SPARSEMEM
implementation is a one dimensional array of mem_sections which is the
default SPARSEMEM configuration. The patch attempts isolates the
implementation details of the physical layout of the sparsemem section
array.
SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
memory_present() calls. This is not always feasible, so architectures
which do not need it may allocate everything statically by using
SPARSEMEM_STATIC.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bob Picco [Sat, 3 Sep 2005 22:54:26 +0000 (15:54 -0700)]
[PATCH] SPARSEMEM EXTREME
A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME. Architecture
platforms with a very sparse physical address space would likely want to
select this option. For those architecture platforms that don't select the
option, the code generated is equivalent to SPARSEMEM currently in -mm.
I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.
ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
pointers to mem_sections. This two level layout scheme is able to achieve
smaller memory requirements for SPARSEMEM with the tradeoff of an
additional shift and load when fetching the memory section. The current
SPARSEMEM -mm implementation is a one dimensional array of mem_sections
which is the default SPARSEMEM configuration. The patch attempts isolates
the implementation details of the physical layout of the sparsemem section
array.
ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.
I've boot tested under aim load ia64 configured for ARCH_SPARSEMEM_EXTREME.
I've also boot tested a 4 way Opteron machine with !ARCH_SPARSEMEM_EXTREME
and tested with aim.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Mackall [Sat, 3 Sep 2005 22:54:25 +0000 (15:54 -0700)]
[PATCH] kbuild: fix make clean damaging hg repos
Running 'make clean' was quietly deleting files in Mercurial kernel
repositories matching '.*.d', which was corrupting the tags portions of the
repository. Spotted and fixed by several people.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Marcel Selhorst [Sat, 3 Sep 2005 22:54:20 +0000 (15:54 -0700)]
[PATCH] tpm_infineon: Bugfix in PNPACPI-handling
This patch corrects the PNP-handling inside the tpm-driver
and some minor coding style bugs.
Note: the pci-device and pnp-device mixture is currently necessary,
since the used "tpm"-interface requires a pci-dev in order to register
the driver. This will be fixed within the next iterations.
Signed-off-by: Marcel Selhorst <selhorst@crypto.rub.de>
Cc: Kylene Hall <kjhall@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Krufky [Sat, 3 Sep 2005 22:54:18 +0000 (15:54 -0700)]
[PATCH] dvb: saa7134-dvb must select tda1004x
Please apply this to 2.6.14, and also to 2.6.13.1 -stable. Without this
patch, users will have to EXPLICITLY select tda1004x in Kconfig. This
SHOULD be done automatically when saa7134-dvb is selected. This patch
corrects this problem.
Signed-off-by: Michael Krufky <mkrufky@m1k.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Fri, 2 Sep 2005 09:01:35 +0000 (02:01 -0700)]
Merge refs/heads/ieee80211-wifi from /linux/kernel/git/jgarzik/netdev-2.6
Jeff Garzik [Fri, 2 Sep 2005 08:44:25 +0000 (04:44 -0400)]
[wireless hostap] automatically select ieee80211 dependency in Kconfig
Rolf Eike Beer [Fri, 2 Sep 2005 07:03:09 +0000 (09:03 +0200)]
[PATCH] remove driverfs references from init/do_mounts.c
This patch is against 2.6.10, but still applies cleanly. It's just
s/driverfs/sysfs/ in this file.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rolf Eike Beer [Fri, 2 Sep 2005 06:59:25 +0000 (08:59 +0200)]
[PATCH] remove driverfs references from include/linux/cpu.h and net/sunrpc/rpc_pipe.c
This patch is against 2.6.10, but still applies cleanly. It's just
s/driverfs/sysfs/ in these two files.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Miles Bader [Fri, 2 Sep 2005 06:13:31 +0000 (15:13 +0900)]
[PATCH] v850: Add show_mem
Signed-off-by: Miles Bader <miles@gnu.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Miles Bader [Fri, 2 Sep 2005 06:13:30 +0000 (15:13 +0900)]
[PATCH] v850: Update defconfigs
Signed-off-by: Miles Bader <miles@gnu.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Miles Bader [Fri, 2 Sep 2005 06:13:30 +0000 (15:13 +0900)]
[PATCH] v850: Round up length passed to slram driver to a multiple of SLRAM_BLK_SZ
Signed-off-by: Miles Bader <miles@gnu.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] uclinux: use MAP_PRIVATE when mmaping code regions in flat binary loader
Use MAP_PRIVATE when calling mmap to get memory for the code region.
The flat loader was using MAP_SHARED, but underlying changes to the
MMUless mmap means this is now wrong.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: update defconfig for m68knommu
Updated defconfig for m68knommu arch.
Patch originaly submitted by Jan Dittmer <jdittmer@ppp0.net>
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: new board support in linker script
. add support for the M5235EVB board
. add support for the SOM5282 board
. add support for the MOD5272 board
. fix end of memory define for eLITE board
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: need pfn_valid macro
Need pfn_valid macro, even on MMUless platforms.
Enclose the macro args of __pa and __va in parentheses.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: use THREAD_SIZE instead of hard coded size
Use the THREAD_SIZE define when manipulating the stack instead of
hard coded values (for the 68328 and 68360 sub-architectures).
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: new family (523x) and board config support
New architecture and board configuration support for m68knommu.
. add 523x ColdFire support
. add support for SOM5282 and MOD5272 boards
. break up the 527x to be separate 5271 and 5275. There is some
subtle differences that (like RAM config) that need to be dealt with
. add option to support selecting 4k kernel stack
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: 523x ColdFire processor support in arch Makefile
Add support for the 523x ColdFire family of processors
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: cleanup showstack()
Make show_stack() consistent with other architectures.
Put the vector string names in the .rodata section.
Patch originally submitted by Philippe De Muyter <phdm@macqel.be>.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] uclinux: update MAINTAINERS entry for UCLINUX
Modify maintainers for uClinux (MMUless). Neither Dave nor Jeff
manitain the 2.6 code in mainline, so no point emailing them about
problems.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: fix ColdFire startup code to properly handle non 0 based ram
Correctly determine the end of ram for ram setups that do not
start at base address of 0. Add support for the MOD5272 board,
which doesn not have a ram base of 0.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: new family (523x) and board setup
. setup for the new 523x ColdFire family
. break up of 527x to be 5271 and 5275
. some white space cleanup
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Greg Ungerer [Fri, 2 Sep 2005 00:42:52 +0000 (10:42 +1000)]
[PATCH] m68knommu: 523x ColdFire processor init code
Low level initialization code for the 523x ColdFire processor family.
Signed-off-by: Greg Ungerer <gerg@uclinux.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Fri, 2 Sep 2005 07:53:36 +0000 (00:53 -0700)]
Merge HEAD from /home/rmk/linux-2.6-serial
Linus Torvalds [Fri, 2 Sep 2005 07:52:05 +0000 (00:52 -0700)]
Merge HEAD from kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Fri, 2 Sep 2005 07:48:33 +0000 (00:48 -0700)]
Merge refs/heads/ieee80211-wifi from /linux/kernel/git/jgarzik/netdev-2.6
Linus Torvalds [Fri, 2 Sep 2005 07:46:53 +0000 (00:46 -0700)]
Merge refs/heads/upstream from /linux/kernel/git/jgarzik/netdev-2.6
Jeff Garzik [Thu, 1 Sep 2005 22:02:27 +0000 (18:02 -0400)]
/spare/repo/netdev-2.6 branch 'ieee80211'
Jeff Garzik [Thu, 1 Sep 2005 22:02:01 +0000 (18:02 -0400)]
/spare/repo/netdev-2.6 branch 'master'
Russell King [Thu, 1 Sep 2005 21:41:55 +0000 (22:41 +0100)]
[ARM] Fix ARMv6 page table bits
We weren't explicitly setting the page table bits we desired
in user_prot in the protection table, which resulted in the
user mappings for v6 CPUs being marked global.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Thu, 1 Sep 2005 17:58:18 +0000 (10:58 -0700)]
Merge refs/heads/release from /linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Thu, 1 Sep 2005 17:56:57 +0000 (10:56 -0700)]
Merge HEAD from /home/rmk/linux-2.6-arm.git
Kumar Gala [Wed, 31 Aug 2005 04:54:47 +0000 (14:54 +1000)]
[PATCH] ppc: L2 cache prefetch fixes on 745x
We run into problems if we blindly enable L2 prefetching without
checking that the L2 cache is actually enabled. Additionaly, if we
disable the L2 cache we need to ensure that we disable L2 prefetching.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Benjamin Herrenschmidt [Wed, 31 Aug 2005 04:16:53 +0000 (14:16 +1000)]
[PATCH] Fix PCI ROM mapping
This fixes a problem with pci_map_rom() which doesn't properly
update the ROM BAR value with the address thas allocated for it by the
PCI code. This problem, among other, breaks boot on Mac laptops.
It'ss a new version based on Linus latest one with better error
checking.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 31 Aug 2005 04:34:05 +0000 (14:34 +1000)]
[PATCH] Fix bug in ppc64 dynamic hugepage support
In adjusting the logic for SLB miss for the dynamic hugepage stuff, I
messed up the !CONFIG_HUGETLB_PAGE case, failing to set the SLB flags
properly.
This fixes it. It also streamlines the logic for the HUGETLB_PAGE case
(removing a couple of branches) while we're at it.
Booted, and roughly tested on POWER5 (with and without HUGETLB_PAGE),
iSeries/RS64 (no hugepage available), and G5 (with and without
HUGETLB_PAGE).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Wed, 31 Aug 2005 15:43:51 +0000 (17:43 +0200)]
[PATCH] Add missing select's to DVB_BUDGET_AV
This fixes the following compile error:
...
LD .tmp_vmlinux1
drivers/built-in.o: In function `frontend_init':
budget-av.c:(.text+0xb9448): undefined reference to `tda10046_attach'
budget-av.c:(.text+0xb9518): undefined reference to `tda10021_attach'
drivers/built-in.o: In function `philips_tu1216_request_firmware':
budget-av.c:(.text+0xb937b): undefined reference to `request_firmware'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Johannes Stezenbach <js@linuxtv.org>
Acked-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Russell King [Thu, 1 Sep 2005 14:56:26 +0000 (15:56 +0100)]
[SERIAL] Move serial8250_*_port prototypes to linux/serial_8250.h
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 1 Sep 2005 13:51:59 +0000 (14:51 +0100)]
[ARM] Simplify setup_mm_for_reboot()
No point checking what CPU architecture level we have each time
within the loop, so precompute the base PMD flags outside the
loop.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 1 Sep 2005 13:45:18 +0000 (14:45 +0100)]
[ARM] Convert open-coded __pmd_populate to use inline function
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 1 Sep 2005 13:25:45 +0000 (14:25 +0100)]
[SERIAL] mwave is no longer broken
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Thu, 1 Sep 2005 11:48:48 +0000 (12:48 +0100)]
[ARM] 2864/1: VST aka CONFIG_NO_IDLE_HZ support for SA11x0
Patch from Nicolas Pitre
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Thu, 1 Sep 2005 11:48:47 +0000 (12:48 +0100)]
[ARM] 2863/1: clarify comment in PXA2xx and SA1x00 timer code
Patch from Nicolas Pitre
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Thu, 1 Sep 2005 11:48:40 +0000 (12:48 +0100)]
[ARM] 2862/1: VST aka CONFIG_NO_IDLE_HZ support for PXA2xx
Patch from Nicolas Pitre
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Thu, 1 Sep 2005 11:37:13 +0000 (12:37 +0100)]
[ARM] 2865/2: fix fadvise64_64 syscall argument passing
Patch from Nicolas Pitre
The prototype for sys_fadvise64_64() is:
long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
The argument list is therefore as follows on legacy ABI:
fd: type int (r0)
offset: type long long (r1-r2)
len: type long long (r3-sp[0])
advice: type int (sp[4])
With EABI this becomes:
fd: type int (r0)
offset: type long long (r2-r3)
len: type long long (sp[0]-sp[4])
advice: type int (sp[8])
Not only do we have ABI differences here, but the EABI version requires
one additional word on the syscall stack.
To avoid the ABI mismatch and the extra stack space required with EABI
this syscall is now defined with a different argument ordering
on ARM as follows:
long sys_arm_fadvise64_64(int fd, int advice, loff_t offset, loff_t len)
This gives us the following ABI independent argument distribution:
fd: type int (r0)
advice: type int (r1)
offset: type long long (r2-r3)
len: type long long (sp[0]-sp[4])
Now, since the syscall entry code takes care of 5 registers only by
default including the store of r4 to the stack, we need a wrapper to
store r5 to the stack as well. Because that wrapper was missing and was
always required this means that sys_fadvise64_64 never worked on ARM and
therefore we can safely reuse its syscall number for our new
sys_arm_fadvise64_64 interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Jouni Malinen [Mon, 29 Aug 2005 00:53:32 +0000 (17:53 -0700)]
[PATCH] hostap: Fix null pointer dereference in prism2_pccard_card_present()
local->hw_priv was initialized only after the interrupt handler was
registered. This could trigger a NULL pointer dereference in
prism2_pccard_card_present() that assumed that local->hw_priv is always
set (and it should have been). Fix this by setting local->hw_priv before
registering the interrupt handler.
Signed-off-by: Jouni Malinen <jkmaline@cc.hut.fi>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:29 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Be consistent about driver name, increment version
The iseries_veth driver tells sysfs that it's called 'iseries_veth', but if
you ask it via ethtool it thinks it's called 'veth'. I think this comes from
2.4 when the driver was called 'veth', but it's definitely called
'iseries_veth' now, so fix it.
To make sure we don't do it again define DRV_NAME and use it everywhere.
While we're at it, change the version number to 2.0, to reflect the changes
made in this patch series.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:27 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Remove studly caps from iseries_veth.c
Having merged iseries_veth.h, let's remove some of the studly caps that came
with it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:25 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Incorporate iseries_veth.h in iseries_veth.c
iseries_veth.h is only used by iseries_veth.c, so merge the former into
the latter.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:21 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Add sysfs support for port structs
Also to aid debugging, add sysfs support for iseries_veth's port structures.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:20 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Add sysfs support for connection structs
To aid in field debugging, add sysfs support for iseries_veth's connection
structures. At the moment this is all read-only, however we could think about
adding write support for some attributes in future.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:19 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Fix bogus counting of TX errors
There's a number of problems with the way iseries_veth counts TX errors.
Firstly it counts conditions which aren't really errors as TX errors. This
includes if we don't have a connection struct for the other LPAR, or if the
other LPAR is currently down (or just doesn't want to talk to us). Neither
of these should count as TX errors.
Secondly, it counts one TX error for each LPAR that fails to accept the packet.
This can lead to TX error counts higher than the total number of packets sent
through the interface. This is confusing for users.
This patch fixes that behaviour. The non-error conditions are no longer
counted, and we introduce a new and I think saner meaning to the TX counts.
If a packet is successfully transmitted to any LPAR then it is transmitted
and tx_packets is incremented by 1.
If there is an error transmitting a packet to any LPAR then that is counted
as one error, ie. tx_errors is incremented by 1.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:18 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Simplify full-queue handling
The iseries_veth driver often has multiple netdevices sending packets over
a single connection to another LPAR. If the bandwidth to the other LPAR is
exceeded, all the netdevices must have their queues stopped.
The current code achieves this by queueing one incoming skb on the
per-netdevice port structure. When the connection is able to send more packets
we iterate through the port structs and flush any packet that is queued,
as well as restarting the associated netdevice's queue.
This arrangement makes less sense now that we have per-connection TX timers,
rather than the per-netdevice generic TX timer.
The new code simply detects when one of the connections is full, and stops
the queue of all associated netdevices. Then when a packet is acked on that
connection (ie. there is space again) all the queues are woken up.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:17 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Add a per-connection ack timer
Currently the iseries_veth driver contravenes the specification in
Documentation/networking/driver.txt, in that if packets are not acked by
the other LPAR they will sit around forever.
This patch adds a per-connection timer which fires if we've had no acks for
five seconds. This is superior to the generic TX timer because it catches
the case of a small number of packets being sent and never acked.
This fixes a bug we were seeing on real systems, where some IPv6 neighbour
discovery packets would not be acked and then prevent the module from being
removed, due to skbs lying around.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:12 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Remove TX timeout code
The iseries_veth driver uses the generic TX timeout watchdog, however a better
solution is in the works, so remove this code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:09 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Use kobjects to track lifecycle of connection structs
The iseries_veth driver can attach to multiple vlans, which correspond to
multiple net devices. However there is only 1 connection between each LPAR,
so the connection structure may be shared by multiple net devices.
This makes module removal messy, because we can't deallocate the connections
until we know there are no net devices still using them. The solution is to
use ref counts on the connections, so we can delete them (actually stop) as
soon as the ref count hits zero.
This patch fixes (part of) a bug we were seeing with IPv6 sending probes to
a dead LPAR, which would then hang us forever due to leftover skbs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:08 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Make init_connection() & destroy_connection() symmetrical
This patch makes veth_init_connection() and veth_destroy_connection()
symmetrical in that they allocate/deallocate the same data.
Currently if there's an error while initialising connections (ie. ENOMEM)
we call veth_module_cleanup(), however this will oops because we call
driver_unregister() before we've called driver_register(). I've never seen
this actually happen though.
So instead we explicitly call veth_destroy_connection() for each connection,
any that have been set up will be deallocated.
We also fix a potential leak if vio_register_driver() fails.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:07 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Only call dma_unmap_single() if dma_map_single() succeeded
The iseries_veth driver unconditionally calls dma_unmap_single() even
when the corresponding dma_map_single() may have failed.
Rework the code a bit to keep the return value from dma_unmap_single()
around, and then check if it's a dma_mapping_error() before we do
the dma_unmap_single().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:06 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Replace lock-protected atomic with an ordinary variable
The iseries_veth driver uses atomic ops to manipulate the in_use field of
one of its per-connection structures. However all references to the
flag occur while the connection's lock is held, so the atomic ops aren't
necessary.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:05 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Remove redundant message stack lock
The iseries_veth driver keeps a stack of messages for each connection
and a lock to protect the stack. However there is also a per-connection lock
which makes the message stack lock redundant.
Remove the message stack lock and document the fact that callers of the
stack-manipulation functions must hold the connection's lock.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:02 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Fix broken promiscuous handling
Due to a logic bug, once promiscuous mode is enabled in the iseries_veth
driver it is never disabled.
The driver keeps two flags, promiscuous and all_mcast which have exactly the
same effect. This is because we only ever receive packets destined for us,
or multicast packets. So consolidate them into one promiscuous flag for
simplicity.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Michael Ellerman [Thu, 1 Sep 2005 01:29:00 +0000 (11:29 +1000)]
[PATCH] iseries_veth: Try to avoid pathological reset behaviour
The iseries_veth driver contains a state machine which is used to manage
how connections are setup and neogotiated between LPARs.
If one side of a connection resets for some reason, the two LPARs can get
stuck in a race to re-setup the connection. This can lead to the connection
being declared dead by one or both ends. In practice the connection is
declared dead by one or both ends approximately 8/10 times a connection is
reset, although it is rare for connections to be reset.
(an example here: http://michael.ellerman.id.au/files/misc/veth-trace.html)
The core of the problem is that the end that resets the connection doesn't
wait for the other end to become aware of the reset. So the resetting end
starts setting the connection back up, and then receives a reset from the
other end (which is the response to the initial reset). And so on.
We're severely limited in what we can do to fix this. The protocol between
LPARs is essentially fixed, as we have to interoperate with both OS/400
and old Linux drivers. Which also means we need a fix that only changes the
code on one end.
The only fix I've found given that, is to just blindly sleep for a bit when
resetting the connection, in the hope that the other end will get itself
sorted. Needless to say I'd love it if someone has a better idea.
This does work, I've so far been unable to get it to break, whereas without
the fix a reset of one end will lead to a dead connection ~8/10 times.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>