openwrt/staging/blogic.git
18 years ago[IPV6]: Added GSO support for TCPv6
Herbert Xu [Fri, 30 Jun 2006 20:37:03 +0000 (13:37 -0700)]
[IPV6]: Added GSO support for TCPv6

This patch adds GSO support for IPv6 and TCPv6.  This is based on a patch
by Ananda Raju <Ananda.Raju@neterion.com>.  His original description is:

This patch enables TSO over IPv6. Currently Linux network stacks
restricts TSO over IPv6 by clearing of the NETIF_F_TSO bit from
"dev->features". This patch will remove this restriction.

This patch will introduce a new flag NETIF_F_TSO6 which will be used
to check whether device supports TSO over IPv6. If device support TSO
over IPv6 then we don't clear of NETIF_F_TSO and which will make the
TCP layer to create TSO packets. Any device supporting TSO over IPv6
will set NETIF_F_TSO6 flag in "dev->features" along with NETIF_F_TSO.

In case when user disables TSO using ethtool, NETIF_F_TSO will get
cleared from "dev->features". So even if we have NETIF_F_TSO6 we don't
get TSO packets created by TCP layer.

SKB_GSO_TCPV4 renamed to SKB_GSO_TCP to make it generic GSO packet.
SKB_GSO_UDPV4 renamed to SKB_GSO_UDP as UFO is not a IPv4 feature.
UFO is supported over IPv6 also

The following table shows there is significant improvement in
throughput with normal frames and CPU usage for both normal and jumbo.

--------------------------------------------------
|          |     1500        |      9600         |
|          ------------------|-------------------|
|          | thru     CPU    |  thru     CPU     |
--------------------------------------------------
| TSO OFF  | 2.00   5.5% id  |  5.66   20.0% id  |
--------------------------------------------------
| TSO ON   | 2.63   78.0 id  |  5.67   39.0% id  |
--------------------------------------------------

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Generalise TSO-specific bits from skb_setup_caps
Herbert Xu [Fri, 30 Jun 2006 20:36:35 +0000 (13:36 -0700)]
[NET]: Generalise TSO-specific bits from skb_setup_caps

This patch generalises the TSO-specific bits from sk_setup_caps by adding
the sk_gso_type member to struct sock.  This makes sk_setup_caps generic
so that it can be used by TCPv6 or UFO.

The only catch is that whoever uses this must provide a GSO implementation
for their protocol which I think is a fair deal :) For now UFO continues to
live without a GSO implementation which is OK since it doesn't use the sock
caps field at the moment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Added GSO support for TCPv6
Herbert Xu [Fri, 30 Jun 2006 20:36:15 +0000 (13:36 -0700)]
[IPV6]: Added GSO support for TCPv6

This patch adds GSO support for IPv6 and TCPv6.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IPV6]: Remove redundant length check on input
Herbert Xu [Fri, 30 Jun 2006 20:35:46 +0000 (13:35 -0700)]
[IPV6]: Remove redundant length check on input

We don't need to check skb->len when we're just about to call
pskb_may_pull since that checks it for us.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks
Patrick McHardy [Fri, 30 Jun 2006 04:40:23 +0000 (21:40 -0700)]
[NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks

When a packet without any chunks is received, the newconntrack variable
in sctp_packet contains an out of bounds value that is used to look up an
pointer from the array of timeouts, which is then dereferenced, resulting
in a crash. Make sure at least a single chunk is present.

Problem noticed by George A. Theall <theall@tenablesecurity.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TG3]: Update version and reldate
Michael Chan [Fri, 30 Jun 2006 03:16:28 +0000 (20:16 -0700)]
[TG3]: Update version and reldate

Update version to 3.61.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TG3]: Add TSO workaround using GSO
Michael Chan [Fri, 30 Jun 2006 03:15:54 +0000 (20:15 -0700)]
[TG3]: Add TSO workaround using GSO

Use GSO to workaround a rare TSO bug on some chips.  This hardware
bug may be triggered when the TSO header size is greater than 80
bytes.  When this condition is detected in a TSO packet, the driver
will use GSO to segment the packet to workaround the hardware bug.

Thanks to Juergen Kreileder <jk@blackdown.de> for reporting the
problem and collecting traces to help debug the problem.

And thanks to Herbert Xu <herbert@gondor.apana.org.au> for providing
the GSO mechanism that happens to be the perfect workaround for this
problem.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TG3]: Turn on hw fix for ASF problems
Michael Chan [Fri, 30 Jun 2006 03:15:13 +0000 (20:15 -0700)]
[TG3]: Turn on hw fix for ASF problems

Clear a bit to enable a hardware fix for some ASF related problem.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TG3]: Add rx BD workaround
Michael Chan [Fri, 30 Jun 2006 03:14:29 +0000 (20:14 -0700)]
[TG3]: Add rx BD workaround

Add workaround to limit the burst size of rx BDs being DMA'ed to the
chip.  This works around hardware errata on a number of 5750, 5752,
and 5755 chips.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TG3]: Add tg3_netif_stop() in vlan functions
Michael Chan [Fri, 30 Jun 2006 03:12:30 +0000 (20:12 -0700)]
[TG3]: Add tg3_netif_stop() in vlan functions

Add tg3_netif_stop() when changing the vlgrp (vlan group) pointer. It
is necessary to quiesce the device before changing that pointer.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP]: Reset gso_segs if packet is dodgy
Herbert Xu [Fri, 30 Jun 2006 03:11:25 +0000 (20:11 -0700)]
[TCP]: Reset gso_segs if packet is dodgy

I wasn't paranoid enough in verifying GSO information.  A bogus gso_segs
could upset drivers as much as a bogus header would.  Let's reset it in
the per-protocol gso_segment functions.

I didn't verify gso_size because that can be verified by the source of
the dodgy packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PATCH] i2c-801: 64bit resource fix
Andrew Morton [Fri, 30 Jun 2006 08:56:20 +0000 (01:56 -0700)]
[PATCH] i2c-801: 64bit resource fix

drivers/i2c/busses/i2c-i801.c: In function 'i801_probe':
drivers/i2c/busses/i2c-i801.c:496: warning: format '%lx' expects type 'long unsigned int', but argument 5 has type 'resource_size_t'

Cc: Greg KH <greg@kroah.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] infiniband: devfs fix
Andrew Morton [Fri, 30 Jun 2006 08:56:20 +0000 (01:56 -0700)]
[PATCH] infiniband: devfs fix

Remove devfs leftovers.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: svcrpc: gss: server-side implementation of rpcsec_gss privacy
J. Bruce Fields [Fri, 30 Jun 2006 08:56:19 +0000 (01:56 -0700)]
[PATCH] knfsd: svcrpc: gss: server-side implementation of rpcsec_gss privacy

Server-side implementation of rpcsec_gss privacy, which enables encryption of
the payload of every rpc request and response.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy case
J. Bruce Fields [Fri, 30 Jun 2006 08:56:19 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy case

Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
case so that the wrapping code will get copies of the read data instead of
real page cache pages.  This makes life simpler when we encrypt the response.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: svcrpc: Simplify nfsd rpcsec_gss integrity code
J. Bruce Fields [Fri, 30 Jun 2006 08:56:18 +0000 (01:56 -0700)]
[PATCH] knfsd: svcrpc: Simplify nfsd rpcsec_gss integrity code

Pull out some of the integrity code into its own function, otherwise
svcauth_gss_release() is going to become very ungainly after the addition of
privacy code.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd4: fix open flag passing
J. Bruce Fields [Fri, 30 Jun 2006 08:56:17 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd4: fix open flag passing

Since nfsv4 actually keeps around the file descriptors it gets from open
(instead of just using them for a single read or write operation), we need to
make sure that we can do RDWR opens and not just RDONLY/WRONLY.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd4: fix some open argument tests
J. Bruce Fields [Fri, 30 Jun 2006 08:56:16 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd4: fix some open argument tests

These tests always returned true; clearly that wasn't what was intended.

In keeping with kernel style, make them functions instead of macros while
we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: svcrpc: gss: simplify rsc_parse()
J. Bruce Fields [Fri, 30 Jun 2006 08:56:16 +0000 (01:56 -0700)]
[PATCH] knfsd: svcrpc: gss: simplify rsc_parse()

Adopt a simpler convention for gss_mech_put(), to simplify rsc_parse().

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd: fix misplaced fh_unlock() in nfsd_link()
David M. Richter [Fri, 30 Jun 2006 08:56:15 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd: fix misplaced fh_unlock() in nfsd_link()

In the event that lookup_one_len() fails in nfsd_link(), fh_unlock() is
skipped and locks are held overlong.

Patch was tested on 2.6.17-rc2 by causing lookup_one_len() to fail and
verifying that fh_unlock() gets called appropriately.

Signed-off-by: David M. Richter <richterd@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd4: remove superfluous grace period checks
J. Bruce Fields [Fri, 30 Jun 2006 08:56:14 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd4: remove superfluous grace period checks

We're checking nfs_in_grace here a few times when there isn't really any
reason to--bad_stateid is probably the more sensible return value anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd: call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem
J. Bruce Fields [Fri, 30 Jun 2006 08:56:14 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd: call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem

In the typical v2/v3 case the only new filehandles used as arguments to
operations are filehandles taken directly off the wire, which don't get
dentries until fh_verify() is called.

But in v4 the filehandles that are arguments to operations were often created
by previous operations (putrootfh, lookup, etc.) using fh_compose, which sets
the dentry in the filehandle without calling nfsd_setuser().

This also means that, for example, if filesystem B is mounted on filesystem A,
and filesystem A is exported without root-squashing, then a client can bypass
the rootsquashing on B using a compound that starts at a filehandle in A,
crosses into B using lookups, and then does stuff in B.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: nfsd4: fix open_confirm locking
J. Bruce Fields [Fri, 30 Jun 2006 08:56:13 +0000 (01:56 -0700)]
[PATCH] knfsd: nfsd4: fix open_confirm locking

Fix an improper unlock in an error path.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: ignore ref_fh when crossing a mountpoint
NeilBrown [Fri, 30 Jun 2006 08:56:12 +0000 (01:56 -0700)]
[PATCH] knfsd: ignore ref_fh when crossing a mountpoint

nfsd tries to return to a client the same sort of filehandle as was used by
the client.  This removes some filehandle aliasing issues and means that a
server upgrade followed by a downgrade will not confused clients not restarted
during that time.

However when crossing a mountpoint, the filehandle used for one filesystem
doesn't provide any useful information on what sort of filehandle should be
used on the other, and can provide misleading information.  So if the
reference filehandle is on a different filesystem to the one being generated,
ignore it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: remove noise about filehandle being uptodate
NeilBrown [Fri, 30 Jun 2006 08:56:11 +0000 (01:56 -0700)]
[PATCH] knfsd: remove noise about filehandle being uptodate

There is a perfectly valid situation where fh_update gets called on an already
uptodate filehandle - in nfsd_create_v3 where a CREATE_UNCHECKED finds an
existing file and wants to just set the size.

We could possible optimise out the call in that case, but the only harm
involved is that fh_update prints a warning, so it is easier to remove the
warning.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: fixing missing 'expkey' support for fsid type 3
Frank Filz [Fri, 30 Jun 2006 08:56:11 +0000 (01:56 -0700)]
[PATCH] knfsd: fixing missing 'expkey' support for fsid type 3

Type '3' is used for the fsid in filehandles when the device number of the
device holding the filesystem has more than 8 bits in either major or minor.
Unfortunately expkey_parse doesn't recognise type 3.  Fix this.

(Slighty modified from Frank's original)

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] knfsd: improve the test for cross-device-rename in nfsd
NeilBrown [Fri, 30 Jun 2006 08:56:10 +0000 (01:56 -0700)]
[PATCH] knfsd: improve the test for cross-device-rename in nfsd

Just testing the i_sb isn't really enough, at least the vfsmnt must be the
same.  Thanks Al.

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] EDAC: maintainers update
Doug Thompson [Fri, 30 Jun 2006 08:56:09 +0000 (01:56 -0700)]
[PATCH] EDAC: maintainers update

Removed Dave Peterson as per his request as co-maintainer of EDAC Thanks Dave.

Added Mark Gross as maintainer of edac-e752x driver Thanks Mark

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Dave Peterson <dsp@llnl.gov>
Signed-off-by: Mark Gross <mark.gross@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] EDAC: probe1 cleanup 1-of-2
Doug Thompson [Fri, 30 Jun 2006 08:56:08 +0000 (01:56 -0700)]
[PATCH] EDAC: probe1 cleanup 1-of-2

- Add lower-level functions that handle various parts of the initialization
  done by the xxx_probe1() functions.  Some of the xxx_probe1() functions are
  much too long and complicated (see "Chapter 5: Functions" in
  Documentation/CodingStyle).

- Cleanup of probe1() functions in EDAC

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] EDAC: mc numbers refactor 1-of-2
Doug Thompson [Fri, 30 Jun 2006 08:56:08 +0000 (01:56 -0700)]
[PATCH] EDAC: mc numbers refactor 1-of-2

Remove add_mc_to_global_list().  In next patch, this function will be
reimplemented with different semantics.

1 Reimplement add_mc_to_global_list() with semantics that allow the caller to
  determine the ID number for a mem_ctl_info structure.  Then modify
  edac_mc_add_mc() so that the caller specifies the ID number for the new
  mem_ctl_info structure.  Platform-specific code should be able to assign the
  ID numbers in a platform-specific manner.  For instance, on Opteron it makes
  sense to have the ID of the mem_ctl_info structure match the ID of the node
  that the memory controller belongs to.

2 Modify callers of edac_mc_add_mc() so they use the new semantics.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] EDAC: PCI device to DEVICE cleanup
Doug Thompson [Fri, 30 Jun 2006 08:56:07 +0000 (01:56 -0700)]
[PATCH] EDAC: PCI device to DEVICE cleanup

Change MC drivers from using CVS revision strings for their version number,
Now each driver has its own local string.

Remove some PCI dependencies from the core EDAC module.  Made the code 'struct
device' centric instead of 'struct pci_dev' Most of the code changes here are
from a patch by Dave Jiang.  It may be best to eventually move the
PCI-specific code into a separate source file.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Documentation/IPMI typos
Matt LaPlante [Fri, 30 Jun 2006 08:56:06 +0000 (01:56 -0700)]
[PATCH] Documentation/IPMI typos

Two typos in Documentation/IPMI.

Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] rcu: Add lock annotations to RCU locking primitives
Josh Triplett [Fri, 30 Jun 2006 08:56:05 +0000 (01:56 -0700)]
[PATCH] rcu: Add lock annotations to RCU locking primitives

Add __acquire annotations to rcu_read_lock and rcu_read_lock_bh, and add
__release annotations to rcu_read_unlock and rcu_read_unlock_bh.  This
allows sparse to detect improperly paired calls to these functions.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] drivers/cdrom/cm206.c: cleanups
Adrian Bunk [Fri, 30 Jun 2006 08:56:04 +0000 (01:56 -0700)]
[PATCH] drivers/cdrom/cm206.c: cleanups

- make __cm206_init() __init (required since it calls
  the __init cm206_init())
- make the needlessly global bcdbin() static
- remove a comment with an obsolete compile command

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] show Acorn-specific block devices menu only when required
Adrian Bunk [Fri, 30 Jun 2006 08:56:02 +0000 (01:56 -0700)]
[PATCH] show Acorn-specific block devices menu only when required

Don't show a menu that can't be entered due to lack of contents on arm (the
options are only available on arm26).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ian Molton <spyro@f2s.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Correct rtc_wkalrm comments
Andrew Victor [Fri, 30 Jun 2006 08:56:01 +0000 (01:56 -0700)]
[PATCH] Correct rtc_wkalrm comments

This corrects the comments describing the 'enabled' and 'pending' flags in
struct rtc_wkalrm of include/linux/rtc.h.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] cond_resched() fix
Andrew Morton [Fri, 30 Jun 2006 08:56:00 +0000 (01:56 -0700)]
[PATCH] cond_resched() fix

Fix a bug identified by Zou Nan hai <nanhai.zou@intel.com>:

If the system is in state SYSTEM_BOOTING, and need_resched() is true,
cond_resched() returns true even though it didn't reschedule.  Consequently
need_resched() remains true and JBD locks up.

Fix that by teaching cond_resched() to only return true if it really did call
schedule().

cond_resched_lock() and cond_resched_softirq() have a problem too.  If we're
in SYSTEM_BOOTING state and need_resched() is true, these functions will drop
the lock and will then try to call schedule(), but the SYSTEM_BOOTING state
will prevent schedule() from being called.  So on return, need_resched() will
still be true, but cond_resched_lock() has to return 1 to tell the caller that
the lock was dropped.  The caller will probably lock up.

Bottom line: if these functions dropped the lock, they _must_ call schedule()
to clear need_resched().   Make it so.

Also, uninline __cond_resched().  It's largeish, and slowpath.

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: add __raw_writeq definition
Jeff Dike [Fri, 30 Jun 2006 08:55:59 +0000 (01:55 -0700)]
[PATCH] uml: add __raw_writeq definition

The x86_64 build requires a definition for __raw_writeq.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: fix biarch gcc build on x86_64
Jeff Dike [Fri, 30 Jun 2006 08:55:58 +0000 (01:55 -0700)]
[PATCH] uml: fix biarch gcc build on x86_64

I run an x86_64 kernel with i386 userspace (Ubuntu Dapper) and decided to try
out UML today.  I found that UML wasn't quite aware of biarch compilers (which
Ubuntu i386 ships).  A fix similar to what was done for x86_64 should probably
be committed (see
http://marc.theaimsgroup.com/?l=linux-kernel&m=113425940204010&w=2).  Without
the FLAGS changes, the build will fail at a number of places and without the
LINK change, the final link will fail.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: remove stray file
Jeff Dike [Fri, 30 Jun 2006 08:55:57 +0000 (01:55 -0700)]
[PATCH] uml: remove stray file

Forgot to remove arch/um/kernel/time.c when it was mostly moved to
arch/um/os-Linux.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: remove unneeded time definitions
Jeff Dike [Fri, 30 Jun 2006 08:55:57 +0000 (01:55 -0700)]
[PATCH] uml: remove unneeded time definitions

Remove um_time() and um_stime() syscalls since they are identical to
system-wide ones.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: add locking to xtime accesses
Jeff Dike [Fri, 30 Jun 2006 08:55:56 +0000 (01:55 -0700)]
[PATCH] uml: add locking to xtime accesses

do_timer must be called with xtime_lock held.  I'm not sure boot_timer_handler
needs this, however I don't think it hurts: it simply disables irq and takes a
spinlock.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: unregister useless console when it's not needed
Jeff Dike [Fri, 30 Jun 2006 08:55:55 +0000 (01:55 -0700)]
[PATCH] uml: unregister useless console when it's not needed

-mm in combination with an FC5 init started dying with 'stderr=1' because init
didn't like the lack of /dev/console and exited.  The problem was that the
stderr console, which is intended to dump printk output to the terminal before
the regular console is initialized, isn't a tty, and so can't make
/dev/console operational.

However, since it is registered first, the normal console, when it is
registered, doesn't become the preferred console, and isn't attached to
/dev/console.  Thus, /dev/console is never operational.

This patch makes the stderr console unregister itself in an initcall, which is
late enough that the normal console is registered.  When that happens, the
normal console will become the preferred console and will be able to run
/dev/console.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: fix off-by-one bug in VM file creation
Jeff Dike [Fri, 30 Jun 2006 08:55:55 +0000 (01:55 -0700)]
[PATCH] uml: fix off-by-one bug in VM file creation

Fix an off-by-one bug in temp file creation.  Seeking to the desired length
and writing a byte resulted in the file being one byte longer than expected.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] uml: fix /proc/mounts parsing boundary condition
Jeff Dike [Fri, 30 Jun 2006 08:55:54 +0000 (01:55 -0700)]
[PATCH] uml: fix /proc/mounts parsing boundary condition

When parsing /proc/mounts looking for a tmpfs mount on /dev/shm, if a string
that we are looking for if split across reads, then it won't be recognized.

Fix this by refilling the buffer whenever we advance the cursor.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] UML: fix the INIT_ENV_ARG_LIMIT dependencies
Adrian Bunk [Fri, 30 Jun 2006 08:55:51 +0000 (01:55 -0700)]
[PATCH] UML: fix the INIT_ENV_ARG_LIMIT dependencies

Fix the INIT_ENV_ARG_LIMIT dependencies to what seems to have been
intended.

Spotted by Jean-Luc Leger.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] add smp_setup_processor_id()
Andrew Morton [Fri, 30 Jun 2006 08:55:50 +0000 (01:55 -0700)]
[PATCH] add smp_setup_processor_id()

Presently, smp_processor_id() isn't necessarily set up until setup_arch().
But it's used in boot_cpu_init() and printk() and perhaps in other places,
prior to setup_arch() being called.

So provide a new smp_setup_processor_id() which is called before anything
else, wire it up for Voyager (which boots on a CPU other than #0, and broke).

Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: Add security hook definition for getioprio and insert hooks
David Quigley [Fri, 30 Jun 2006 08:55:49 +0000 (01:55 -0700)]
[PATCH] SELinux: Add security hook definition for getioprio and insert hooks

Add a new security hook definition for the sys_ioprio_get operation.  At
present, the SELinux hook function implementation for this hook is
identical to the getscheduler implementation but a separate hook is
introduced to allow this check to be specialized in the future if
necessary.

This patch also creates a helper function get_task_ioprio which handles the
access check in addition to retrieving the ioprio value for the task.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: update USB code with new kill_proc_info_as_uid
David Quigley [Fri, 30 Jun 2006 08:55:48 +0000 (01:55 -0700)]
[PATCH] SELinux: update USB code with new kill_proc_info_as_uid

This patch updates the USB core to save and pass the sending task secid when
sending signals upon AIO completion so that proper security checking can be
applied by security modules.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: add security hook call to kill_proc_info_as_uid
David Quigley [Fri, 30 Jun 2006 08:55:47 +0000 (01:55 -0700)]
[PATCH] SELinux: add security hook call to kill_proc_info_as_uid

This patch adds a call to the extended security_task_kill hook introduced by
the prior patch to the kill_proc_info_as_uid function so that these signals
can be properly mediated by security modules.  It also updates the existing
hook call in check_kill_permission.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] SELinux: extend task_kill hook to handle signals sent by AIO completion
David Quigley [Fri, 30 Jun 2006 08:55:46 +0000 (01:55 -0700)]
[PATCH] SELinux: extend task_kill hook to handle signals sent by AIO completion

This patch extends the security_task_kill hook to handle signals sent by AIO
completion.  In this case, the secid of the task responsible for the signal
needs to be obtained and saved earlier, so a security_task_getsecid() hook is
added, and then this saved value is passed subsequently to the extended
task_kill hook for use in checking.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] slab: consolidate code to free slabs from freelist
Christoph Lameter [Fri, 30 Jun 2006 08:55:45 +0000 (01:55 -0700)]
[PATCH] slab: consolidate code to free slabs from freelist

Post and discussion:
http://marc.theaimsgroup.com/?t=115074342800003&r=1&w=2

Code in __shrink_node() duplicates code in cache_reap()

Add a new function drain_freelist that removes slabs with objects that are
already free and use that in various places.

This eliminates the __node_shrink() function and provides the interrupt
holdoff reduction from slab_free to code that used to call __node_shrink.

[akpm@osdl.org: build fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Light weight event counters
Christoph Lameter [Fri, 30 Jun 2006 08:55:45 +0000 (01:55 -0700)]
[PATCH] Light weight event counters

The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
 And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Use Zoned VM Counters for NUMA statistics
Christoph Lameter [Fri, 30 Jun 2006 08:55:44 +0000 (01:55 -0700)]
[PATCH] Use Zoned VM Counters for NUMA statistics

The numa statistics are really event counters.  But they are per node and
so we have had special treatment for these counters through additional
fields on the pcp structure.  We can now use the per zone nature of the
zoned VM counters to realize these.

This will shrink the size of the pcp structure on NUMA systems.  We will
have some room to add additional per zone counters that will all still fit
in the same cacheline.

 Bits Prior pcp size    Size after patch We can add
 ------------------------------------------------------------------
 64 128 bytes (16 words) 80 bytes (10 words) 48
 32  76 bytes (19 words) 56 bytes (14 words) 8 (64 byte cacheline)
72 (128 byte)

Remove the special statistics for numa and replace them with zoned vm
counters.  This has the side effect that global sums of these events now
show up in /proc/vmstat.

Also take the opportunity to move the zone_statistics() function from
page_alloc.c into vmstat.c.

Discussions:
V2 http://marc.theaimsgroup.com/?t=115048227000002&r=1&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned-vm-counters: remove read_page_state()
Andrew Morton [Fri, 30 Jun 2006 08:55:43 +0000 (01:55 -0700)]
[PATCH] zoned-vm-counters: remove read_page_state()

No callers.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: remove useless struct wbs
Christoph Lameter [Fri, 30 Jun 2006 08:55:42 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: remove useless struct wbs

Remove writeback state

We can remove some functions now that were needed to calculate the page state
for writeback control since these statistics are now directly available.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:41 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter

Conversion of nr_bounce to a per zone counter

nr_bounce is only used for proc output.  So it could be left as an event
counter.  However, the event counters may not be accurate and nr_bounce is
categorizing types of pages in a zone.  So we really need this to also be a
per zone counter.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_unstable to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:40 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_unstable to per zone counter

Conversion of nr_unstable to a per zone counter

We need to do some special modifications to the nfs code since there are
multiple cases of disposition and we need to have a page ref for proper
accounting.

This converts the last critical page state of the VM and therefore we need to
remove several functions that were depending on GET_PAGE_STATE_LAST in order
to make the kernel compile again.  We are only left with event type counters
in page state.

[akpm@osdl.org: bugfixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_writeback to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:40 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_writeback to per zone counter

Conversion of nr_writeback to per zone counter.

This removes the last page_state counter from arch/i386/mm/pgtable.c so we
drop the page_state from there.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_dirty to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:39 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_dirty to per zone counter

This makes nr_dirty a per zone counter.  Looping over all processors is
avoided during writeback state determination.

The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones.  Someone more familiar with
NFS should probably review what I have done.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:38 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter

Conversion of nr_page_table_pages to a per zone counter

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_slab to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:38 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_slab to per zone counter

- Allows reclaim to access counter without looping over processor counts.

- Allows accurate statistics on how many pages are used in a zone by
  the slab. This may become useful to balance slab allocations over
  various zones.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval
Christoph Lameter [Fri, 30 Jun 2006 08:55:37 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval

The zone_reclaim_interval was necessary because we were not able to determine
how many unmapped pages exist in a zone.  Therefore we had to scan in
intervals to figure out if any pages were unmapped.

With the zoned counters and NR_ANON_PAGES we now know the number of pagecache
pages and the number of mapped pages in a zone.  So we can simply skip the
reclaim if there is an insufficient number of unmapped pages.  We use
SWAP_CLUSTER_MAX as the boundary.

Drop all support for /proc/sys/vm/zone_reclaim_interval.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED
Christoph Lameter [Fri, 30 Jun 2006 08:55:36 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED

The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
calculation as the number of mapped pagecache pages.  However, that is not
true.  NR_FILE_MAPPED includes the mapped anonymous pages.  This patch
separates those and therefore allows an accurate tracking of the anonymous
pages per zone.

It then becomes possible to determine the number of unmapped pages per zone
and we can avoid scanning for unmapped pages if there are none.

Also it may now be possible to determine the mapped/unmapped ratio in
get_dirty_limit.  Isnt the number of anonymous pages irrelevant in that
calculation?

Note that this will change the meaning of the number of mapped pages reported
in /proc/vmstat /proc/meminfo and in the per node statistics.  This may affect
user space tools that monitor these counters!  NR_FILE_MAPPED works like
NR_FILE_DIRTY.  It is only valid for pagecache pages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: remove NR_FILE_MAPPED from scan control structure
Christoph Lameter [Fri, 30 Jun 2006 08:55:36 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: remove NR_FILE_MAPPED from scan control structure

We can now access the number of pages in a mapped state in an inexpensive way
in shrink_active_list.  So drop the nr_mapped field from scan_control.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:35 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter

Currently a single atomic variable is used to establish the size of the page
cache in the whole machine.  The zoned VM counters have the same method of
implementation as the nr_pagecache code but also allow the determination of
the pagecache size per zone.

Remove the special implementation for nr_pagecache and make it a zoned counter
named NR_FILE_PAGES.

Updates of the page cache counters are always performed with interrupts off.
We can therefore use the __ variant here.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: convert nr_mapped to per zone counter
Christoph Lameter [Fri, 30 Jun 2006 08:55:34 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: convert nr_mapped to per zone counter

nr_mapped is important because it allows a determination of how many pages of
a zone are not mapped, which would allow a more efficient means of determining
when we need to reclaim memory in a zone.

We take the nr_mapped field out of the page state structure and define a new
per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
from NR_MAPPED in the next patch).

We replace the use of nr_mapped in various kernel locations.  This avoids the
looping over all processors in try_to_free_pages(), writeback, reclaim (swap +
zone reclaim).

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation
Christoph Lameter [Fri, 30 Jun 2006 08:55:33 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation

Per zone counter infrastructure

The counters that we currently have for the VM are split per processor.  The
processor however has not much to do with the zone these pages belong to.  We
cannot tell f.e.  how many ZONE_DMA pages are dirty.

So we are blind to potentially inbalances in the usage of memory in various
zones.  F.e.  in a NUMA system we cannot tell how many pages are dirty on a
particular node.  If we knew then we could put measures into the VM to balance
the use of memory between different zones and different nodes in a NUMA
system.  For example it would be possible to limit the dirty pages per node so
that fast local memory is kept available even if a process is dirtying huge
amounts of pages.

Another example is zone reclaim.  We do not know how many unmapped pages exist
per zone.  So we just have to try to reclaim.  If it is not working then we
pause and try again later.  It would be better if we knew when it makes sense
to reclaim unmapped pages from a zone.  This patchset allows the determination
of the number of unmapped pages per zone.  We can remove the zone reclaim
interval with the counters introduced here.

Futhermore the ability to have various usage statistics available will allow
the development of new NUMA balancing algorithms that may be able to improve
the decision making in the scheduler of when to move a process to another node
and hopefully will also enable automatic page migration through a user space
program that can analyse the memory load distribution and then rebalance
memory use in order to increase performance.

The counter framework here implements differential counters for each processor
in struct zone.  The differential counters are consolidated when a threshold
is exceeded (like done in the current implementation for nr_pageache), when
slab reaping occurs or when a consolidation function is called.

Consolidation uses atomic operations and accumulates counters per zone in the
zone structure and also globally in the vm_stat array.  VM functions can
access the counts by simply indexing a global or zone specific array.

The arrangement of counters in an array also simplifies processing when output
has to be generated for /proc/*.

Counters can be updated by calling inc/dec_zone_page_state or
_inc/dec_zone_page_state analogous to *_page_state.  The second group of
functions can be called if it is known that interrupts are disabled.

Special optimized increment and decrement functions are provided.  These can
avoid certain checks and use increment or decrement instructions that an
architecture may provide.

We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can
do DMA to all memory and therefore ZONE_NORMAL will not be populated.  This is
only currently set for IA64 SGI SN2 and currently only affects
node_page_state().  In the best case node_page_state can be reduced to
retrieving a single counter for the one zone on the node.

[akpm@osdl.org: cleanups]
[akpm@osdl.org: export vm_stat[] for filesystems]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h
Christoph Lameter [Fri, 30 Jun 2006 08:55:32 +0000 (01:55 -0700)]
[PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h

NOTE: ZVC are *not* the lightweight event counters.  ZVCs are reliable whereas
event counters do not need to be.

Zone based VM statistics are necessary to be able to determine what the state
of memory in one zone is.  In a NUMA system this can be helpful for local
reclaim and other memory optimizations that may be able to shift VM load in
order to get more balanced memory use.

It is also useful to know how the computing load affects the memory
allocations on various zones.  This patchset allows the retrieval of that data
from userspace.

The patchset introduces a framework for counters that is a cross between the
existing page_stats --which are simply global counters split per cpu-- and the
approach of deferred incremental updates implemented for nr_pagecache.

Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
certain thresholds then the counters are accumulated in an array of
atomic_long in the zone and in a global array that sums up all zone values.
The small 8 bit counters are next to the per cpu page pointers and so they
will be in high in the cpu cache when pages are allocated and freed.

Access to VM counter information for a zone and for the whole machine is then
possible by simply indexing an array (Thanks to Nick Piggin for pointing out
that approach).  The access to the total number of pages of various types does
no longer require the summing up of all per cpu counters.

Benefits of this patchset right now:

- Ability for UP and SMP configuration to determine how memory
  is balanced between the DMA, NORMAL and HIGHMEM zones.

- loops over all processors are avoided in writeback and
  reclaim paths. We can avoid caching the writeback information
  because the needed information is directly accessible.

- Special handling for nr_pagecache removed.

- zone_reclaim_interval vanishes since VM stats can now determine
  when it is worth to do local reclaim.

- Fast inline per node page state determination.

- Accurate counters in /sys/devices/system/node/node*/meminfo. Current
  counters are counting simply which processor allocated a page somewhere
  and guestimate based on that. So the counters were not useful to show
  the actual distribution of page use on a specific zone.

- The swap_prefetch patch requires per node statistics in order to
  figure out when processors of a node can prefetch. This patch provides
  some of the needed numbers.

- Detailed VM counters available in more /proc and /sys status files.

References to earlier discussions:
V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

Performance tests with AIM7 did not show any regressions.  Seems to be a tad
faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
fixes for s390/arm/uml arch code.

This patch:

Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

Create vmstat.c/vmstat.h by separating the counter code and the proc
functions.

Move the vm_stat_text array before zoneinfo_show.

[akpm@osdl.org: s390 build fix]
[akpm@osdl.org: HOTPLUG_CPU build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix ISTALLION=y
Adrian Bunk [Fri, 30 Jun 2006 08:55:30 +0000 (01:55 -0700)]
[PATCH] fix ISTALLION=y

drivers/char/istallion.c: In function â\80\98stli_initbrdsâ\80\99:
drivers/char/istallion.c:4150: error: implicit declaration of function â\80\98stli_parsebrdâ\80\99
drivers/char/istallion.c:4150: error: â\80\98stli_brdspâ\80\99 undeclared (first use in this function)
drivers/char/istallion.c:4150: error: (Each undeclared identifier is reported only once
drivers/char/istallion.c:4150: error: for each function it appears in.)
drivers/char/istallion.c:4164: error: implicit declaration of function â\80\98stli_argbrdsâ\80\99

While I was at it, I also removed the #ifdef MODULE around the initialation
code to allow it to perhaps work when built into the kernel and made a
needlessly global function static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] msr.c: use register_hotcpu_notifier()
Andrew Morton [Fri, 30 Jun 2006 08:55:29 +0000 (01:55 -0700)]
[PATCH] msr.c: use register_hotcpu_notifier()

register_cpu_notifier() cannot do anything in a module, in a
!CONFIG_HOTPLUG_CPU kernel.

Cc: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix platform_device_put/del mishaps
Ingo Molnar [Fri, 30 Jun 2006 08:55:29 +0000 (01:55 -0700)]
[PATCH] fix platform_device_put/del mishaps

This fixes drivers/char/pc8736x_gpio.c and drivers/char/scx200_gpio.c to
use the platform_device_del/put ops correctly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix drivers/video/imacfb.c compilation
Ingo Molnar [Fri, 30 Jun 2006 08:55:27 +0000 (01:55 -0700)]
[PATCH] fix drivers/video/imacfb.c compilation

Fix build error on x86_64.  There's nothing even remotely close to
imacmp_seg in the kernel, so I removed the whole line.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 30 Jun 2006 00:44:21 +0000 (17:44 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
  ocfs2: clean up some osb fields
  ocfs2: fix init of uuid_net_key
  ocfs2: silence a debug print
  ocfs2: silence ENOENT during lookup of broken links
  ocfs2: Cleanup message prints
  ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
  [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
  ocfs2: warn the user on a dead timeout mismatch
  ocfs2: OCFS2_FS must depend on SYSFS
  ocfs2: Compile-time disabling of ocfs2 debugging output.
  configfs: Clear up a few extra spaces where there should be TABs.
  configfs: Release memory in configfs_example.

18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 30 Jun 2006 00:43:43 +0000 (17:43 -0700)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  [TIPC]: Initial activation message now includes TIPC version number
  [TIPC]: Improve response to requests for node/link information
  [TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf
  [IrDA]: Fix the AU1000 FIR dependencies
  [IrDA]: Fix RCU lock pairing on error path
  [XFRM]: unexport xfrm_state_mtu
  [NET]: make skb_release_data() static
  [NETFILTE] ipv4: Fix typo (Bugzilla #6753)
  [IrDA]: MCS7780 usb_driver struct should be static
  [BNX2]: Turn off link during shutdown
  [BNX2]: Use dev_kfree_skb() instead of the _irq version
  [ATM]: basic sysfs support for ATM devices
  [ATM]: [suni] change suni_init to __devinit
  [ATM]: [iphase] should be __devinit not __init
  [ATM]: [idt77105] should be __devinit not __init
  [BNX2]: Add NETIF_F_TSO_ECN
  [NET]: Add ECN support for TSO
  [AF_UNIX]: Datagram getpeersec
  [NET]: Fix logical error in skb_gso_ok
  [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000
  ...

18 years ago[TIPC]: Initial activation message now includes TIPC version number
Allan Stephens [Thu, 29 Jun 2006 19:33:51 +0000 (12:33 -0700)]
[TIPC]: Initial activation message now includes TIPC version number

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TIPC]: Improve response to requests for node/link information
Allan Stephens [Thu, 29 Jun 2006 19:33:20 +0000 (12:33 -0700)]
[TIPC]: Improve response to requests for node/link information

Now allocates reply space for "get links" request based on number of actual
links, not number of potential links.  Also, limits reply to "get links" and
"get nodes" requests to 32KB to match capabilities of tipc-config utility
that issued request.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
18 years ago[TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf
Allan Stephens [Thu, 29 Jun 2006 19:32:46 +0000 (12:32 -0700)]
[TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf

Now determines tailroom of bundle buffer by directly inspection of buffer.
Previously, buffer was assumed to have a max capacity equal to the link MTU,
but the addition of link MTU negotiation means that the link MTU can increase
after the bundle buffer is allocated.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IrDA]: Fix the AU1000 FIR dependencies
Adrian Bunk [Fri, 30 Jun 2006 00:03:19 +0000 (17:03 -0700)]
[IrDA]: Fix the AU1000 FIR dependencies

AU1000 FIR is broken, it should depend on SOC_AU1000.

Spotted by Jean-Luc Leger.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IrDA]: Fix RCU lock pairing on error path
Josh Triplett [Fri, 30 Jun 2006 00:02:31 +0000 (17:02 -0700)]
[IrDA]: Fix RCU lock pairing on error path

irlan_client_discovery_indication calls rcu_read_lock and rcu_read_unlock, but
returns without unlocking in an error case.  Fix that by replacing the return
with a goto so that the rcu_read_unlock always gets executed.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Samuel Ortiz samuel@sortiz.org <>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[XFRM]: unexport xfrm_state_mtu
Adrian Bunk [Thu, 29 Jun 2006 20:04:41 +0000 (13:04 -0700)]
[XFRM]: unexport xfrm_state_mtu

This patch removes the unused EXPORT_SYMBOL(xfrm_state_mtu).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: make skb_release_data() static
Adrian Bunk [Thu, 29 Jun 2006 20:02:35 +0000 (13:02 -0700)]
[NET]: make skb_release_data() static

skb_release_data() no longer has any users in other files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTE] ipv4: Fix typo (Bugzilla #6753)
Matt LaPlante [Thu, 29 Jun 2006 19:51:15 +0000 (12:51 -0700)]
[NETFILTE] ipv4: Fix typo (Bugzilla #6753)

This patch fixes bugzilla #6753, a typo in the netfilter Kconfig

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[IrDA]: MCS7780 usb_driver struct should be static
Adrian Bunk [Thu, 29 Jun 2006 19:39:07 +0000 (12:39 -0700)]
[IrDA]: MCS7780 usb_driver struct should be static

This patch makes a needlessly global struct static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BNX2]: Turn off link during shutdown
Michael Chan [Thu, 29 Jun 2006 19:38:15 +0000 (12:38 -0700)]
[BNX2]: Turn off link during shutdown

Minor change in shutdown logic to effect a link down.

Update version to 1.4.43.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BNX2]: Use dev_kfree_skb() instead of the _irq version
Michael Chan [Thu, 29 Jun 2006 19:37:41 +0000 (12:37 -0700)]
[BNX2]: Use dev_kfree_skb() instead of the _irq version

Change all dev_kfree_skb_irq() and dev_kfree_skb_any() to
dev_kfree_skb().  These calls are never used in irq context.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: basic sysfs support for ATM devices
Roman Kagan [Thu, 29 Jun 2006 19:36:34 +0000 (12:36 -0700)]
[ATM]: basic sysfs support for ATM devices

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: [suni] change suni_init to __devinit
Chas Williams [Thu, 29 Jun 2006 19:35:49 +0000 (12:35 -0700)]
[ATM]: [suni] change suni_init to __devinit

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: [iphase] should be __devinit not __init
Chas Williams [Thu, 29 Jun 2006 19:35:32 +0000 (12:35 -0700)]
[ATM]: [iphase] should be __devinit not __init

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[ATM]: [idt77105] should be __devinit not __init
Chas Williams [Thu, 29 Jun 2006 19:35:02 +0000 (12:35 -0700)]
[ATM]: [idt77105] should be __devinit not __init

Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[BNX2]: Add NETIF_F_TSO_ECN
Michael Chan [Thu, 29 Jun 2006 19:31:21 +0000 (12:31 -0700)]
[BNX2]: Add NETIF_F_TSO_ECN

Add NETIF_F_TSO_ECN feature for all bnx2 hardware.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Add ECN support for TSO
Michael Chan [Thu, 29 Jun 2006 19:30:00 +0000 (12:30 -0700)]
[NET]: Add ECN support for TSO

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

With help from Herbert Xu <herbert@gondor.apana.org.au>.

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AF_UNIX]: Datagram getpeersec
Catherine Zhang [Thu, 29 Jun 2006 19:27:47 +0000 (12:27 -0700)]
[AF_UNIX]: Datagram getpeersec

This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Fix logical error in skb_gso_ok
Herbert Xu [Thu, 29 Jun 2006 19:25:53 +0000 (12:25 -0700)]
[NET]: Fix logical error in skb_gso_ok

The test in skb_gso_ok is backwards.  Noticed by Michael Chan
<mchan@broadcom.com>.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000
Shuya MAEDA [Wed, 28 Jun 2006 08:40:35 +0000 (01:40 -0700)]
[PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000

Signed-off-by: Shuya MAEDA <maeda-sxb@necst.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Make illegal_highdma more anal
Herbert Xu [Tue, 27 Jun 2006 20:33:10 +0000 (13:33 -0700)]
[NET]: Make illegal_highdma more anal

Rather than having illegal_highdma as a macro when HIGHMEM is off, we
can turn it into an inline function that returns zero.  This will catch
callers that give it bad arguments.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[TCP]: Export accept queue len of a TCP listening socket via rx_queue
Sridhar Samudrala [Tue, 27 Jun 2006 20:29:00 +0000 (13:29 -0700)]
[TCP]: Export accept queue len of a TCP listening socket via rx_queue

While debugging a TCP server hang issue, we noticed that currently there is
no way for a user to get the acceptq backlog value for a TCP listen socket.

All the standard networking utilities that display socket info like netstat,
ss and /proc/net/tcp have 2 fields called rx_queue and tx_queue. These
fields do not mean much for listening sockets. This patch uses one of these
unused fields(rx_queue) to export the accept queue len for listening sockets.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETLINK]: Encapsulate eff_cap usage within security framework.
Darrel Goeddel [Tue, 27 Jun 2006 20:26:11 +0000 (13:26 -0700)]
[NETLINK]: Encapsulate eff_cap usage within security framework.

This patch encapsulates the usage of eff_cap (in netlink_skb_params) within
the security framework by extending security_netlink_recv to include a required
capability parameter and converting all direct usage of eff_caps outside
of the lsm modules to use the interface.  It also updates the SELinux
implementation of the security_netlink_send and security_netlink_recv
hooks to take advantage of the sid in the netlink_skb_params struct.
This also enables SELinux to perform auditing of netlink capability checks.
Please apply, for 2.6.18 if possible.

Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NET]: Added GSO header verification
Herbert Xu [Tue, 27 Jun 2006 20:22:38 +0000 (13:22 -0700)]
[NET]: Added GSO header verification

When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag.  The same method
can be used to implement TSO ECN support.  We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[NETFILTER]: statistic match: add missing Kconfig help text
Patrick McHardy [Tue, 27 Jun 2006 10:02:14 +0000 (03:02 -0700)]
[NETFILTER]: statistic match: add missing Kconfig help text

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>