David S. Miller [Fri, 13 Mar 2015 02:58:27 +0000 (22:58 -0400)]
Merge branch 'listener_refactor'
Eric Dumazet says:
====================
inet: tcp listener refactoring, part 8
These patches prepare request socks being hashed into general ehash
table : We declare 3 aliases (ireq_state, ireq_refcnt, ireq_family)
Note that refcnt is not yet handled, this will be done later.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:10 +0000 (16:44 -0700)]
inet: introduce ireq_family
Before inserting request socks into general hash table,
fill their socket family.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:09 +0000 (16:44 -0700)]
inet: get_openreq4() & get_openreq6() do not need listener
ireq->ir_num contains local port, use it.
Also, get_openreq4() dumping listen_sk->refcnt makes litle sense.
inet_diag_fill_req() can also use ireq->ir_num
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:08 +0000 (16:44 -0700)]
inet: prepare sock_edemux() & sock_gen_put() for new SYN_RECV state
sock_edemux() & sock_gen_put() should be ready to cope with request socks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:07 +0000 (16:44 -0700)]
net: add req_prot_cleanup() & req_prot_init() helpers
Make proto_register() & proto_unregister() a bit nicer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:06 +0000 (16:44 -0700)]
inet: add rsk_refcnt/ireq_refcnt to request socks
When request socks will be in ehash, they'll need to be refcounted.
This patch adds rsk_refcnt/ireq_refcnt macros, and adds
reqsk_put() function, but nothing yet use them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:05 +0000 (16:44 -0700)]
inet: add ireq_state field to inet_request_sock
We need to identify request sock when they'll be visible in
global ehash table.
ireq_state is an alias to req.__req_common.skc_state.
Its value is set to TCP_NEW_SYN_RECV
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:04 +0000 (16:44 -0700)]
inet: add TCP_NEW_SYN_RECV state
TCP_SYN_RECV state is currently used by fast open sockets.
Initial TCP requests (the pseudo sockets created when a SYN is received)
are not yet associated to a state. They are attached to their parent,
and the parent is in TCP_LISTEN state.
This commit adds TCP_NEW_SYN_RECV state, so that we can convert
TCP stack to a different schem gradually.
This state is not exported to user space.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:03 +0000 (16:44 -0700)]
ipv6: add missing ireq_net & ir_cookie initializations
I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie
Lets clear ireq->ir_cookie in inet_reqsk_alloc()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe2f ("net: add real socket cookies")
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Mar 2015 19:03:12 +0000 (20:03 +0100)]
cls_bpf: do eBPF invocation under non-bh RCU lock variant for maps
Currently, it is possible in cls_bpf to access eBPF maps only under
rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(),
the classifier would be called from __netif_receive_skb_core() under
rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via
__dev_queue_xmit().
This rcu/rcu_bh mix doesn't work together with eBPF maps as they require
soley to be called under rcu_read_lock(). eBPF maps could also be shared
among various other eBPF programs (possibly even with other eBPF program
types, f.e. tracing) and user space processes, so any context is assumed.
Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program
invocation under non-bh RCU lock variant.
Fixes: e2e9b6541dd4 ("cls_bpf: add initial eBPF support for programmable classifiers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Mar 2015 22:26:58 +0000 (18:26 -0400)]
Merge branch 'fib_trie_table_merge_fixes'
Alexander Duyck says:
====================
fib_trie: Minor fixes for table merge
This patch set addresses two issues reported with the tables merged, the
first is a NULL pointer dereference, and the other is to remove a WARN_ON
and set the ordering for aliases from different tables with the same slen
values.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 12 Mar 2015 21:46:29 +0000 (14:46 -0700)]
fib_trie: Provide a deterministic order for fib_alias w/ tables merged
This change makes it so that we should always have a deterministic ordering
for the main and local aliases within the merged table when two leaves
overlap.
So for example if we have a leaf with a key of 192.168.254.0. If we
previously added two aliases with a prefix length of 24 from both local and
main the first entry would be first and the second would be second. When I
was coding this I had added a WARN_ON should such a situation occur as I
wasn't sure how likely it would be. However this WARN_ON has been
triggered so this is something that should be addressed.
With this patch the ordering of the aliases is as follows. First they are
sorted on prefix length, then on their table ID, then tos, and finally
priority. This way what we end up doing is essentially interleaving the
two tables on what used to be leaf_info structure boundaries.
Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 12 Mar 2015 21:46:23 +0000 (14:46 -0700)]
fib_trie: Avoid NULL pointer if local table is not allocated
The function fib_unmerge assumed the local table had already been
allocated. If that is not the case however when custom rules are applied
then this can result in a NULL pointer dereference.
In order to prevent this we must check the value of the local table pointer
and if it is NULL simply return 0 as there is no local table to separate
from the main.
Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Madhu Challa <challa@noironetworks.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Mar 2015 16:21:42 +0000 (17:21 +0100)]
ebpf: verifier: check that call reg with ARG_ANYTHING is initialized
I noticed that a helper function with argument type ARG_ANYTHING does
not need to have an initialized value (register).
This can worst case lead to unintented stack memory leakage in future
helper functions if they are not carefully designed, or unintended
application behaviour in case the application developer was not careful
enough to match a correct helper function signature in the API.
The underlying issue is that ARG_ANYTHING should actually be split
into two different semantics:
1) ARG_DONTCARE for function arguments that the helper function
does not care about (in other words: the default for unused
function arguments), and
2) ARG_ANYTHING that is an argument actually being used by a
helper function and *guaranteed* to be an initialized register.
The current risk is low: ARG_ANYTHING is only used for the 'flags'
argument (r4) in bpf_map_update_elem() that internally does strict
checking.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Mar 2015 18:39:44 +0000 (14:39 -0400)]
Merge branch 'possible_net_t'
Eric W. Biederman says:
====================
Introduce possible_net_t
The current usage of write_pnet and read_pnet is a little laborious and
error prone as you only notice if you failed to include them if are
compiling with network namespaces enabled.
possible_net_t remedies that by using a type that is 0 bytes when
network namespaces are disabled and can only be read and written to with
read_pnet and write_pnet.
Aka less work and safer for the same effect.
I kill hold_net and release_net first as are they are haven't been used
since 2008 and are noise at the points where write_pnet and read_pnet
are used.
I have folded in Eric Dumazets suggestions to improve the killing of
hold_net and release net. And respon. I had to respin anyway as
there was enough changes elsewhere in the tree the previous version
of these patches did not quite apply cleanly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 12 Mar 2015 04:06:44 +0000 (23:06 -0500)]
net: Introduce possible_net_t
Having to say
> #ifdef CONFIG_NET_NS
> struct net *net;
> #endif
in structures is a little bit wordy and a little bit error prone.
Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
> struct net *net;
> #endif
> } possible_net_t;
And then in a header say:
> possible_net_t net;
Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.
Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.
This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Thu, 12 Mar 2015 04:04:08 +0000 (23:04 -0500)]
net: Kill hold_net release_net
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008. Kill the code it is long past due.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Mar 2015 18:35:35 +0000 (14:35 -0400)]
Merge branch 'rhashtable-cleanups'
Herbert Xu says:
====================
rhashtable hash cleanups
This is a rebase on top of the nested lock annotation fix.
Nothing to see here, just a bunch of simple clean-ups before
I move onto something more substantial (hopefully).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 12 Mar 2015 03:49:41 +0000 (14:49 +1100)]
rhashtable: Remove obj_raw_hashfn
Now that the only caller of obj_raw_hashfn is head_hashfn, we can
simply kill it and fold it into the latter.
This patch also moves the common shift from head_hashfn/key_hashfn
into rht_bucket_index.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 12 Mar 2015 03:49:40 +0000 (14:49 +1100)]
rhashtable: Remove key length argument to key_hashfn
key_hashfn has only one caller and it doesn't really need to supply
the key length as an extra parameter.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 12 Mar 2015 03:49:39 +0000 (14:49 +1100)]
rhashtable: Use head_hashfn instead of obj_raw_hashfn
Now that we don't have cross-table hashes, we no longer need to
keep the entire hash value so all users of obj_raw_hashfn can
use head_hashfn instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 12 Mar 2015 03:49:38 +0000 (14:49 +1100)]
rhashtable: Move masking back into key_hashfn
This patch reverts commit
c88455ce50ae4224d84960ce2baa53e61580df27
("rhashtable: key_hashfn() must return full hash value") because
the only user of it always masks the hash value.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Wed, 11 Mar 2015 16:56:25 +0000 (17:56 +0100)]
net/mlx5_core: don't export static symbol
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
type T;
identifier f;
@@
static T f (...) { ... }
@@
identifier r.f;
declarer name EXPORT_SYMBOL;
@@
-EXPORT_SYMBOL(f);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 12 Mar 2015 03:47:13 +0000 (14:47 +1100)]
rhashtable: Add annotation to nested lock
Commit
aa34a6cb0478842452bac58edb50d3ef9e178c92 ("rhashtable:
Add arbitrary rehash function") killed the annotation on the
nested lock which leads to bitching from lockdep.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 03:27:52 +0000 (20:27 -0700)]
net: fix CONFIG_NET_NS=n compilation
I forgot to use write_pnet() in three locations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 33cf7c90fe2f9 ("net: add real socket cookies")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lubomir Rintel [Wed, 11 Mar 2015 14:39:21 +0000 (15:39 +0100)]
ipv6: expose RFC4191 route preference via rtnetlink
This makes it possible to retain the route preference when RAs are handled in
userspace.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 12 Mar 2015 01:42:50 +0000 (10:42 +0900)]
switchdev: correct spelling of notifier in comments
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 12 Mar 2015 02:00:10 +0000 (11:00 +0900)]
vxlan: Correct path typo in comment
Flags are used in the return path rather than the return patch.
Fixes: af33c1adae1e ("vxlan: Eliminate dependency on UDP socket in transmit path")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 01:53:14 +0000 (18:53 -0700)]
net: add real socket cookies
A long standing problem in netlink socket dumps is the use
of kernel socket addresses as cookies.
1) It is a security concern.
2) Sockets can be reused quite quickly, so there is
no guarantee a cookie is used once and identify
a flow.
3) request sock, establish sock, and timewait socks
for a given flow have different cookies.
Part of our effort to bring better TCP statistics requires
to switch to a different allocator.
In this patch, I chose to use a per network namespace 64bit generator,
and to use it only in the case a socket needs to be dumped to netlink.
(This might be refined later if needed)
Note that I tried to carry cookies from request sock, to establish sock,
then timewait sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 11 Mar 2015 23:36:08 +0000 (16:36 -0700)]
fib_trie: Only display main table in /proc/net/route
When we merged the tries for local and main I had overlooked the iterator
for /proc/net/route. As a result it was outputting both local and main
when the two tries were merged.
This patch resolves that by only providing output for aliases that are
actually in the main trie. As a result we should go back to the original
behavior which I assume will be necessary to maintain legacy support.
Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Mar 2015 21:56:39 +0000 (17:56 -0400)]
Merge branch 'dsa_phy_divert'
Florian Fainelli says:
====================
net: dsa: support PHY reads/writes diversion
This patch series completes the PHY reads/writes diversion when we need to use
the slave MII bus provided by DSA and the underlying switch drivers to
implement the real PHY reads and writes. This is particularly useful when they
are conflicting MDIO bus addresses as in the case of multiple Broadcom switches
connected to each other (internal and external, or just external).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 10 Mar 2015 23:57:13 +0000 (16:57 -0700)]
net: dsa: fully divert PHY reads/writes if requested
In case a PHY is found via Device Tree, and is also flagged by the
switch driver as needing indirect reads/writes using the switch driver
implemented MDIO bus, make sure that we bind this PHY to the slave MII
bus in order for this to happen.
Without this, we would succeed in having the PHY driver probe()'s
function to use slave MII bus read/write functions, because this is done
during dsa_slave_mii_init(), but past that point, the PHY driver would
not go through these diverted reads and writes.
Fixes: 0d8bcdd383b88 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 10 Mar 2015 23:57:12 +0000 (16:57 -0700)]
net: dsa: move PHY setup on DSA MII bus to its own function
In preparation for dealing with indirect reads and writes towards
certain PHY devices, move the code which deals with binding the PHY
device to the slave MII bus created by DSA to its own function:
dsa_slave_phy_connect().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 10 Mar 2015 23:57:11 +0000 (16:57 -0700)]
of: mdio: export of_mdio_parse_addr
Export of_mdio_parse_addr() which allows parsing a given Ethernet PHY
node MDIO address, verify it is within the allowed range, and return
its value. This is going to be useful for the DSA code which needs to
deal with multiple layers of MDIO buses.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petri Gynther [Tue, 10 Mar 2015 22:55:00 +0000 (15:55 -0700)]
net: bcmgenet: collect Rx discarded packet count
Bits 31:16 of RDMA_PROD_INDEX contain Rx discarded packet count, which
are the Rx packets that had to be dropped by MAC hardware since there
was no room on the Rx queue. Add code to collect this information into
the netdev stats.
Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 11 Mar 2015 21:02:16 +0000 (14:02 -0700)]
fib_trie: Fix uninitialized variable warning
The 0-day kernel test infrastructure reported a use of uninitialized
variable warning for local_table due to the fact that the local and main
allocations had been swapped from the original setup. This change corrects
that by making it so that we free the main table if the local table
allocation fails.
Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Tue, 10 Mar 2015 20:03:54 +0000 (21:03 +0100)]
fib_trie: call fib_table_flush_external under RTNL
Move rtnl_lock() before the call to fib4_rules_exit so that
fib_table_flush_external is called under RTNL.
Fixes: 104616e74e0b ("switchdev: don't support custom ip rules, for now")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Shearman [Tue, 10 Mar 2015 16:37:59 +0000 (16:37 +0000)]
mpls: Allow mpls_gso and mpls_router to be built as modules
CONFIG_MPLS=m doesn't result in a kernel module being built because it
applies to the net/mpls directory, rather than to .o files.
So revert the MPLS menuitem to being a boolean and make MPLS_GSO and
MPLS_ROUTING tristates to allow mpls_gso and mpls_router modules to be
produced as desired.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaohui Xie [Tue, 10 Mar 2015 10:31:12 +0000 (18:31 +0800)]
net/fsl: remove dependency FSL_SOC from MDIO
FSL_PQ_MDIO and FSL_XGMAC_MDIO are not really depend on FSL_SOC, they
can build on non-PPC platforms.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 10 Mar 2015 22:43:48 +0000 (09:43 +1100)]
rhashtable: Add arbitrary rehash function
This patch adds a rehash function that supports the use of any
hash function for the new table. This is needed to support changing
the random seed value during the lifetime of the hash table.
However for now the random seed value is still constant and the
rehash function is simply used to replace the existing expand/shrink
functions.
[ ASSERT_BUCKET_LOCK() and thus debug_dump_table() +
debug_dump_buckets() are not longer used, so delete them
entirely. -DaveM ]
Signed-off-by: Herbert Xu <herbert.xu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 9 Mar 2015 22:27:55 +0000 (09:27 +1100)]
rhashtable: Move hash_rnd into bucket_table
Currently hash_rnd is a parameter that users can set. However,
no existing users set this parameter. It is also something that
people are unlikely to want to set directly since it's just a
random number.
In preparation for allowing the reseeding/rehashing of rhashtable,
this patch moves hash_rnd into bucket_table so that it's now an
internal state rather than a parameter.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 6 Mar 2015 21:47:00 +0000 (13:47 -0800)]
ipv4: FIB Local/MAIN table collapse
This patch is meant to collapse local and main into one by converting
tb_data from an array to a pointer. Doing this allows us to point the
local table into the main while maintaining the same variables in the
table.
As such the tb_data was converted from an array to a pointer, and a new
array called data is added in order to still provide an object for tb_data
to point to.
In order to track the origin of the fib aliases a tb_id value was added in
a hole that existed on 64b systems. Using this we can also reverse the
merge in the event that custom FIB rules are enabled.
With this patch I am seeing an improvement of 20ns to 30ns for routing
lookups as long as custom rules are not enabled, with custom rules enabled
we fall back to split tables and the original behavior.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Tue, 10 Mar 2015 16:23:34 +0000 (12:23 -0400)]
tipc: ensure that idle links are deleted when a bearer is disabled
commit
afaa3f65f65fda2e7b190aac7e2a75d9a2a77cb6
(tipc: purge links when bearer is disabled) was an attempt to resolve
a problem that turned out to have a more profound reason.
When we disable a bearer, we delete all its pertaining links if
there is no other bearer to perform failover to, or if the module
is shutting down. In case there are dual bearers, we wait with
deleting links until the failover procedure is finished.
However, this misses the case when a link on the removed bearer
was already down, so that there will be no failover procedure to
finish the link delete. This causes confusion if a new bearer is
added to replace the removed one, and also entails a small memory
leak.
This commit takes the current state of the link into account when
deciding when to delete it, and also reverses the above-mentioned
commit.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 10 Mar 2015 21:39:34 +0000 (14:39 -0700)]
fib_trie: Address possible NULL pointer dereference in resize
If the inflate call failed it would return NULL. As a result tp would be
set to NULL and cause use to trigger a NULL pointer dereference in
should_halve if the inflate failed on the first attempt.
In order to prevent this we should decrement max_work before we actually
attempt to inflate as this will force us to exit before attempting to halve
a node we should have inflated. In order to keep things symmetric between
inflate and halve I went ahead and also moved the decrement of max_work for
the halve case as well so we take care of that before we actually attempt
to halve the tnode.
Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Tue, 10 Mar 2015 22:33:49 +0000 (18:33 -0400)]
macb: Fix merge error.
The code removed by commit
421d9df0628b ("net/macb: merge
at91_ether driver into macb driver") should be removed
in the merge resolution as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 10 Mar 2015 18:25:41 +0000 (11:25 -0700)]
fib_trie: Correctly handle case of key == 0 in leaf_walk_rcu
In the case of a trie that had no tnodes with a key of 0 the initial
look-up would fail resulting in an out-of-bounds cindex on the first tnode.
This resulted in an entire trie being skipped.
In order resolve this I have updated the cindex logic in the initial
look-up so that if the key is zero we will always traverse the child zero
path.
Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Mar 2015 17:45:33 +0000 (13:45 -0400)]
Merge branch 'inet_diag_cleanups'
Eric Dumazet says:
====================
inet_diag: cleanups and improvements
Before changing way request socks are dumped, let's clean up this code
a bit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 Mar 2015 14:15:54 +0000 (07:15 -0700)]
inet_diag: add const to inet_diag_req_v2
diag dumpers should not modify the request.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 Mar 2015 14:15:53 +0000 (07:15 -0700)]
inet_diag: cleanups
Remove all inline keywords, add some const, and cleanup style.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 10 Mar 2015 14:15:52 +0000 (07:15 -0700)]
net: constify sock_diag_check_cookie()
sock_diag_check_cookie() second parameter is constant
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 10 Mar 2015 03:56:53 +0000 (04:56 +0100)]
drivers: atm: nicstar: remove ifdef'd out skb destructors
remove dead code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Mar 2015 16:48:47 +0000 (12:48 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter fixes for net-next
The following batch contains a couple of fixes to address some fallout
from the previous pull request, they are:
1) Address link problems in the bridge code after
e5de75b. Fix it by
using rcu hook to address to avoid ifdef pollution and hard
dependency between bridge and br_netfilter.
2) Address sparse warnings in the netfilter reject code, patch from
Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Tue, 10 Mar 2015 09:27:18 +0000 (10:27 +0100)]
netfilter: bridge: use rcu hook to resolve br_netfilter dependency
e5de75b ("netfilter: bridge: move DNAT helper to br_netfilter") results
in the following link problem:
net/bridge/br_device.c:29: undefined reference to `br_nf_prerouting_finish_bridge`
Moreover it creates a hard dependency between br_netfilter and the
bridge core, which is what we've been trying to avoid so far.
Resolve this problem by using a hook structure so we reduce #ifdef
pollution and keep bridge netfilter specific code under br_netfilter.c
which was the original intention.
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 9 Mar 2015 22:04:15 +0000 (23:04 +0100)]
netfilter: fix sparse warnings in reject handling
make C=1 CF=-D__CHECK_ENDIAN__ shows following:
net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types)
net/bridge/netfilter/nft_reject_bridge.c:65:50: expected restricted __be16 [usertype] protocol [..]
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..]
Caused by two (harmless) errors:
1. htons() instead of ntohs()
2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Fainelli [Mon, 9 Mar 2015 22:44:13 +0000 (15:44 -0700)]
net: phy: bcm7xxx: add alternate id for 7439
BCM7439 has an alternate PHY OUI: 0xae025080 which is to be found in
some variants of this chip.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Mon, 9 Mar 2015 20:59:09 +0000 (13:59 -0700)]
switchdev: add netlink flags to IPv4 FIB add op
Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Mar 2015 03:50:26 +0000 (23:50 -0400)]
Merge branch 'dsa-next'
Florian Fainelli says:
====================
net: dsa: remove restriction on platform_device
This patch series removes the restriction in DSA to operate exclusively with
platform_device Ethernet MAC drivers when using Device Tree. This basically
allows arbitrary Ethernet MAC drivers to be used now in conjunction with
Device Tree.
The reason was that DSA was using a of_find_device_by_node() which limits
the device_node to device pointer search exclusively to platform_device,
in our case, we are interested in doing a "class" research and lookup the
net_device.
Thanks to Chris Packham for testing this on his platform.
Changes in v2:
- fix build for !CONFIG_OF_NET
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 9 Mar 2015 21:31:21 +0000 (14:31 -0700)]
net: dsa: utilize of_find_net_device_by_node
Using of_find_device_by_node() restricts the search to platform_device that
match the specified device_node pointer. This is not even remotely true for
network devices backed by a pci_device for instance.
of_find_net_device_by_node() allows us to do a more thorough lookup to find the
struct net_device corresponding to a particular device_node pointer.
For symetry with the non-OF code path, we hold the net_device pointer in
dsa_probe() just like what dev_to_net_dev() does when we call this
function.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 9 Mar 2015 21:31:20 +0000 (14:31 -0700)]
net: core: add of_find_net_device_by_node()
Add a helper function which allows getting the struct net_device pointer
associated with a given struct device_node pointer. This is useful for
instance for DSA Ethernet devices not backed by a platform_device, but a PCI
device.
Since we need to access net_class which is not accessible outside of
net/core/net-sysfs.c, this helper function is also added here and gated
with CONFIG_OF_NET.
Network devices initialized with SET_NETDEV_DEV() are also taken into
account by checking for dev->parent first and then falling back to
checking the device pointer within struct net_device.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Mar 2015 03:38:02 +0000 (23:38 -0400)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/ethernet/cadence/macb.c
Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Petri Gynther [Mon, 9 Mar 2015 20:40:00 +0000 (13:40 -0700)]
net: bcmgenet: core changes for supporting multiple Rx queues
1. Add struct bcmgenet_rx_ring to hold all necessary information
for a single Rx queue.
2. Add bcmgenet_init_rx_queues() to initialize all Rx queues.
3. Modify bcmgenet_init_rx_ring() to initialize a single Rx queue.
4. Modify Rx interrupt path code to use per-queue data.
5. Modify bcmgenet_rx_refill() to use RxCB->bd_addr.
Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 10 Mar 2015 01:59:50 +0000 (18:59 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm/s390 bugfixes from Marcelo Tosatti.
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: non-LPAR case obsolete during facilities mask init
KVM: s390: include guest facilities in kvm facility test
KVM: s390: fix in memory copy of facility lists
KVM: s390/cpacf: Fix kernel bug under z/VM
KVM: s390/cpacf: Enable key wrapping by default
Linus Torvalds [Tue, 10 Mar 2015 01:55:52 +0000 (18:55 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"One performance optimization for page_clear and a couple of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: fix incorrect ASCE after crst_table_downgrade
s390/ftrace: fix crashes when switching tracers / add notrace to cpu_relax()
s390/pci: unify pci_iomap symbol exports
s390/pci: fix [un]map_resources sequence
s390: let the compiler do page clearing
s390/pci: fix possible information leak in mmio syscall
s390/dcss: array index 'i' is used before limits check.
s390/scm_block: fix off by one during cluster reservation
s390/jump label: improve and fix sanity check
s390/jump label: add missing jump_label_apply_nops() call
Linus Torvalds [Tue, 10 Mar 2015 01:44:06 +0000 (18:44 -0700)]
Merge tag 'trace-fixes-v4.0-rc2-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull seq-buf/ftrace fixes from Steven Rostedt:
"This includes fixes for seq_buf_bprintf() truncation issue. It also
contains fixes to ftrace when /proc/sys/kernel/ftrace_enabled and
function tracing are started. Doing the following causes some issues:
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# echo nop > /sys/kernel/debug/tracing/current_tracer
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
As well as with function tracing too. Pratyush Anand first reported
this issue to me and supplied a patch. When I tested this on my x86
test box, it caused thousands of backtraces and warnings to appear in
dmesg, which also caused a denial of service (a warning for every
function that was listed). I applied Pratyush's patch but it did not
fix the issue for me. I looked into it and found a slight problem
with trampoline accounting. I fixed it and sent Pratyush a patch, but
he said that it did not fix the issue for him.
I later learned tha Pratyush was using an ARM64 server, and when I
tested on my ARM board, I was able to reproduce the same issue as
Pratyush. After applying his patch, it fixed the problem. The above
test uncovered two different bugs, one in x86 and one in ARM and
ARM64. As this looked like it would affect PowerPC, I tested it on my
PPC64 box. It too broke, but neither the patch that fixed ARM or x86
fixed this box (the changes were all in generic code!). The above
test, uncovered two more bugs that affected PowerPC. Again, the
changes were only done to generic code. It's the way the arch code
expected things to be done that was different between the archs. Some
where more sensitive than others.
The rest of this series fixes the PPC bugs as well"
* tag 'trace-fixes-v4.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled
ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl
ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl
seq_buf: Fix seq_buf_bprintf() truncation
seq_buf: Fix seq_buf_vprintf() truncation
Linus Torvalds [Tue, 10 Mar 2015 01:17:21 +0000 (18:17 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) nft_compat accidently truncates ethernet protocol to 8-bits, from
Arturo Borrero.
2) Memory leak in ip_vs_proc_conn(), from Julian Anastasov.
3) Don't allow the space required for nftables rules to exceed the
maximum value representable in the dlen field. From Patrick
McHardy.
4) bcm63xx_enet can accidently leave interrupts permanently disabled
due to errors in the NAPI polling exit logic. Fix from Nicolas
Schichan.
5) Fix OOPSes triggerable by the ping protocol module, due to missing
address family validations etc. From Lorenzo Colitti.
6) Don't use RCU locking in sleepable context in team driver, from Jiri
Pirko.
7) xen-netback miscalculates statistic offset pointers when reporting
the stats to userspace. From David Vrabel.
8) Fix a leak of up to 256 pages per VIF destroy in xen-netaback, also
from David Vrabel.
9) ip_check_defrag() cannot assume that skb_network_offset(),
particularly when it is used by the AF_PACKET fanout defrag code.
From Alexander Drozdov.
10) gianfar driver doesn't query OF node names properly when trying to
determine the number of hw queues available. Fix it to explicitly
check for OF nodes named queue-group. From Tobias Waldekranz.
11) MID field in macb driver should be 12 bits, not 16. From Punnaiah
Choudary Kalluri.
12) Fix unintentional regression in traceroute due to timestamp socket
option changes. Empty ICMP payloads should be allowed in
non-timestamp cases. From Willem de Bruijn.
13) When devices are unregistered, we have to get rid of AF_PACKET
multicast list entries that point to it via ifindex. Fix from
Francesco Ruggeri.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
tipc: fix bug in link failover handling
net: delete stale packet_mclist entries
net: macb: constify macb configuration data
MAINTAINERS: add Marc Kleine-Budde as co maintainer for CAN networking layer
MAINTAINERS: linux-can moved to github
can: kvaser_usb: Read all messages in a bulk-in URB buffer
can: kvaser_usb: Avoid double free on URB submission failures
can: peak_usb: fix missing ctrlmode_ init for every dev
can: add missing initialisations in CAN related skbuffs
ip: fix error queue empty skb handling
bgmac: Clean warning messages
tcp: align tcp_xmit_size_goal() on tcp_tso_autosize()
net: fec: fix unbalanced clk disable on driver unbind
net: macb: Correct the MID field length value
net: gianfar: correctly determine the number of queue groups
ipv4: ip_check_defrag should not assume that skb_network_offset is zero
net: bcmgenet: properly disable password matching
net: eth: xgene: fix booting with devicetree
bnx2x: Force fundamental reset for EEH recovery
xen-netback: refactor xenvif_handle_frag_list()
...
Linus Torvalds [Tue, 10 Mar 2015 01:06:13 +0000 (18:06 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"Miscellaneous driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: psmouse - disable "palm detection" in the focaltech driver
Input: psmouse - disable changing resolution/rate/scale for FocalTech
Input: psmouse - ensure that focaltech reports consistent coordinates
Input: psmouse - remove hardcoded touchpad size from the focaltech driver
Input: tc3589x-keypad - set IRQF_ONESHOT flag to ensure IRQ request
Input: ALPS - fix memory leak when detection fails
Input: sun4i-ts - add thermal driver dependency
Input: cyapa - remove superfluous type check in cyapa_gen5_read_idac_data()
Input: cyapa - fix unaligned functions redefinition error
Input: mma8450 - add parent device
Linus Torvalds [Tue, 10 Mar 2015 01:00:25 +0000 (18:00 -0700)]
Merge tag 'regulator-v4.0-rc2' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of driver specific fixes plus a fix for a regression in the
core where the updates to use sysfs group registration were overly
enthusiastic in eliding properties and removed some that had been
previously present"
* tag 'regulator-v4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Fix regression due to NULL constraints check
regulator: rk808: Set the enable time for LDOs
regulator: da9210: Mask all interrupt sources to deassert interrupt line
Linus Torvalds [Tue, 10 Mar 2015 00:50:02 +0000 (17:50 -0700)]
Merge tag 'spi-v4.0-rc2' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A collection of driver specific fixes to which the usual comments
about them being important if you see them mostly apply (except for
the comment fix). The pl022 one is particularly nasty for anyone
affected by it"
* tag 'spi-v4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: pl022: Fix race in giveback() leading to driver lock-up
spi: dw-mid: avoid potential NULL dereference
spi: img-spfi: Verify max spfi transfer length
spi: fix a typo in comment.
spi: atmel: Fix interrupt setup for PDC transfers
spi: dw: revisit FIFO size detection again
spi: dw-pci: correct number of chip selects
drivers: spi: ti-qspi: wait for busy bit clear before data write/read
Linus Torvalds [Tue, 10 Mar 2015 00:45:34 +0000 (17:45 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull tpm fixes from James Morris:
"fixes for the TPM driver"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
tpm: fix call order in tpm-chip.c
tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send
Linus Torvalds [Tue, 10 Mar 2015 00:35:29 +0000 (17:35 -0700)]
Merge tag 'fbdev-fixes-4.0' of git://git./linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- Fix regression in with omapdss when using i2c displays
- Fix possible null deref in fbmon
- Check kalloc return value in AMBA CLCD
* tag 'fbdev-fixes-4.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
OMAPDSS: fix regression with display sysfs files
video: fbdev: fix possible null dereference
video: ARM CLCD: Add missing error check for devm_kzalloc
Linus Torvalds [Tue, 10 Mar 2015 00:30:09 +0000 (17:30 -0700)]
Merge branch 'for-4.0-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"The cgroup iteration update two years ago and the recent cpuset
restructuring introduced regressions in subset of cpuset
configurations. Three patches to fix them.
All are marked for -stable"
* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Fix cpuset sched_relax_domain_level
cpuset: fix a warning when clearing configured masks in old hierarchy
cpuset: initialize effective masks when clone_children is enabled
Linus Torvalds [Tue, 10 Mar 2015 00:23:30 +0000 (17:23 -0700)]
Merge branch 'for-4.0-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixlet from Tejun Heo:
"Speed limiting fix for sata_fsl"
* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
sata-fsl: Apply link speed limits
Linus Torvalds [Tue, 10 Mar 2015 00:00:54 +0000 (17:00 -0700)]
Merge branch 'for-4.0-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"One fix patch for a subtle livelock condition which can happen on
PREEMPT_NONE kernels involving two racing cancel_work calls. Whoever
comes in the second has to wait for the previous one to finish. This
was implemented by making the later one block for the same condition
that the former would be (work item completion) and then loop and
retest; unfortunately, depending on the wake up order, the later one
could lock out the former one to finish by busy looping on the cpu.
This is fixed by implementing explicit wait mechanism. Work item
might not belong anywhere at this point and there's remote possibility
of thundering herd problem. I originally tried to use bit_waitqueue
but it didn't work for static work items on modules. It's currently
using single wait queue with filtering wake up function and exclusive
wakeup. If this ever becomes a problem, which is not very likely, we
can try to figure out a way to piggy back on bit_waitqueue"
* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
Jon Paul Maloy [Mon, 9 Mar 2015 20:16:22 +0000 (16:16 -0400)]
tipc: fix bug in link failover handling
In commit
c637c1035534867b85b78b453c38c495b58e2c5a
("tipc: resolve race problem at unicast message reception") we
introduced a new mechanism for delivering buffers upwards from link
to socket layer.
That code contains a bug in how we handle the new link input queue
during failover. When a link is reset, some of its users may be blocked
because of congestion, and in order to resolve this, we add any pending
wakeup pseudo messages to the link's input queue, and deliver them to
the socket. This misses the case where the other, remaining link also
may have congested users. Currently, the owner node's reference to the
remaining link's input queue is unconditionally overwritten by the
reset link's input queue. This has the effect that wakeup events from
the remaining link may be unduely delayed (but not lost) for a
potentially long period.
We fix this by adding the pending events from the reset link to the
input queue that is currently referenced by the node, whichever one
it is.
This commit should be applied to both net and net-next.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Ruggeri [Mon, 9 Mar 2015 18:51:04 +0000 (11:51 -0700)]
net: delete stale packet_mclist entries
When an interface is deleted from a net namespace the ifindex in the
corresponding entries in PF_PACKET sockets' mclists becomes stale.
This can create inconsistencies if later an interface with the same ifindex
is moved from a different namespace (not that unlikely since ifindexes are
per-namespace).
In particular we saw problems with dev->promiscuity, resulting
in "promiscuity touches roof, set promiscuity failed. promiscuity
feature of device might be broken" warnings and EOVERFLOW failures of
setsockopt(PACKET_ADD_MEMBERSHIP).
This patch deletes the mclist entries for interfaces that are deleted.
Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with
EADDRNOTAVAIL if called after the interface is deleted, also make
packet_mc_drop not fail.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Mon, 9 Mar 2015 18:44:13 +0000 (14:44 -0400)]
tipc: Add Ying Xue to TIPC maintainers list
We remove Allan Stephens, who has moved on to other tasks, from
the TIPC maintainers list. He is replaced by Ying Xue, who has
been doing the maintenance on behalf of WindRiver since almost
three years.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Mon, 9 Mar 2015 18:14:37 +0000 (13:14 -0500)]
net: Remove protocol from struct dst_ops
After my change to neigh_hh_init to obtain the protocol from the
neigh_table there are no more users of protocol in struct dst_ops.
Remove the protocol field from dst_ops and all of it's initializers.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Mar 2015 20:04:53 +0000 (16:04 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-03-09
This series contains updates to i40e and i40evf.
Greg cleans up some "hello world" strings which were left around from
early development.
Shannon modifies the drive to make sure the sizeof() calls are taking
the size of the actual struct that we care about. Also updates the
NVMUpdate read/write so that it is less noisy when logging. This was
because the NVMUpdate tool does not necessarily know the ReadOnly map of
the current NVM image, and must try reading and writing words that may be
protected. This generates an error out of the Firmware request that the
driver logs. Unfortunately, this ended up spitting out hundreds of
bogus read/write error messages. If a user wants the noisy logging,
the change can be overridden by enabling the NVM update debugging.
Mitch fixes a possible deadlock issue where if a reset occurred when the
netdev is closed, the reset task will hang in napi_disable. Added
ethtool RSS support as suggested by Ben Hutchings.
Jesse fixes a bug introduced in the force writeback code, where the
interrupt rate was set to 0 (maximum) by accident. The driver must
correctly set the NOITR fields to avoid IT update as a side effect
of triggering the software interrupt.
I provided a simple cleanup to make the use of PF/VF consistent, which
was reported by Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Mar 2015 19:58:21 +0000 (15:58 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. Basically, improvements for the packet rejection infrastructure,
deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for
br_netfilter. More specifically they are:
1) Send packet to reset flow if checksum is valid, from Florian Westphal.
2) Fix nf_tables reject bridge from the input chain, also from Florian.
3) Deprecate the CLUSTERIP target, the cluster match supersedes it in
functionality and it's known to have problems.
4) A couple of cleanups for nf_tables rule tracing infrastructure, from
Patrick McHardy.
5) Another cleanup to place transaction declarations at the bottom of
nf_tables.h, also from Patrick.
6) Consolidate Kconfig dependencies wrt. NF_TABLES.
7) Limit table names to 32 bytes in nf_tables.
8) mac header copying in bridge netfilter is already required when
calling ip_fragment(), from Florian Westphal.
9) move nf_bridge_update_protocol() to br_netfilter.c, also from
Florian.
10) Small refactor in br_netfilter in the transmission path, again from
Florian.
11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Cartwright [Mon, 9 Mar 2015 16:14:39 +0000 (11:14 -0500)]
net: macb: constify macb configuration data
The configurations are not modified by the driver. Make them 'const' so
that they may be placed in a read-only section.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Mon, 9 Mar 2015 11:54:48 +0000 (12:54 +0100)]
mpls: Spelling: s/conceved/conceived/, s/as/a/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Mon, 9 Mar 2015 09:43:42 +0000 (10:43 +0100)]
tipc: fix inconsistent signal handling regression
Commit
9bbb4ecc6819 ("tipc: standardize recvmsg routine") changed
the sleep/wakeup behaviour for sockets entering recv() or accept().
In this process the order of reporting -EAGAIN/-EINTR was reversed.
This caused problems with wrong errno being reported back if the
timeout expires. The same problem happens if the socket is
nonblocking and recv()/accept() is called when the process have
pending signals. If there is no pending data read or connections to
accept, -EINTR will be returned instead of -EAGAIN.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reported-by László Benedek <laszlo.benedek@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Mar 2015 19:41:00 +0000 (15:41 -0400)]
Merge tag 'linux-can-fixes-for-4.0-
20150309' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2015-03-09
this is a pull request for net/master for the 4.0 release cycle, it consists of
6 patches:
A patch by Oliver Hartkopp fixes a long outstanding bug in the infrastructure,
which leads to skb_under_panics when CAN interfaces are used by AF_PACKET
sockets e.g. by dhclient. Stephane Grosjean contributes a patch for the
peak_usb driver which adds a missing initialization. Two patches by Ahmed S.
Darwish fix problems in the kvaser_usb driver. Followed by two patches by
myself, updating the MAINTAINERS file
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 9 Mar 2015 19:38:35 +0000 (15:38 -0400)]
Merge tag 'iwlwifi-next-for-kalle-2015-03-07' of https://git./linux/kernel/git/iwlwifi/iwlwifi-next
fix compilation when DEBUGFS isn't set
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 9 Mar 2015 09:26:24 +0000 (10:26 +0100)]
switchdev: use gpl variant of symbol export
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Mon, 9 Mar 2015 09:19:31 +0000 (10:19 +0100)]
tipc: sparse: fix htons conversion warnings
Commit
d0f91938bede ("tipc: add ip/udp media type") introduced
some new sparse warnings. Clean them up.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 6 Mar 2015 19:47:59 +0000 (11:47 -0800)]
net_sched: destroy proto tp when all filters are gone
Kernel automatically creates a tp for each
(kind, protocol, priority) tuple, which has handle 0,
when we add a new filter, but it still is left there
after we remove our own, unless we don't specify the
handle (literally means all the filters under
the tuple). For example this one is left:
# tc filter show dev eth0
filter parent 8001: protocol arp pref 49152 basic
The user-space is hard to clean up these for kernel
because filters like u32 are organized in a complex way.
So kernel is responsible to remove it after all filters
are gone. Each type of filter has its own way to
store the filters, so each type has to provide its
way to check if all filters are gone.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mitch A Williams [Thu, 5 Mar 2015 04:14:40 +0000 (04:14 +0000)]
i40e: add ethtool RSS support
Add support for setting the RSS hash table and hash key through ethtool.
This patch incorporates suggestions from Ben Hutchings
<ben.hutchings@codethink.co.uk>.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Fri, 27 Feb 2015 09:18:37 +0000 (09:18 +0000)]
i40e/i40evf: Bump i40e/i40evf version
Bump PF version to 1.2.37 and VF version to 1.2.25
Change-ID: I0287a750408250dc055c03e1f744fd5f0caefd68
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Fri, 27 Feb 2015 09:18:36 +0000 (09:18 +0000)]
i40e: add MAC printing to debugfs dump VSI
Print the LAN, SAN, and Port MACs for the VSI if debugfs command
dump VSI is used on the PF's VSI.
Example output:
[260221.871244] i40e 0000:04:00.0: MAC address: 68:05:ca:26:15:e0 SAN MAC: 00:00:00:00:02:00 Port MAC: 68:05:ca:26:15:e3
Change-ID: I0b393113dfb5ee7ff4f9e5227e4177885f0cc15e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pablo Neira Ayuso [Mon, 9 Mar 2015 11:30:12 +0000 (12:30 +0100)]
netfilter: bridge: move DNAT helper to br_netfilter
Only one caller, there is no need to keep this in a header.
Move it to br_netfilter.c where this belongs to.
Based on patch from Florian Westphal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Steven Rostedt (Red Hat) [Sat, 7 Mar 2015 00:55:13 +0000 (19:55 -0500)]
ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled
Some archs (specifically PowerPC), are sensitive with the ordering of
the enabling of the calls to function tracing and setting of the
function to use to be traced.
That is, update_ftrace_function() sets what function the ftrace_caller
trampoline should call. Some archs require this to be set before
calling ftrace_run_update_code().
Another bug was discovered, that ftrace_startup_sysctl() called
ftrace_run_update_code() directly. If the function the ftrace_caller
trampoline changes, then it will not be updated. Instead a call
to ftrace_startup_enable() should be called because it tests to see
if the callback changed since the code was disabled, and will
tell the arch to update appropriately. Most archs do not need this
notification, but PowerPC does.
The problem could be seen by the following commands:
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo function > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# cat /sys/kernel/debug/tracing/trace
The trace will show that function tracing was not active.
Cc: stable@vger.kernel.org # 2.6.27+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pratyush Anand [Fri, 6 Mar 2015 18:28:06 +0000 (23:58 +0530)]
ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl
When ftrace is enabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
ftrace is disabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().
Consider the following situation.
# echo 0 > /proc/sys/kernel/ftrace_enabled
After this ftrace_enabled = 0.
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
called.
# echo 1 > /proc/sys/kernel/ftrace_enabled
Now ftrace_enabled will be set to true, but still
ftrace_enable_ftrace_graph_caller() will not be called, which is not
desired.
Further if we execute the following after this:
# echo nop > /sys/kernel/debug/tracing/current_tracer
Now since ftrace_enabled is set it will call
ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
the ARM platform.
On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
it checks whether the old instruction is a nop or not. If it's not a nop,
then it returns an error. If it is a nop then it replaces instruction at
that address with a branch to ftrace_graph_caller.
ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
then it will return an error, which will cause the generic ftrace code to
raise a warning.
Note, x86 does not have an issue with this because the architecture
specific code for ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() does not check the previous state,
and calling either of these functions twice in a row has no ill effect.
Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com
Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: Pratyush Anand <panand@redhat.com>
[
removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
if CONFIG_FUNCTION_GRAPH_TRACER is not set.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt (Red Hat) [Thu, 5 Mar 2015 04:10:28 +0000 (23:10 -0500)]
ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl
When /proc/sys/kernel/ftrace_enabled is set to zero, all function
tracing is disabled. But the records that represent the functions
still hold information about the ftrace_ops that are hooked to them.
ftrace_ops may request "REGS" (have a full set of pt_regs passed to
the callback), or "TRAMP" (the ops has its own trampoline to use).
When the record is updated to represent the state of the ops hooked
to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
points to the correct trampoline (REGS has its own trampoline).
When ftrace_enabled is set to zero, all ftrace locations are a nop,
so they do not point to any trampoline. But the _EN flags are still
set. This can cause the accounting to go wrong when ftrace_enabled
is cleared and an ops that has a trampoline is registered or unregistered.
For example, the following will cause ftrace to crash:
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo nop > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
As function_graph uses a trampoline, when ftrace_enabled is set to zero
the updates to the record are not done. When enabling function_graph
again, the record will still have the TRAMP_EN flag set, and it will
look for an op that has a trampoline other than the function_graph
ops, and fail to find one.
Cc: stable@vger.kernel.org # 3.17+
Reported-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
James Morris [Mon, 9 Mar 2015 13:38:16 +0000 (00:38 +1100)]
Merge branch 'for-current' of https://github.com/PeterHuewe/linux-tpmdd into for-linus
Florian Westphal [Wed, 4 Mar 2015 23:52:36 +0000 (00:52 +0100)]
netfilter: bridge: refactor conditional in br_nf_dev_queue_xmit
simpilifies followup patch that re-works brnf ip_fragment handling.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Wed, 4 Mar 2015 23:52:34 +0000 (00:52 +0100)]
netfilter: bridge: move nf_bridge_update_protocol to where its used
no need to keep it in a header file.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Wed, 4 Mar 2015 23:52:33 +0000 (00:52 +0100)]
bridge: move mac header copying into br_netfilter
The mac header only has to be copied back into the skb for
fragments generated by ip_fragment(), which only happens
for bridge forwarded packets with nf-call-iptables=1 && active nf_defrag.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeff Kirsher [Fri, 27 Feb 2015 09:18:34 +0000 (09:18 +0000)]
i40e: Fix inconsistent use of PF/VF vs pf/vf
Joe Perches pointed out that we were inconsistent in the use of
PF vs pf or VF vs vf in our driver code. Since acronyms are usually
capitalized to denote that it is an acronym, changed all references to
be consistent throughout the code.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Fri, 27 Feb 2015 09:18:33 +0000 (09:18 +0000)]
i40e: tame the nvmupdate read and write complaints
The NVMUpdate tool doesn't necessarily know the ReadOnly map of the current
NVM image, and must try reading and writing words that may be protected.
This generates an error out of the Firmware request that the driver logs.
Unfortunately, this ends up spitting out hundreds of bogus read and write
error message that looks rather messy.
This patch checks the error type and under normal conditions will not print
the typical read and write errors during NVMUpdate. This can be overridden
by enabling the NVM update debugging. This results in a much less messy log
file, and likely many fewer customer support questions.
Change-ID: Id4ff2e9048c523b0ff503aa5ab181b025ec948ea
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>