Ian Morris [Fri, 14 Aug 2015 21:43:38 +0000 (22:43 +0100)]
ipv6: trivial whitespace fix
Change brace placement to be in line with coding standards
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Fri, 14 Aug 2015 22:37:15 +0000 (00:37 +0200)]
rhashtable-test: extend to test concurrency
After having tested insertion, lookup, table walk and removal, spawn a
number of threads running operations on the same rhashtable. Each of
them will:
1) insert it's own set of objects,
2) lookup every successfully inserted object and finally
3) remove objects in several rounds until all of them have been removed,
making sure the remaining ones are still found after each round.
This should put a good amount of load onto the system and due to
synchronising thread startup via two semaphores also extensive
concurrent table access.
The default number of ten threads returned within half a second on my
local VM with two cores. Running 200 threads took about four seconds. If
slow systems suffer too much from this though, the default could be
lowered or even set to zero so this extended test does not run at all by
default.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Aug 2015 21:31:42 +0000 (14:31 -0700)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use
clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by puring packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Aug 2015 21:25:04 +0000 (14:25 -0700)]
Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Another pull request for the next cycle, this time with quite
a bit of content:
* mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
* TDLS higher bandwidth support (Arik)
* OCB fixes from Bertold Van den Bergh
* suspend/resume fixes from Eliad
* dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
* VHT bitrate mask support (Lorenzo Bianconi)
* better regulatory support for 5/10 MHz channels (Matthias May)
* basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
along with a number of other cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Aug 2015 21:22:48 +0000 (14:22 -0700)]
Merge branch 'bpf_fanout'
Willem de Bruijn says:
====================
packet: add cBPF and eBPF fanout modes
Allow programmable fanout modes. Support both classical BPF programs
passed directly and extended BPF programs passed by file descriptor.
One use case is packet steering by deep packet inspection, for
instance for packet steering by application layer header fields.
Separate the configuration of the fanout mode and the configuration
of the program, to allow dynamic updates to the latter at runtime.
Changes
v1 -> v2:
- follow SO_LOCK_FILTER semantics on filter updates
- only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER
- rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match
man 2 bpf usage: "classic" vs. "extended" BPF.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Sat, 15 Aug 2015 02:31:37 +0000 (22:31 -0400)]
selftests/net: test extended BPF fanout mode
Test PACKET_FANOUT_EBPF by inserting a program into the the kernel
with bpf(), then attaching it to the fanout group. Observe the same
payload-based distribution as in the PACKET_FANOUT_CBPF test.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Sat, 15 Aug 2015 02:31:36 +0000 (22:31 -0400)]
selftests/net: test classic bpf fanout mode
Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a
socket by payload. Requires modifying the test program to send
packets with multiple payloads.
Also fix a bug in testing the return value of mmap()
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Sat, 15 Aug 2015 02:31:35 +0000 (22:31 -0400)]
packet: add extended BPF fanout mode
Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF
program to select a socket.
Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Sat, 15 Aug 2015 02:31:34 +0000 (22:31 -0400)]
packet: add classic BPF fanout mode
Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.
This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.
Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Fri, 14 Aug 2015 14:40:40 +0000 (16:40 +0200)]
lwtunnel: rename ip lwtunnel attributes
We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
very similar, yet they serve very different purpose. This is confusing for
anyone trying to implement a user space tool supporting lwt.
As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
logical to have them in lwtunnel.h together with the encap enum.
Fixes: 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Mon, 17 Aug 2015 20:45:36 +0000 (13:45 -0700)]
smsc911x: Fix crash seen if neither ACPI nor OF is configured or used
Commit
0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
the call to smsc911x_probe_config() unconditional, and no longer fails if
there is no device node. device_get_phy_mode() is called unconditionally,
and if there is no phy node configured returns an error code. This error
code is assigned to phy_interface, and interpreted elsewhere in the code
as valid phy mode. This in turn causes qemu to crash when running a
variant of realview_pb_defconfig.
qemu: hardware error: lan9118_read: Bad reg 0x86
Fixes: 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT")
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Aug 2015 21:05:14 +0000 (14:05 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-08-17
1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
From Thomas Egerer.
2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
From Andrzej Hajda.
3) Pass oif to the xfrm lookups so that it gets set on the flow
and the resolver routines can match based on oif.
From David Ahern.
4) Add documentation for the new xfrm garbage collector threshold.
From Alexander Duyck.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Fri, 14 Aug 2015 01:34:03 +0000 (18:34 -0700)]
net: fix endian check warning in etherdevice.h
Sparse builds have been warning for a really long time now
that etherdevice.h has a conversion that is unsafe.
include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer
This code change fixes the issue and generates the exact
same assembly before/after (checked on x86_64)
Fixes: 2c722fe1c821 (etherdevice: Optimize a few is_<foo>_ether_addr functions)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Aug 2015 18:50:25 +0000 (11:50 -0700)]
Merge branch 'iff_no_queue'
Phil Sutter says:
====================
net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
This series adds a new private net_device flag indicating that a device may
(and probably should) be used without a queueing discipline attached to it.
This is already common practice for many virtual device types like e.g.
loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these
devices lack an underlying layer which could impose back pressure and therefore
making a TX queue necessary to not slow down senders.
Up to now, drivers being aware of the above applying to them set
dev->tx_queue_len to zero to indicate no qdisc should be attached to the
interface they drive and the kernel reacts upon this by assigning the noop
qdisc instead of the default pfifo_fast. This implicit agreement though leads
to an inconvenient situation once a user tries to attach a real qdisc to these
devices, as the formerly special tx_queue_len value becomes a regular one,
limiting the queue to zero packets and thus prevents any TX from happening. To
overcome this, practically all qdisc implementations intercept and sanitize the
malicious value.
With this series applied, drivers may signal the lack of need for a qdisc
without having to tamper with tx_queue_len, making fallbacks in qdiscs and
caveats in userspace unnecessary.
Upon upstream acceptance, this series will be followed up by a set of patches
converting device drivers, adding a warning so out-of-tree driver authors get
aware of this change and dropping all special handling of tx_queue_len in
net/sched/.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 13 Aug 2015 17:01:07 +0000 (19:01 +0200)]
net: sch_generic: react upon IFF_NO_QUEUE flag
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 13 Aug 2015 17:01:06 +0000 (19:01 +0200)]
net: declare new net_device priv_flag IFF_NO_QUEUE
This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Mon, 17 Aug 2015 12:15:10 +0000 (14:15 +0200)]
tipc: don't sanity check non-existing TLV (NL compat)
A zero length payload means that no TLV (Type Length Value) data has
been passed. Prior to this patch a non-existing TLV could be sanity
checked with TLV_OK() resulting in random behavior where a user
sending an empty message occasionally got a incorrect "operation not
supported" message back.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Mon, 17 Aug 2015 05:28:25 +0000 (08:28 +0300)]
bnx2: Fix bandwidth allocation for some MF modes
Management firmware tells driver in case bandwidth configuration for
a specific function exists, but [regretably] the same field has different
meanings depending on the multi-function mode - it can either be
a percentile value or an actual speed.
For newer multi-function modes current logic is incorrect -
driver understands values as actual speeds instead of percentages,
causing the resulting chip configuration to be incorrect.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 15 Aug 2015 17:54:07 +0000 (10:54 -0700)]
ipv4: fix refcount leak in fib_check_nh()
fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.
This patch solves the typical message at reboot time or device
dismantle :
unregister_netdevice: waiting for eth0 to become free. Usage count = 4
Fixes: 3bfd847203c6 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Tue, 30 Jun 2015 21:45:26 +0000 (23:45 +0200)]
batman-adv: Fix potentially broken skb network header access
The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
need a correctly set skb network header.
Unfortunately we cannot rely on the device drivers to set it for us.
Therefore setting it in the beginning of the according ndo_start_xmit
handler.
Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3da7 ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Simon Wunderlich [Wed, 24 Jun 2015 12:50:20 +0000 (14:50 +0200)]
batman-adv: remove broadcast packets scheduled for purged outgoing if
When an interface is purged, the broadcast packets scheduled for this
interface should get purged as well.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Marek Lindner [Sun, 21 Jun 2015 16:36:28 +0000 (00:36 +0800)]
batman-adv: protect tt request from double deletion
The list_del() calls were changed to list_del_init() to prevent
an accidental double deletion in batadv_tt_req_node_new().
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Tue, 16 Jun 2015 15:10:26 +0000 (17:10 +0200)]
batman-adv: Fix potential synchronization issues in mcast tvlv handler
So far the mcast tvlv handler did not anticipate the processing of
multiple incoming OGMs from the same originator at the same time. This
can lead to various issues:
* Broken refcounting: For instance two mcast handlers might both assume
that an originator just got multicast capabilities and will together
wrongly decrease mcast.num_disabled by two, potentially leading to
an integer underflow.
* Potential kernel panic on hlist_del_rcu(): Two mcast handlers might
one after another try to do an
hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will
cause memory corruption / crashes.
(Reported by: Sven Eckelmann <sven@narfation.org>)
Right in the beginning the code path makes assumptions about the current
multicast related state of an originator and bases all updates on that. The
easiest and least error prune way to fix the issues in this case is to
serialize multiple mcast handler invocations with a spinlock.
Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Tue, 16 Jun 2015 15:10:25 +0000 (17:10 +0200)]
batman-adv: Make MCAST capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Tue, 16 Jun 2015 15:10:24 +0000 (17:10 +0200)]
batman-adv: Make TT capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: e17931d1a61d ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Tue, 16 Jun 2015 15:10:23 +0000 (17:10 +0200)]
batman-adv: Make NC capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 3f4841ffb336 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Tue, 16 Jun 2015 15:10:22 +0000 (17:10 +0200)]
batman-adv: Make DAT capability changes atomic
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 17cf0ea455f1 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Emmanuel Grumbach [Sun, 19 Jul 2015 13:09:12 +0000 (16:09 +0300)]
mac80211: fix BIT position for TDLS WIDE extended cap
The bit was not according to ieee80211 specification.
Fix that.
Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 13 Jul 2015 10:26:46 +0000 (12:26 +0200)]
mac80211: use DECLARE_EWMA
Instead of using the out-of-line average calculation, use the new
DECLARE_EWMA() macro to declare a signal EWMA, and use that.
This actually *reduces* the code size slightly (on x86-64) while
also reducing the station info size by 80 bytes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 13 Jul 2015 10:17:25 +0000 (12:17 +0200)]
average: provide macro to create static EWMA
Having the EWMA parameters stored in the runtime struct imposes
memory requirements for the constant values that could just be
inlined in the code. This particularly makes sense if there are
a lot of such structs, for example in mac80211 in the station
table where each station has a number of these in an array, and
there can be many stations.
Provide a macro DECLARE_EWMA() that declares the necessary struct
and inline functions to access it with the parameters hard-coded;
using this also means the user no longer needs to 'select AVERAGE'
as it's entirely self-contained.
In the mac80211 case, on x86-64, this actually slightly *reduces*
code size, while also saving 80 bytes of runtime memory per sta.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Su Kang Yin [Fri, 7 Aug 2015 08:54:10 +0000 (16:54 +0800)]
mac80211_hwsim: unregister genetlink family properly
During hwsim_init_netlink(), we should call genl_unregister_family()
if failed on netlink_register_notifier() since the genetlink is
already registered.
Signed-off-by: Su Kang Yin <cantona@cantona.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lorenzo Bianconi [Thu, 6 Aug 2015 21:47:33 +0000 (23:47 +0200)]
mac80211: add rate mask logic for vht rates
Define rc_rateidx_vht_mcs_mask array and rate_idx_match_vht_mcs_mask()
method in order to apply mcs mask for vht rates
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lorenzo Bianconi [Thu, 6 Aug 2015 21:47:32 +0000 (23:47 +0200)]
mac80211: define rate_control_apply_mask_ratetbl()
Define rate_control_apply_mask_ratetbl() in order to apply ratemask in
rate_control_set_rates() for station rate table
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lorenzo Bianconi [Thu, 6 Aug 2015 21:47:31 +0000 (23:47 +0200)]
mac80211: remove ieee80211_tx_rate dependency in rate mask code
Remove ieee80211_tx_rate dependency in rate_idx_match_legacy_mask(),
rate_idx_match_mcs_mask() and rate_idx_match_mask() in order to use the
previous logic to define a ratemask in rate_control_set_rates() for
station rate table. Moreover move rate mask definition logic in
rate_control_cap_mask()
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lorenzo Bianconi [Thu, 6 Aug 2015 21:47:30 +0000 (23:47 +0200)]
mac80211: remove ieee80211_tx_info from rate_control_apply_mask signature
Remove unnecessary ieee80211_tx_info pointer from rate_control_apply_mask
signature. rate_control_apply_mask() will be used to define a ratemask in
rate_control_set_rates() for station rate table
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bertold Van den Bergh [Wed, 5 Aug 2015 14:02:50 +0000 (16:02 +0200)]
mac80211: Make OCB mode set BSSID
Perform the BSS_CHANGED_BSSID action when joining an OCB network.
This is required to set the broadcast BSSID in some network drivers.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bertold Van den Bergh [Wed, 5 Aug 2015 14:02:42 +0000 (16:02 +0200)]
mac80211: Only accept data frames in OCB mode
Currently OCB mode accepts frames with bssid==broadcast and type!=beacon.
Some non-data frames are sent matching this, for example probe responses.
This results in unnecessary creation of STA entries.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bertold Van den Bergh [Wed, 5 Aug 2015 14:02:28 +0000 (16:02 +0200)]
mac80211: Set txrc.bss to true for OCB interfaces
To make mac80211 accept the multicast rate requested by the user the
rate control should be told that it is operating in BSS mode.
Without this, the default rate is selected in rate_control_send_low
(!pubsta and !txrc->bss)
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bertold Van den Bergh [Wed, 5 Aug 2015 14:02:21 +0000 (16:02 +0200)]
nl80211: Allow setting multicast rate on OCB interfaces
Allow setting multicast rate on OCB interfaces.
Current behaviour results in EOPNOTSUPP when attempting this.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Michal Kazior [Mon, 3 Aug 2015 08:55:24 +0000 (10:55 +0200)]
cfg80211: propagate set_wiphy failure to userspace
If driver failed to setup wiphy params (e.g. rts
threshold, fragmentation treshold) userspace
wasn't properly notified about this. This could
lead to user confusion who would think the command
succeeded even if that wasn't the case.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Matthias May [Fri, 17 Jul 2015 13:28:39 +0000 (15:28 +0200)]
cfg80211: regulatory: handle 5 and 10 MHz channels properly
The original assumption of 20MHz wide channels hasn't been true since
the addition of support for 5 and 10 MHz channels.
Change the code to no longer disable all channels that don't fit into
the 20MHz grid, but instead set the appropriate flags to disable
operation on specific bandwidths.
Signed-off-by: Matthias May <matthias.may@neratec.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
David S. Miller [Fri, 14 Aug 2015 05:43:22 +0000 (22:43 -0700)]
Merge branch 'vrf-lite'
David Ahern says:
====================
VRF-lite - v6
In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and in the very least
need different default gateways.
This patch allows the ability to create virtual router domains (aka VRFs
(VRF-lite to be specific) in the linux packet forwarding stack. The main
observation is that through the use of rules and socket binding to interfaces,
all the facilities that we need are already present in the infrastructure. What
is missing is a handle that identifies a routing domain and can be used to
gather applicable rules/tables and uniqify neighbor selection. The scheme used
needs to preserves the notions of ECMP, and general routing principles.
This driver is a cross between functionality that the IPVLAN driver
and the Team drivers provide where a device is created and packets
into/out of the routing domain are shuttled through this device. The
device is then used as a handle to identify the applicable rules. The
VRF device is thus the layer3 equivalent of a vlan device.
The very important point to note is that this is only a Layer3 concept
so L2 tools (e.g., LLDP) do not need to be run in each VRF, processes can
run in unaware mode or select a VRF to be talking through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)
High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
* uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
* Applications can use SO_BINDTODEVICE or cmsg device indentifiers
to pick VRF (ping, traceroute just work)
* Standard IP Rules work, and since they are aggregated against the
device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
the routing plane (and ARP)
N2
N1 (all configs here) +---------------+
+--------------+ | |
|swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
| | | |
|swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
| | +---------------+
| VRF 1 |
| table 5 |
| |
+---------------+
| |
| VRF 2 | N3
| table 6 | +---------------+
| | | |
|swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
| | | |
|swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
+--------------+ +---------------+
Given the topology above, the setup needed to get the basic VRF
functions working would be
Create the VRF devices and associate with a table
ip link add vrf1 type vrf table 5
ip link add vrf2 type vrf table 6
Install the lookup rules that map table to VRF domain
ip rule add pref 200 oif vrf1 lookup 5
ip rule add pref 200 iif vrf1 lookup 5
ip rule add pref 200 oif vrf2 lookup 6
ip rule add pref 200 iif vrf2 lookup 6
ip link set vrf1 up
ip link set vrf2 up
Enslave the routing member interfaces
ip link set swp1 master vrf1
ip link set swp2 master vrf1
ip link set swp3 master vrf2
ip link set swp4 master vrf2
Connected and local routes are automatically moved from main and local
tables to the VRF table.
ping using VRF0 is simply
ping -I vrf0 10.0.1.2
Design Highlights
=================
If a device is enslaved to a VRF device (ie., associated with a VRF)
then:
1. Rx path
The master device index is used as the iif for all lookups.
2. Tx path
Similarly, for Tx the VRF device oif is used in the flow to direct
lookups to the table associated with the VRF via its rule. From there
the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
not be used for FIB table lookups.
3. Connected and local routes
On link up for a device, connected and local routes are added to the
table associated with the VRF device, rather than the local and main
tables.
4. Socket lookups
Sockets operating in the VRF must be bound to the VRF device. As such
socket lookups compare the VRF device index to sk_bound_dev_if.
5. Neighbor entries
Neighbor entries are not impacted by the VRF device. Entries are
associated with a particular interface; the VRF association is indirect
via the interface-to-VRF device enslavement.
Version 6
- addressed comments from DaveM
- added patch to properly set oif in ip_send_unicast_reply. Needs to be
set to VRF device for proper FIB lookup
- added patch to handle IP fragments
Version 5
- dropped patch regarding socket lookups; no longer needed
+ removed vrf helpers no longer needed after this patch is dropped
- removed dev_open and close operations
+ no need to reset vrf data on an ifdown and creates problems if a
slave is deleted while the vrf interface is down (Thanks, Nikolay)
- cleanups for sparse warnings
+ make C=2 is now clean for vrf driver
Version 4
- builds are clean with and without VRF device enabled (no, yes and module)
- tightened the driver implementation
+ device add/delete, slave add/remove, and module unload are all clean
- fixed RCU references
+ with RCU and lock debugging enabled changes are clean through the
suite of tests
- TX path uses custom dst, so patch refactoring rtable allocation is
dropped along with the patch adding rt_nexthop helper
- dropped the task patch that adds default bind to interface for sockets
and the associated chvrf example command
+ the patches are a convenience for running unmodified code. They
are not needed for the core functionality. Any application with
support for SO_BINDTODEVICE works properly with this patch set.
Version 3
- addressed comments from first 2 RFCs with the exception of the name
Nicolas: We will do the name conversion once we agree on what the
correct name should be (vrf, mrf or something else)
- packets flow through the VRF device in both directions allowing the
following:
- tcpdump -i vrf<n>
- tc rules on vrf device
- netfilter rules on vrf device
TO-DO
=====
1. IPv6
2. ipsec, xfrms
- dst patch accepted into ipsec-next; will post VRF patch once merge happens
3. listen filter to allow 1 socket to work with multiple VRF devices
- i.e., bind to VRF's a, b, c only or NOT VRFs e, f, g
Eric B:
I have ipsec working with VRFs implemented using the VRF driver,
including the worst case scenario of complete duplication in the
networking config.
Thanks to Nikolay for his many, many code reviews whipping the device
driver into shape, and bug-Fixes and ideas from Hannes, Roopa Prabhu,
Jon Toppins, Jamal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:10 +0000 (14:59 -0600)]
net: Introduce VRF device driver
This driver borrows heavily from IPvlan and teaming drivers.
Routing domains (VRF-lite) are created by instantiating a VRF master
device with an associated table and enslaving all routed interfaces that
participate in the domain. As part of the enslavement, all connected
routes for the enslaved devices are moved to the table associated with
the VRF device. Outgoing sockets must bind to the VRF device to function.
Standard FIB rules bind the VRF device to tables and regular fib rule
processing is followed. Routed traffic through the box, is forwarded by
using the VRF device as the IIF and following the IIF rule to a table
that is mated with the VRF.
Example:
Create vrf 1:
ip link add vrf1 type vrf table 5
ip rule add iif vrf1 table 5
ip rule add oif vrf1 table 5
ip route add table 5 prohibit default
ip link set vrf1 up
Add interface to vrf 1:
ip link set eth1 master vrf1
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:09 +0000 (14:59 -0600)]
net: frags: Add VRF device index to cache and lookup
Fragmentation cache uses information from the IP header to reassemble
packets. That information can be duplicated across VRFs -- same source
and destination addresses, protocol and id. Handle fragmentation with
VRFs by adding the VRF device index to entries in the cache and the
lookup arg.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:08 +0000 (14:59 -0600)]
net: Use VRF index for oif in ip_send_unicast_reply
If output device is not specified use VRF device if input device is
enslaved. This is needed to ensure tcp acks and resets go out VRF device.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:07 +0000 (14:59 -0600)]
net: Use passed in table for nexthop lookups
If a user passes in a table for new routes use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table, but only another table and then a subsequent
route is added with a next hop using the connected route. ie.,
$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
169.254.0.0/16 dev eth0 scope link metric 1003
192.168.56.0/24 dev eth1 proto kernel scope link src 192.168.56.51
$ ip route ls table 10
1.1.1.0/24 dev eth2 scope link
Without this patch adding a nexthop route fails:
$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable
With this patch the route is added successfully.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:06 +0000 (14:59 -0600)]
net: Add routes to the table associated with the device
When a device associated with a VRF is brought up or down routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table with the device if there is one.
A part of this is directing prefsrc validations to the correct
table as well.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:05 +0000 (14:59 -0600)]
net: Fix up inet_addr_type checks
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:04 +0000 (14:59 -0600)]
net: Add inet_addr lookup by table
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:03 +0000 (14:59 -0600)]
udp: Handle VRF device in sendmsg
For unconnected UDP sockets using a VRF device lookup source address
based on VRF table. This allows the UDP header to be properly setup
before showing up at the VRF device via the dst.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:02 +0000 (14:59 -0600)]
net: Use VRF device index for lookups on TX
As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:01 +0000 (14:59 -0600)]
net: Use VRF device index for lookups on RX
On ingress use index of VRF master device for route lookups if real device
is enslaved. Rules are expected to be installed for the VRF device to
direct lookups to a specific table.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Aug 2015 20:59:00 +0000 (14:59 -0600)]
net: Introduce VRF related flags and helpers
Add a VRF_MASTER flag for interfaces and helper functions for determining
if a device is a VRF_MASTER.
Add link attribute for passing VRF_TABLE id.
Add vrf_ptr to netdevice.
Add various macros for determining if a device is a VRF device, the index
of the master VRF device and table associated with VRF device.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Thu, 13 Aug 2015 19:26:35 +0000 (15:26 -0400)]
net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo
This is useful information to include in ipv6 netlink messages that
report interface information. IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6. This closes that gap.
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha Levin [Thu, 13 Aug 2015 18:03:16 +0000 (14:03 -0400)]
net: allow sleeping when modifying store_rps_map
Commit
10e4ea751 ("net: Fix race condition in store_rps_map") has moved the
manipulation of the rps_needed jump label under a spinlock. Since changing
the state of a jump label may sleep this is incorrect and causes warnings
during runtime.
Make rps_map_lock a mutex to allow sleeping under it.
Fixes: 10e4ea751 ("net: Fix race condition in store_rps_map")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Aug 2015 04:31:14 +0000 (21:31 -0700)]
Merge branch 'mv88e6xxx-hw-vlan'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: add hardware VLAN support
This patchset brings support to access hardware VLAN entries in DSA and
mv88e6xxx, through switchdev VLAN objects.
In the following example, ports swp[0-2] belong to bridge br0, and ports
swp[3-4] belong to bridge br1. Here's an example of what can be achieved
after this patchset:
# bridge vlan add dev swp1 vid 100 master
# bridge vlan add dev swp2 vid 100 master
# bridge vlan add dev swp3 vid 100 master
# bridge vlan add dev swp4 vid 100 master
# bridge vlan del dev swp1 vid 100 master
The above commands correctly programmed hardware VLAN 100 for port swp2,
while ports swp3 and swp4 use software VLAN 100, as shown with:
# bridge vlan
port vlan ids
swp0 None
swp0
swp1 None
swp1
swp2 100
swp2 100
swp3 100
swp3
swp4 100
swp4
br0 None
br1 None
Assuming that port 5 is the CPU port, the hardware VLAN table would
contain the following data:
VID FID SID 0 1 2 3 4 5 6
100 8 0 x x t x x t x
Where 'x' means excluded, and 't' means tagged.
Also, adding an FDB entry to VLAN 100 for port swp2 like this:
# bridge fdb add 3c:97:0e:11:6e:30 dev swp2 vlan 100
Would result in the following example output:
# bridge fdb
# 01:00:5e:00:00:01 dev eth0 self permanent
# 01:00:5e:00:00:01 dev eth1 self permanent
# 00:50:d2:10:78:15 dev swp0 master br0 permanent
# 00:50:d2:10:78:15 dev swp2 vlan 100 master br0 permanent
# 3c:97:0e:11:6e:30 dev swp2 vlan 100 self static
# 00:50:d2:10:78:15 dev swp3 master br1 permanent
# 00:50:d2:10:78:15 dev swp3 vlan 100 master br1 permanent
And the Address Translation Unit would contain:
DB T/P Vec State Addr
008 Port 004 e 3c:97:0e:11:6e:30
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:23 +0000 (12:52 -0400)]
net: dsa: mv88e6xxx: use port 802.1Q mode Secure
This commit changes the 802.1Q mode of each port from Disabled to
Secure. This enables the VLAN support, by checking the VTU entries on
ingress.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:22 +0000 (12:52 -0400)]
net: dsa: mv88e6xxx: add VLAN Load support
Implement port_pvid_set and port_vlan_add to add new entries in the VLAN
hardware table, and join ports to them.
The patch also implement the STU Get Next and Load Purge operations,
since it is required to have a valid STU entry for at least all VLANs.
Each VLAN has its own forwarding database, with FID num_ports+1 to 4095.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:21 +0000 (12:52 -0400)]
net: dsa: mv88e6xxx: add VLAN Purge support
Add support for the VTU Load Purge operation and implement the
port_vlan_del driver function to remove a port from a VLAN entry, and
delete the VLAN if the given port was its last member.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:20 +0000 (12:52 -0400)]
net: dsa: mv88e6xxx: add VLAN support to FDB dump
Add an helper function to read the next valid VLAN entry for a given
port. It is used in the VID to FID conversion function to retrieve the
forwarding database assigned to a given VLAN port.
Finally update the FDB getnext operation to iterate on the next valid
port VLAN when the end of the current database is reached.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:19 +0000 (12:52 -0400)]
net: dsa: mv88e6xxx: add VLAN Get Next support
Implement the port_pvid_get and vlan_getnext driver functions required
to dump VLAN entries from the hardware, with the VTU Get Next operation.
Some functions and structure will be shared with STU operations, since
their table format are similar (e.g. STU data entries are accessible
with the same registers as VTU entries, except with an offset of 2).
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:18 +0000 (12:52 -0400)]
net: dsa: mv88e6xxx: flush VTU and STU entries
Implement the VTU Flush operation (which also flushes the STU), so that
warm boots won't preserved old entries.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 13 Aug 2015 16:52:17 +0000 (12:52 -0400)]
net: dsa: add support for switchdev VLAN objects
Add new functions in DSA drivers to access hardware VLAN entries through
SWITCHDEV_OBJ_PORT_VLAN objects:
- port_pvid_get() and vlan_getnext() to dump a VLAN
- port_vlan_del() to exclude a port from a VLAN
- port_pvid_set() and port_vlan_add() to join a port to a VLAN
The DSA infrastructure will ensure that each VLAN of the given range
does not already belong to another bridge. If it does, it will fallback
to software VLAN and won't program the hardware.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Thu, 13 Aug 2015 14:39:01 +0000 (10:39 -0400)]
net: ipv6 sysctl option to ignore routes when nexthop link is down
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link. The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:
net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.
1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
8000::/8 dev p8p1 proto kernel metric 256 pref medium
9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
fe80::/64 dev p8p1 proto kernel metric 256 pref medium
This also adds devconf support and notification when sysctl values
change.
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Thu, 13 Aug 2015 14:39:00 +0000 (10:39 -0400)]
net: track link status of ipv6 nexthops
Add support to track current link status of ipv6 nexthops to match
recent changes that added support for ipv4 nexthops. This takes a
simple approach to track linkdown status for next-hops and simply
checks the dev for the dst entry and sets proper flags that to be used
in the netlink message.
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Thu, 13 Aug 2015 04:14:22 +0000 (09:44 +0530)]
cxgb4: Add MPS tracing support
Handle TRACE_PKT, stack can sniff them on the first port
Add debubfs enrty to configure tracing for offload traffic like iWARP
& iSCSI for debugging purpose.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
yalin wang [Thu, 13 Aug 2015 04:01:33 +0000 (12:01 +0800)]
net/fddi: remove HWM_REVERSE() macro
HWM_REVERSE() macro is unused, remove it.
Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Thu, 13 Aug 2015 01:45:25 +0000 (18:45 -0700)]
rocker: hook ndo_neigh_destroy to cleanup neigh refs in driver
Rocker driver tracks arp_tbl neighs to resolve IPv4 route nexthops. The
driver uses NETEVENT_NEIGH_UPDATE for neigh adds and updates, but there is
no event when the neigh is removed from the device (such as when the device
goes admin down). This patches hooks ndo_neigh_destroy so the driver can
know when a neigh is removed from the device. In response, the driver will
purge the neigh entry from its internal tbl.
I didn't find an in-tree users of ndo_neigh_destroy, so I'm not sure if
this ndo is vestigial or if there are out-of-tree users. In any case, it
does what I need here. An alternative design would be to generate
NETEVENT_NEIGH_UPDATE event when neigh is being destroyed, setting state to
NUD_NONE so driver knows neigh entry is dead.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Thu, 13 Aug 2015 01:44:13 +0000 (18:44 -0700)]
rocker: print switch ID consistent with phys_switch_id sysfs node
On sucessful probe, driver prints the switch ID. This patch changes the
format of the printed ID to match what's used in sysfs phys_switch_id node.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Aug 2015 23:58:29 +0000 (16:58 -0700)]
Merge branch 'smsc911x-acpi'
Jeremy Linton says:
====================
Enable smsc911x for use with ACPI
This set of patches enables the front Ethernet port on the
ARM Juno development platform when used with an ACPI enabled kernel.
These patches covert the of_property* calls in the driver to the
DT/ACPI agnostic device_property* calls, and add the arm hardware
id to the acpi_match_table.
To support the above changes I copied a couple routines from
of_net into the properties.c file, and modified them to
be ACPI/DT agnostic. I'm not 100% sure this is the correct location
for these functions. But I think they are required to avoid having
a dozen different implementations scattered across assorted Ethernet
adapters that are being enabled to use ACPI properties.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Linton [Wed, 12 Aug 2015 22:06:27 +0000 (17:06 -0500)]
Convert smsc911x to use ACPI as well as DT
Add ACPI bindings for the smsc911x driver. Convert the DT specific calls
to nonspecific device* calls, This allows the driver to work
with both ACPI and DT configurations. Ethernet should now work when using
ACPI on ARM Juno.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Linton [Wed, 12 Aug 2015 22:06:26 +0000 (17:06 -0500)]
Add a matching set of device_ functions for determining mac/phy
OF has some helper functions for parsing MAC and PHY settings.
In cases where the platform is providing this information rather
than the device itself, there needs to be similar functions for ACPI.
These functions are slightly modified versions of the ones in
of_net which can use information provided via DT or ACPI.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Aug 2015 23:52:20 +0000 (16:52 -0700)]
Merge branch 'tcp-loss-probe'
Yuchung Cheng says:
====================
minor tail loss probe improvements
This patch series enhance the tail loss probe (TLP) on some error
conditions. When TLP fails to send a probe, it will no longer
extend the RTO. When it fails to send a new packet because of
receiver window limit, it'll try to retransmit the last packet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Wed, 12 Aug 2015 18:18:19 +0000 (11:18 -0700)]
tcp: TLP retransmits last if failed to send new packet
When TLP fails to send new packet because of receive window
limit, it should fall back to retransmit the last packet instead.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Wed, 12 Aug 2015 18:18:18 +0000 (11:18 -0700)]
tcp: don't extend RTO on failed loss probe attempts
If TLP was unable to send a probe, it extended the RTO to
now + icsk_rto. But extending the RTO makes little sense
if no TLP probe went out. With this commit, instead of
extending the RTO we re-arm it relative to the transmit time
of the write queue head.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Aug 2015 23:51:00 +0000 (16:51 -0700)]
Merge branch 'cpsw-errata-workaround'
Mugunthan V N says:
====================
Add AM335x PG1.0 CPSW errata workaround
With commit
870915feabdc ("drivers: net: cpsw: remove
disable_irq/enable_irq as irq can be masked from cpsw itself"),
CPSW on AM335x beagle bone white is broken as there is a errata
for AM335x PG1.0. This patch series implements the workaround by
disabling the interrupts from ARM IRQ controller for AM335x SoC
in addition to the masking of interrupts in CPSW.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Wed, 12 Aug 2015 09:52:55 +0000 (15:22 +0530)]
ARM: dts: am33xx: update cpsw compatible
CPSW driver has been updated with compatibles for enabling errata
workarounds. So updating cpsw compatibles.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Wed, 12 Aug 2015 09:52:54 +0000 (15:22 +0530)]
ARM: dts: dra7: update cpsw compatible
CPSW driver has been updated with compatibles for enabling errata
workarounds. So updating cpsw compatibles.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Wed, 12 Aug 2015 09:52:53 +0000 (15:22 +0530)]
drivers: net: cpsw: add am335x errata workarround for interrutps
As per Am335x Errata [1] Advisory 1.0.9, The CPSW C0_TX_PEND and
C0_RX_PEND interrupt outputs provide a single transmit interrupt
that combines transmit channel interrupts TXPEND[7:0] and a
single receive interrupt that combines receive channel interrupts
RXPEND[7:0]. The TXPEND[0] and RXPEND[0] interrupt outputs are
connected to the ARM Cortex-A8 interrupt controller (INTC) rather
than the C0_TX_PEND and C0_RX_PEND interrupt outputs. So even
though CPSW interrupt is cleared by writing appropriate values to
EOI register the interrupt is not cleared in IRQ controller. So
interrupt is still pending and CPU is struck in ISR, the
workaround is to disable the interrupts in ARM irq controller.
[1] http://www.ti.com/lit/er/sprz360f/sprz360f.pdf
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Aug 2015 23:23:11 +0000 (16:23 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/ethernet/cavium/Kconfig
The cavium conflict was overlapping dependency
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 13 Aug 2015 20:52:46 +0000 (13:52 -0700)]
Merge tag 'dm-4.2-fixes-5' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- two stable fixes for corruption seen in a snapshot of thinp metadata;
metadata snapshots aren't widely used but help provide a consistent
view of the metadata associated with an active thin-pool.
- a dm-cache fix for the 4.2 "default" policy switch from "mq" to "smq"
* tag 'dm-4.2-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache policy smq: move 'dm-cache-default' module alias to SMQ
dm btree: add ref counting ops for the leaves of top level btrees
dm thin metadata: delete btrees when releasing metadata snapshot
Linus Torvalds [Thu, 13 Aug 2015 20:44:32 +0000 (13:44 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull xen block driver fixes from Jens Axboe:
"A few small bug fixes for xen-blk{front,back} that have been sitting
over my vacation"
* 'for-linus' of git://git.kernel.dk/linux-block:
xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()
xen-blkfront: don't add indirect pages to list when !feature_persistent
xen-blkfront: introduce blkfront_gather_backend_features()
Linus Torvalds [Thu, 13 Aug 2015 20:36:22 +0000 (13:36 -0700)]
Merge tag 'for-linus-4.2-rc6-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- revert a fix from 4.2-rc5 that was causing lots of WARNING spam.
- fix a memory leak affecting backends in HVM guests.
- fix PV domU hang with certain configurations.
* tag 'for-linus-4.2-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Revert "xen/events/fifo: Handle linked events when closing a port"
x86/xen: build "Xen PV" APIC driver for domU as well
Linus Torvalds [Thu, 13 Aug 2015 15:25:20 +0000 (08:25 -0700)]
Revert x86 sigcontext cleanups
This reverts commits
9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and
c6f2062935c8 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").
They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).
Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 Aug 2015 17:46:39 +0000 (10:46 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Workaround hw bug when acquiring PCI bos ownership of iwlwifi
devices, from Emmanuel Grumbach.
2) Falling back to vmalloc in conntrack should not emit a warning, from
Pablo Neira Ayuso.
3) Fix NULL deref when rtlwifi driver is used as an AP, from Luis
Felipe Dominguez Vega.
4) Rocker doesn't free netdev on device removal, from Ido Schimmel.
5) UDP multicast early sock demux has route handling races, from Eric
Dumazet.
6) Fix L4 checksum handling in openvswitch, from Glenn Griffin.
7) Fix use-after-free in skb_set_peeked, from Herbert Xu.
8) Don't advertize NETIF_F_FRAGLIST in virtio_net driver, this can lead
to fraglists longer than the driver can support. From Jason Wang.
9) Fix mlx5 on non-4k-pagesize systems, from Carol L Soto.
10) Fix interrupt storm in bna driver, from Ivan Vecera.
11) Don't propagate -EBUSY from netlink_insert(), from Daniel Borkmann.
12) Fix inet request sock leak, from Eric Dumazet.
13) Fix TX interrupt masking and marking in TX descriptors of fs_enet
driver, from LEROY Christophe.
14) Get rid of rule optimizer in gianfar driver, it's buggy and unlikely
to get fixed any time soon. From Jakub Kicinski
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
cosa: missing error code on failure in probe()
gianfar: remove faulty filer optimizer
gianfar: correct list membership accounting
gianfar: correct filer table writing
bonding: Gratuitous ARP gets dropped when first slave added
net: dsa: Do not override PHY interface if already configured
net: fs_enet: mask interrupts for TX partial frames.
net: fs_enet: explicitly remove I flag on TX partial frames
inet: fix possible request socket leak
inet: fix races with reqsk timers
mkiss: Fix error handling in mkiss_open()
bnx2x: Free NVRAM lock at end of each page
bnx2x: Prevent null pointer dereference on SKB release
cxgb4: missing curly braces in t4_setup_debugfs()
net-timestamp: Update skb_complete_tx_timestamp comment
ipv6: don't reject link-local nexthop on other interface
netlink: make sure -EBUSY won't escape from netlink_insert
bna: fix interrupts storm caused by erroneous packets
net: mvpp2: replace TX coalescing interrupts with hrtimer
net: mvpp2: enable proper per-CPU TX buffers unmapping
...
Linus Torvalds [Thu, 13 Aug 2015 17:22:11 +0000 (10:22 -0700)]
Merge tag 'edac_fix_for_4.2' of git://git./linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
"A ppc4xx_edac fix for accessing ->csrows properly. This driver was
missed during the conversion a couple of years ago"
* tag 'edac_fix_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, ppc4xx: Access mci->csrows array elements properly
Dan Carpenter [Mon, 27 Jul 2015 08:11:11 +0000 (11:11 +0300)]
mac80211: remove always true condition
The outside if statement checks that IEEE80211_TX_INTFL_MLME_CONN_TX is
set so this condition is always true. Checking twice upsets the static
checkers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 8 Jul 2015 12:41:48 +0000 (15:41 +0300)]
mac80211: remove ieee80211_aes_cmac_calculate_k1_k2()
The iwlwifi driver was the only driver that used this, but as
it turns out it never needed it, so we can remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 19 Jun 2015 08:20:10 +0000 (10:20 +0200)]
iwlwifi: mvm: don't set K1/K2 for AES-CMAC
According to firmware engineers, the firmware has never required
these fields and the values have always been calculated, they were
just leftovers from a previous implementation.
Therefore remove the unnecessary calculation.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Geert Uytterhoeven [Sun, 2 Aug 2015 09:09:54 +0000 (11:09 +0200)]
rfkill: Allow compile test of GPIO consumers if !GPIOLIB
The GPIO subsystem provides dummy GPIO consumer functions if GPIOLIB is
not enabled. Hence drivers that depend on GPIOLIB, but use GPIO consumer
functionality only, can still be compiled if GPIOLIB is not enabled.
Relax the dependency on GPIOLIB if COMPILE_TEST is enabled, where
appropriate.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Michael Walle [Tue, 21 Jul 2015 09:00:53 +0000 (11:00 +0200)]
EDAC, ppc4xx: Access mci->csrows array elements properly
The commit
de3910eb79ac ("edac: change the mem allocation scheme to
make Documentation/kobject.txt happy")
changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: <stable@vger.kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
Dan Carpenter [Wed, 12 Aug 2015 21:08:01 +0000 (00:08 +0300)]
cosa: missing error code on failure in probe()
If register_hdlc_device() fails, the current code returns 0 but we
should return an error code instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Wed, 12 Aug 2015 17:23:14 +0000 (10:23 -0700)]
documentation: bring vxlan documentation more up-to-date
A few things have changed since the previous version of the vxlan
documentation was written, so update it and correct some grammar and
such while we are at it.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Wed, 12 Aug 2015 15:10:23 +0000 (12:10 -0300)]
net: fec: Remove unneeded use of IS_ERR_VALUE() macro
There is no need to use the IS_ERR_VALUE() macro for checking
the return value from pm_runtime_* functions.
Just do a simple negative test instead.
The semantic patch that makes this change is available
in scripts/coccinelle/api/pm_runtime.cocci.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei-Chun Chao [Wed, 12 Aug 2015 14:57:12 +0000 (07:57 -0700)]
bpf: fix bpf_perf_event_read() loop upper bound
Verifier rejects programs incorrectly.
Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read()")
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Aug 2015 23:42:12 +0000 (16:42 -0700)]
Merge branch 'cxgb4-more-debug-info'
Hariprasad Shenai says:
====================
Add some more debug info
This patch series adds the following.
Add more info for sge_qinfo dump
Differentiate tid and stids between different regions, and add a debugfs
entry to dump all the tid info
This patch series has been created against net-next tree and includes
patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 12 Aug 2015 11:25:07 +0000 (16:55 +0530)]
cxgb4: Add debugfs support to dump tid info
Add debugfs support to dump tid info like stid, sftid, tids, atid and
hwtids
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 12 Aug 2015 11:25:06 +0000 (16:55 +0530)]
cxgb4: Differentiate between stids between server and filter region
For T4 adapter, offloaded servers tid for IPv4 connections are
allocated from filter region. So add a new field for server filter tid if
server tid is allocated from filter region.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 12 Aug 2015 11:25:05 +0000 (16:55 +0530)]
cxgb4: Differentiates between TIDs being used in TCAM and HASH
For the tid info, differentiate from which region the TID is allocated
from. It can be from TCAM region or HASH region.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 12 Aug 2015 11:25:04 +0000 (16:55 +0530)]
cxgb4: Add some more details to sge qinfo
Adding more details to sge qinfo for debugging purpose.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>