openwrt/staging/blogic.git
8 years agonet_sched actions: use nla_parse_nested()
Johannes Berg [Wed, 26 Oct 2016 12:44:33 +0000 (14:44 +0200)]
net_sched actions: use nla_parse_nested()

Use nla_parse_nested instead of open-coding the call to
nla_parse() with the attribute data/len.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4: Fix error handling in alloc_uld_rxqs().
Ganesh Goudar [Wed, 26 Oct 2016 07:56:38 +0000 (13:26 +0530)]
cxgb4: Fix error handling in alloc_uld_rxqs().

Fix to release resources properly in error handling path of
alloc_uld_rxqs(), This patch also removes unwanted arguments
and avoids calling the same function twice.

Fixes: 94cdb8bb993a (cxgb4: Add support for dynamic allocation
       of resources for ULD
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoIB/mlx4: avoid a -Wmaybe-uninitialize warning
Arnd Bergmann [Tue, 25 Oct 2016 16:16:20 +0000 (18:16 +0200)]
IB/mlx4: avoid a -Wmaybe-uninitialize warning

There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:

ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is
safe here and avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()
Eli Cooper [Wed, 26 Oct 2016 02:11:09 +0000 (10:11 +0800)]
ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()

This patch updates skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit() when an
IPv6 header is installed to a socket buffer.

This is not a cosmetic change.  Without updating this value, GSO packets
transmitted through an ipip6 tunnel have the protocol of ETH_P_IP and
skb_mac_gso_segment() will attempt to call gso_segment() for IPv4,
which results in the packets being dropped.

Fixes: b8921ca83eed ("ip4ip6: Support for GSO/GRO")
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: fix samples to add fake KBUILD_MODNAME
Daniel Borkmann [Tue, 25 Oct 2016 22:37:53 +0000 (00:37 +0200)]
bpf: fix samples to add fake KBUILD_MODNAME

Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).

Currently, BPF samples are including some of the kernel headers and mid
term we should change them to refrain from this, really. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.

Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
done in xdp*_kern samples already.

Fixes: 65d472fb007d ("samples/bpf: add 'pointer to packet' tests")
Fixes: 6afb1e28b859 ("samples/bpf: Add tunnel set/get tests.")
Fixes: a3f74617340b ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoinet: Fix missing return value in inet6_hash
Craig Gallek [Tue, 25 Oct 2016 22:08:49 +0000 (18:08 -0400)]
inet: Fix missing return value in inet6_hash

As part of a series to implement faster SO_REUSEPORT lookups,
commit 086c653f5862 ("sock: struct proto hash function may error")
added return values to protocol hash functions and
commit 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function")
implemented a new hash function for IPv6.  However, the latter does
not respect the former's convention.

This properly propagates the hash errors in the IPv6 case.

Fixes: 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function")
Reported-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlx5-fixes'
David S. Miller [Sat, 29 Oct 2016 16:00:41 +0000 (12:00 -0400)]
Merge branch 'mlx5-fixes'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 2016-10-25

This series contains some bug fixes for the mlx5 core and mlx5e driver.

From Daniel:
    - Cache line size determination at runtime, instead of using
      L1_CACHE_BYTES hard coded value, use cache_line_size()
    - Always Query HCA caps after setting them even on reset flow

From Mohamad:
    - Reorder netdev cleanup to uregister netdev before detaching it
      for the kernel to not complain about open resources such as vlans
    - Change the acl enable prototype to return status, for better error
      resiliency
    - Clear health sick bit when starting health poll after reset flow
    - Fix race between PCI error handlers and health work
    - PCI error recovery health care simulation, in case when the kernel
      PCI error handlers are not triggered for some internal firmware errors

From Noa:
    - Avoid passing dma address 0 to firmware when mapping system pages
      to the firmware

From Paul: Some straight forward flow steering fixes
    - Keep autogroups list ordered
    - Fix autogroups groups num not decreasing
    - Correctly initialize last use of flow counters

From Saeed:
    - Choose the nearest LRO timeout to the wanted one
      instead of blindly choosing "dev_cap.lro_timeout[2]"

This series has no conflict with the for-next pull request posted
earlier today ("Mellanox mlx5 core driver updates 2016-10-25").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Avoid passing dma address 0 to firmware
Noa Osherovich [Tue, 25 Oct 2016 15:36:35 +0000 (18:36 +0300)]
net/mlx5: Avoid passing dma address 0 to firmware

Currently the firmware can't work with a page with dma address 0.
Passing such an address to the firmware will cause the give_pages
command to fail.

To avoid this, in case we get a 0 dma address of a page from the
dma engine, we avoid passing it to FW by remapping to get an address
other than 0.

Fixes: bf0bf77f6519 ('mlx5: Support communicating arbitrary host...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: PCI error recovery health care simulation
Mohamad Haj Yahia [Tue, 25 Oct 2016 15:36:34 +0000 (18:36 +0300)]
net/mlx5: PCI error recovery health care simulation

In case that the kernel PCI error handlers are not called, we will
trigger our own recovery flow.

The health work will give priority to the kernel pci error handlers to
recover the PCI by waiting for a small period, if the pci error handlers
are not triggered the manual recovery flow will be executed.

We don't save pci state in case of manual recovery because it will ruin the
pci configuration space and we will lose dma sync.

Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Fix race between PCI error handlers and health work
Mohamad Haj Yahia [Tue, 25 Oct 2016 15:36:33 +0000 (18:36 +0300)]
net/mlx5: Fix race between PCI error handlers and health work

Currently there is a race between the health care work and the kernel
pci error handlers because both of them detect the error, the first one
to be called will do the error handling.
There is a chance that health care will disable the pci after resuming
pci slot.
Also create a separate WQ because now we will have two types of health
works, one for the error detection and one for the recovery.

Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Clear health sick bit when starting health poll
Mohamad Haj Yahia [Tue, 25 Oct 2016 15:36:32 +0000 (18:36 +0300)]
net/mlx5: Clear health sick bit when starting health poll

The health sick status should be cleared when we start the health poll.
This is crucial for driver reload (unload + load) in order to behave
right in case of health issue.

Fixes: fd76ee4da55a ('net/mlx5_core: Fix internal error detection conditions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Change the acl enable prototype to return status
Mohamad Haj Yahia [Tue, 25 Oct 2016 15:36:31 +0000 (18:36 +0300)]
net/mlx5: Change the acl enable prototype to return status

The Ingress/Egress ACL enable function may fail and it should return
status to its caller to avoid NULL pointer dereference.

Fixes: f942380c1239 ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Unregister netdev before detaching it
Mohamad Haj Yahia [Tue, 25 Oct 2016 15:36:30 +0000 (18:36 +0300)]
net/mlx5e: Unregister netdev before detaching it

Detaching the netdev before unregistering it cause some netdev cleanup
ndos to fail because they check presence of the netdev, so we need to
unregister the netdev first.

Fixes: 26e59d8077a3 ('net/mlx5e: Implement mlx5e interface attach/detach callbacks')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Choose best nearest LRO timeout
Saeed Mahameed [Tue, 25 Oct 2016 15:36:29 +0000 (18:36 +0300)]
net/mlx5e: Choose best nearest LRO timeout

Instead of predicting the index of the wanted LRO timeout value from
hardware capabilities, look for the nearest LRO timeout value.

Fixes: 5c50368f3831 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Correctly initialize last use of flow counters
Paul Blakey [Tue, 25 Oct 2016 15:36:28 +0000 (18:36 +0300)]
net/mlx5: Correctly initialize last use of flow counters

Currently, last use timestamp is initialized to zero.
This is not the expected value by higher layers such as
when we do TC action offloading. To fix that, set it to
the current time, e.g when the counter/rule is offloaded.
This is the same behaviour of non-offloaded TC actions.

Fixes: 43a335e055bb ('mlx5_core: Flow counters infrastructure')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Fix autogroups groups num not decreasing
Paul Blakey [Tue, 25 Oct 2016 15:36:27 +0000 (18:36 +0300)]
net/mlx5: Fix autogroups groups num not decreasing

Autogroups groups num is increased when creating a new flow group,
but is never decreased.

Now decreasing it when deleting a flow group.

Fixes: f0d22d187473 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Keep autogroups list ordered
Paul Blakey [Tue, 25 Oct 2016 15:36:26 +0000 (18:36 +0300)]
net/mlx5: Keep autogroups list ordered

Finding a new autogroup range is done by going over a group list
sorted by each group start index. The search is stopped after finding
the first free range. Adding the newly created group to the list is
wrongly added to the end of the list regardless of its start index as
the parameter of where to insert it is ignored.

This commit makes sure to use that unused parameter to insert
it where requested.

Fixes: f0d22d187473 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Always Query HCA caps after setting them
Daniel Jurgens [Tue, 25 Oct 2016 15:36:25 +0000 (18:36 +0300)]
net/mlx5: Always Query HCA caps after setting them

Always query the HCA caps after setting them to update the capablities
data structures. Not doing so results in incorrect capabilities being
reported including max_dc, max_qp and several others.

Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware
and software flows")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago{net, ib}/mlx5: Make cache line size determination at runtime.
Daniel Jurgens [Tue, 25 Oct 2016 15:36:24 +0000 (18:36 +0300)]
{net, ib}/mlx5: Make cache line size determination at runtime.

ARM 64B cache line systems have L1_CACHE_BYTES set to 128.
cache_line_size() will return the correct size.

Fixes: cf50b5efa2fe('net/mlx5_core/ib: New device capabilities
handling.')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: validate chunk len before actually using it
Marcelo Ricardo Leitner [Tue, 25 Oct 2016 16:27:39 +0000 (14:27 -0200)]
sctp: validate chunk len before actually using it

Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.

The fix is to just move the check upwards so it's also validated for the
1st chunk.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlxsw-fixes'
David S. Miller [Fri, 28 Oct 2016 17:43:56 +0000 (13:43 -0400)]
Merge branch 'mlxsw-fixes'

Jiri Pirko says:

====================
mlxsw: Couple of fixes

Couple of LPM tree management fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_router: Compare only trees which are in use during tree get
Jiri Pirko [Tue, 25 Oct 2016 09:25:57 +0000 (11:25 +0200)]
mlxsw: spectrum_router: Compare only trees which are in use during tree get

Only trees which are in use should be compared to requested prefix usage.

Fixes: 53342023eed9 ("mlxsw: spectrum_router: Implement LPM trees management")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_router: Save requested prefix bitlist when creating tree
Jiri Pirko [Tue, 25 Oct 2016 09:25:56 +0000 (11:25 +0200)]
mlxsw: spectrum_router: Save requested prefix bitlist when creating tree

Currently, the prefix bitlist is not saved for LPM trees, causing the
compare to always fail which causes the tree to be destroyed and created
for every inserted and removed FIB entry. So fix this by saving
the bitlist as it should have been done from the very beginning.

Fixes: 53342023eed9 ("mlxsw: spectrum_router: Implement LPM trees management")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet sched filters: fix notification of filter delete with proper handle
Jamal Hadi Salim [Tue, 25 Oct 2016 00:18:27 +0000 (20:18 -0400)]
net sched filters: fix notification of filter delete with proper handle

Daniel says:

While trying out [1][2], I noticed that tc monitor doesn't show the
correct handle on delete:

$ tc monitor
qdisc clsact ffff: dev eno1 parent ffff:fff1
filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80

some context to explain the above:
The user identity of any tc filter is represented by a 32-bit
identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
above. A user wishing to delete, get or even modify a specific filter
uses this handle to reference it.
Every classifier is free to provide its own semantics for the 32 bit handle.
Example: classifiers like u32 use schemes like 800:1:801 to describe
the semantics of their filters represented as hash table, bucket and
node ids etc.
Classifiers also have internal per-filter representation which is different
from this externally visible identity. Most classifiers set this
internal representation to be a pointer address (which allows fast retrieval
of said filters in their implementations). This internal representation
is referenced with the "fh" variable in the kernel control code.

When a user successfuly deletes a specific filter, by specifying the correct
tcm->tcm_handle, an event is generated to user space which indicates
which specific filter was deleted.

Before this patch, the "fh" value was sent to user space as the identity.
As an example what is shown in the sample bpf filter delete event above
is 0xf3be0c80. This is infact a 32-bit truncation of 0xffff8807f3be0c80
which happens to be a 64-bit memory address of the internal filter
representation (address of the corresponding filter's struct cls_bpf_prog);

After this patch the appropriate user identifiable handle as encoded
in the originating request tcm->tcm_handle is generated in the event.
One of the cardinal rules of netlink rules is to be able to take an
event (such as a delete in this case) and reflect it back to the
kernel and successfully delete the filter. This patch achieves that.

Note, this issue has existed since the original TC action
infrastructure code patch back in 2004 as found in:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/

[1] http://patchwork.ozlabs.org/patch/682828/
[2] http://patchwork.ozlabs.org/patch/682829/

Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bgmac: fix spelling mistake: "connecton" -> "connection"
Colin Ian King [Mon, 24 Oct 2016 22:46:18 +0000 (23:46 +0100)]
net: bgmac: fix spelling mistake: "connecton" -> "connection"

trivial fix to spelling mistake in dev_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoflow_dissector: fix vlan tag handling
Arnd Bergmann [Mon, 24 Oct 2016 21:40:30 +0000 (23:40 +0200)]
flow_dissector: fix vlan tag handling

gcc warns about an uninitialized pointer dereference in the vlan
priority handling:

net/core/flow_dissector.c: In function '__skb_flow_dissect':
net/core/flow_dissector.c:281:61: error: 'vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]

As pointed out by Jiri Pirko, the variable is never actually used
without being initialized first as the only way it end up uninitialized
is with skb_vlan_tag_present(skb)==true, and that means it does not
get accessed.

However, the warning hints at some related issues that I'm addressing
here:

- the second check for the vlan tag is different from the first one
  that tests the skb for being NULL first, causing both the warning
  and a possible NULL pointer dereference that was not entirely fixed.
- The same patch that introduced the NULL pointer check dropped an
  earlier optimization that skipped the repeated check of the
  protocol type
- The local '_vlan' variable is referenced through the 'vlan' pointer
  but the variable has gone out of scope by the time that it is
  accessed, causing undefined behavior

Caching the result of the 'skb && skb_vlan_tag_present(skb)' check
in a local variable allows the compiler to further optimize the
later check. With those changes, the warning also disappears.

Fixes: 3805a938a6c2 ("flow_dissector: Check skb for VLAN only if skb specified.")
Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ipv6: Do not consider link state for nexthop validation
David Ahern [Mon, 24 Oct 2016 19:27:23 +0000 (12:27 -0700)]
net: ipv6: Do not consider link state for nexthop validation

Similar to IPv4, do not consider link state when validating next hops.

Currently, if the link is down default routes can fail to insert:
 $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
 RTNETLINK answers: No route to host

With this patch the command succeeds.

Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ipv6: Fix processing of RAs in presence of VRF
David Ahern [Mon, 24 Oct 2016 17:52:35 +0000 (10:52 -0700)]
net: ipv6: Fix processing of RAs in presence of VRF

rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
table from the device index, but the corresponding rt6_get_route_info
and rt6_get_dflt_router functions were not leading to the failure to
process RA's:

    ICMPv6: RA: ndisc_router_discovery failed to add default route

Fix the 'get' functions by using the table id associated with the
device when applicable.

Also, now that default routes can be added to tables other than the
default table, rt6_purge_dflt_routers needs to be updated as well to
look at all tables. To handle that efficiently, add a flag to the table
denoting if it is has a default route via RA.

Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agokalmia: avoid potential uninitialized variable use
Arnd Bergmann [Mon, 24 Oct 2016 15:54:18 +0000 (17:54 +0200)]
kalmia: avoid potential uninitialized variable use

The kalmia_send_init_packet() returns zero or a negative return
code, but gcc has no way of knowing that there cannot be a
positive return code, so it determines that copying the ethernet
address at the end of kalmia_bind() will access uninitialized
data:

drivers/net/usb/kalmia.c: In function ‘kalmia_bind’:
arch/x86/include/asm/string_32.h:78:22: error: ‘*((void *)&ethernet_addr+4)’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   *((short *)to + 2) = *((short *)from + 2);
                      ^
drivers/net/usb/kalmia.c:138:5: note: ‘*((void *)&ethernet_addr+4)’ was declared here

This warning is harmless, but for consistency, we should make
the check for the return code match what the driver does everywhere
else and just progate it, which then gets rid of the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomacsec: Fix header length if SCI is added if explicitly disabled
Tobias Brunner [Mon, 24 Oct 2016 13:44:26 +0000 (15:44 +0200)]
macsec: Fix header length if SCI is added if explicitly disabled

Even if sending SCIs is explicitly disabled, the code that creates the
Security Tag might still decide to add it (e.g. if multiple RX SCs are
defined on the MACsec interface).
But because the header length so far only depended on the configuration
option the SCI overwrote the original frame's contents (EtherType and
e.g. the beginning of the IP header) and if encrypted did not visibly
end up in the packet, while the SC flag in the TCI field of the Security
Tag was still set, resulting in invalid MACsec frames.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoat803x: double check SGMII side autoneg
Zefir Kurtisi [Mon, 24 Oct 2016 10:40:54 +0000 (12:40 +0200)]
at803x: double check SGMII side autoneg

In SGMII mode, we observed an autonegotiation issue
after power-down-up cycles where the copper side
reports successful link establishment but the
SGMII side's link is down.

This happened in a setup where the at8031 is
connected over SGMII to a eTSEC (fsl gianfar),
but so far could not be reproduced with other
Ethernet device / driver combinations.

This commit adds a wrapper function for at8031
that in case of operating in SGMII mode double
checks SGMII link state when generic aneg_done()
succeeds. It prints a warning on failure but
intentionally does not try to recover from this
state. As a result, if you ever see a warning
'803x_aneg_done: SGMII link is not ok' you will
end up having an Ethernet link up but won't get
any data through. This should not happen, if it
does, please contact the module maintainer.

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRevert "at803x: fix suspend/resume for SGMII link"
Zefir Kurtisi [Mon, 24 Oct 2016 10:40:53 +0000 (12:40 +0200)]
Revert "at803x: fix suspend/resume for SGMII link"

This reverts commit 98267311fe3b334ae7c107fa0e2413adcf3ba735.

Suspending the SGMII alongside the copper side
made the at803x inaccessable while powered down,
e.g. it can't be re-probed after suspend.

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMAINTAINERS: Update qlogic networking drivers
Mintz, Yuval [Mon, 24 Oct 2016 05:48:09 +0000 (08:48 +0300)]
MAINTAINERS: Update qlogic networking drivers

Following Cavium's acquisition of qlogic we need to update all the qlogic
drivers maintainer's entries to point to our new e-mail addresses,
as well as update some of the driver's maintainers as those are no longer
working for Cavium.

I would like to thank Sony Chacko and Rajesh Borundia for their support
and development of our various networking drivers.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetvsc: fix incorrect receive checksum offloading
Stephen Hemminger [Mon, 24 Oct 2016 04:32:47 +0000 (21:32 -0700)]
netvsc: fix incorrect receive checksum offloading

The Hyper-V netvsc driver was looking at the incorrect status bits
in the checksum info. It was setting the receive checksum unnecessary
flag based on the IP header checksum being correct. The checksum
flag is skb is about TCP and UDP checksum status. Because of this
bug, any packet received with bad TCP checksum would be passed
up the stack and to the application causing data corruption.
The problem is reproducible via netcat and netem.

This had a side effect of not doing receive checksum offload
on IPv6. The driver was also also always doing checksum offload
independent of the checksum setting done via ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoudp: fix IP_CHECKSUM handling
Eric Dumazet [Mon, 24 Oct 2016 01:03:06 +0000 (18:03 -0700)]
udp: fix IP_CHECKSUM handling

First bug was added in commit ad6f939ab193 ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));

Then commit e6afc8ace6dd ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.

Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: fix the panic caused by route update
Xin Long [Sun, 23 Oct 2016 17:01:09 +0000 (01:01 +0800)]
sctp: fix the panic caused by route update

Commit 7303a1475008 ("sctp: identify chunks that need to be fragmented
at IP level") made the chunk be fragmented at IP level in the next round
if it's size exceed PMTU.

But there still is another case, PMTU can be updated if transport's dst
expires and transport's pmtu_pending is set in sctp_packet_transmit. If
the new PMTU is less than the chunk, the same issue with that commit can
be triggered.

So we should drop this packet and let it retransmit in another round
where it would be fragmented at IP level.

This patch is to fix it by checking the chunk size after PMTU may be
updated and dropping this packet if it's size exceed PMTU.

Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodoc: update docbook annotations for socket and skb
Stephen Hemminger [Sun, 23 Oct 2016 16:28:29 +0000 (09:28 -0700)]
doc: update docbook annotations for socket and skb

The skbuff and sock structure both had missing parameter annotation
values.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorocker: fix error return code in rocker_world_check_init()
Wei Yongjun [Sat, 22 Oct 2016 14:31:06 +0000 (14:31 +0000)]
rocker: fix error return code in rocker_world_check_init()

Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: e420114eef4a ("rocker: introduce worlds infrastructure")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Sun, 23 Oct 2016 21:51:38 +0000 (17:51 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2016-10-21

Here are some more Bluetooth fixes for the 4.9 kernel:

 - Fix to btwilink driver probe function return value
 - Power management fix to hci_bcm
 - Fix to encoding name in scan response data

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: sctp, forbid negative length
Jiri Slaby [Fri, 21 Oct 2016 12:13:24 +0000 (14:13 +0200)]
net: sctp, forbid negative length

Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
        if (len < sizeof(int))
                return -EINVAL;

On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.

Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.

Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.

If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
 [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
 [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
 [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
 [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fec: Call swap_buffer() prior to IP header alignment
Fabio Estevam [Fri, 21 Oct 2016 11:34:29 +0000 (09:34 -0200)]
net: fec: Call swap_buffer() prior to IP header alignment

Commit 3ac72b7b63d5 ("net: fec: align IP header in hardware") breaks
networking on mx28.

There is an erratum on mx28 (ENGR121613 - ENET big endian mode
not compatible with ARM little endian) that requires an additional
byte-swap operation to workaround this problem.

So call swap_buffer() prior to performing the IP header alignment
to restore network functionality on mx28.

Fixes: 3ac72b7b63d5 ("net: fec: align IP header in hardware")
Reported-and-tested-by: Henri Roosen <henri.roosen@ginzinger.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: do not increment mac header when it's unset
Jason A. Donenfeld [Fri, 21 Oct 2016 09:28:25 +0000 (18:28 +0900)]
ipv6: do not increment mac header when it's unset

Otherwise we'll overflow the integer. This occurs when layer 3 tunneled
packets are handed off to the IPv6 layer.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnx2x: Use the correct divisor value for PHC clock readings.
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 06:09:17 +0000 (02:09 -0400)]
bnx2x: Use the correct divisor value for PHC clock readings.

Time Sync (PTP) implementation uses the divisor/shift value for converting
the clock ticks to nanoseconds. Driver currently defines shift value as 1,
this results in the nanoseconds value to be calculated as half the actual
value. Hence the user application fails to synchronize the device clock
value with the PTP master device clock. Need to use the 'shift' value of 0.

Signed-off-by: Sony.Chacko <Sony.Chacko@cavium.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'qed-fixes'
David S. Miller [Sat, 22 Oct 2016 21:08:07 +0000 (17:08 -0400)]
Merge branch 'qed-fixes'

Sudarsana Reddy Kalluru says:

====================
qed*: fix series.

The patch series contains several minor bug fixes for qed/qede modules.
Please consider applying this to 'net' branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqede: Fix incorrrect usage of APIs for un-mapping DMA memory
Manish Chopra [Fri, 21 Oct 2016 08:43:45 +0000 (04:43 -0400)]
qede: Fix incorrrect usage of APIs for un-mapping DMA memory

Driver uses incorrect APIs to unmap DMA memory which were
mapped using dma_map_single(). This patch fixes it to use
appropriate APIs for un-mapping DMA memory.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Zero-out the buffer paased to dcbx_query() API
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 08:43:44 +0000 (04:43 -0400)]
qed: Zero-out the buffer paased to dcbx_query() API

qed_dcbx_query_params() implementation populate the values to input
buffer based on the dcbx mode and, the current negotiated state/params,
the caller of this API need to memset the buffer to zero.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqede: Reconfigure rss indirection direction table when rss count is updated
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 08:43:43 +0000 (04:43 -0400)]
qede: Reconfigure rss indirection direction table when rss count is updated

Rx indirection table entries are in the range [0, (rss_count - 1)]. If
user reduces the rss count, the table entries may not be in the ccorrect
range. Need to reconfigure the table with new rss_count as a basis.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed*: Reduce the memory footprint for Rx path
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 08:43:42 +0000 (04:43 -0400)]
qed*: Reduce the memory footprint for Rx path

With the current default values for Rx path i.e., 8 queues of 8Kb entries
each with 4Kb size, interface will consume 256Mb for Rx. The default values
causing the driver probe to fail when the system memory is low. Based on
the perforamnce results, rx-ring count value of 1Kb gives the comparable
performance with Rx coalesce timeout of 12 seconds. Updating the default
values.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqede: Loopback implementation should ignore the normal traffic
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 08:43:41 +0000 (04:43 -0400)]
qede: Loopback implementation should ignore the normal traffic

During the execution of loopback test, driver may receive the packets which
are not originated by this test, loopback implementation need to skip those
packets.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqede: Do not allow RSS config for 100G devices
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 08:43:40 +0000 (04:43 -0400)]
qede: Do not allow RSS config for 100G devices

RSS configuration is not supported for 100G adapters.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqede: get_channels() need to populate max tx/rx coalesce values
Sudarsana Reddy Kalluru [Fri, 21 Oct 2016 08:43:39 +0000 (04:43 -0400)]
qede: get_channels() need to populate max tx/rx coalesce values

Recent changes in kernel ethtool implementation requires the driver
callback for get_channels() has to populate the values for max tx/rx
coalesce fields.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: use the right lock for ping_group_range
WANG Cong [Thu, 20 Oct 2016 21:19:46 +0000 (14:19 -0700)]
ipv4: use the right lock for ping_group_range

This reverts commit a681574c99be23e4d20b769bf0e543239c364af5
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).

Then, since we already have a lock for ping_group_range, those
using ip_local_ports.lock for ping_group_range are clearly typos.

We might consider to share a same lock for both ping_group_range
and local_port_range w.r.t. space saving, but that should be for
net-next.

Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()")
Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'dsa-bcm_sf2-do-not-rely-on-kexec_in_progress'
David S. Miller [Sat, 22 Oct 2016 20:17:54 +0000 (16:17 -0400)]
Merge branch 'dsa-bcm_sf2-do-not-rely-on-kexec_in_progress'

Florian Fainelli says:

====================
net: dsa: bcm_sf2: Do not rely on kexec_in_progress

These are the two patches following the discussing we had on kexec_in_progress.

Feel free to apply or discard them, thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: bcm_sf2: Do not rely on kexec_in_progress
Florian Fainelli [Fri, 21 Oct 2016 21:21:56 +0000 (14:21 -0700)]
net: dsa: bcm_sf2: Do not rely on kexec_in_progress

After discussing with Eric, it turns out that, while using
kexec_in_progress is a nice optimization, which prevents us from always
powering on the integrated PHY, let's just turn it on in the shutdown
path.

This removes a dependency on kexec_in_progress which, according to Eric
should not be used by modules

Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoRevert "kexec: Export kexec_in_progress to modules"
Florian Fainelli [Fri, 21 Oct 2016 21:21:55 +0000 (14:21 -0700)]
Revert "kexec: Export kexec_in_progress to modules"

This reverts commit 97dcaa0fcfd24daa9a36c212c1ad1d5a97759212. Based on
the review discussion with Eric, we will come up with a different fix
for the bcm_sf2 driver which does not make it rely on the
kexec_in_progress value.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetns: revert "netns: avoid disabling irq for netns id"
Paul Moore [Sat, 22 Oct 2016 01:49:14 +0000 (21:49 -0400)]
netns: revert "netns: avoid disabling irq for netns id"

This reverts commit bc51dddf98c9 ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:

 * http://marc.info/?t=147694653900002&r=1&w=2

Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages.  A tracking issue for this can be
found below:

 * https://github.com/linux-audit/audit-kernel/issues/23

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Elad Raz <e@eladraz.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: fix a potential deadlock in do_ipv6_setsockopt()
WANG Cong [Thu, 20 Oct 2016 06:35:12 +0000 (23:35 -0700)]
ipv6: fix a potential deadlock in do_ipv6_setsockopt()

Baozeng reported this deadlock case:

       CPU0                    CPU1
       ----                    ----
  lock([  165.136033] sk_lock-AF_INET6);
                               lock([  165.136033] rtnl_mutex);
                               lock([  165.136033] sk_lock-AF_INET6);
  lock([  165.136033] rtnl_mutex);

Similar to commit 87e9f0315952
("ipv4: fix a potential deadlock in mcast getsockopt() path")
this is due to we still have a case, ipv6_sock_mc_close(),
where we acquire sk_lock before rtnl_lock. Close this deadlock
with the similar solution, that is always acquire rtnl lock first.

Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Fri, 21 Oct 2016 14:25:22 +0000 (10:25 -0400)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
   Geert Uytterhoeven.

2) Fix wrong timeout in set elements added from packet path via
   nft_dynset, from Anders K. Pedersen.

3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
   documentation, from Nicolas Dichtel.

4) Ensure proper initialization of log flags via xt_LOG, from
   Liping Zhang.

5) Missing alias to autoload ipcomp, also from Liping Zhang.

6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.

7) Wrong integer type in the new nft_parse_u32_check() function,
   from Dan Carpenter.

8) Another wrong integer type declaration in nft_exthdr_init, also
   from Dan Carpenter.

9) Fix insufficient mode validation in nft_range.

10) Fix compilation warning in nft_range due to possible uninitialized
    value, from Arnd Bergmann.

11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
    calm down kmemcheck, from Florian Westphal.

12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
    from Nicolas Dichtel.

13) Fix nf_queue() after conversion to single-linked hook list, related
    to incorrect bypass flag handling and incorrect hook point of
    reinjection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agokexec: Export kexec_in_progress to modules
Florian Fainelli [Fri, 21 Oct 2016 01:15:16 +0000 (18:15 -0700)]
kexec: Export kexec_in_progress to modules

The bcm_sf2 driver uses kexec_in_progress to know whether it can power
down an integrated PHY during shutdown, and can be built as a module.
Other modules may be using this in the future, so export it.

Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: disable BH in set_ping_group_range()
Eric Dumazet [Thu, 20 Oct 2016 17:26:48 +0000 (10:26 -0700)]
ipv4: disable BH in set_ping_group_range()

In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port
range") Cong added BH protection in set_local_port_range() but missed
that same fix was needed in set_ping_group_range()

Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoudp: must lock the socket in udp_disconnect()
Eric Dumazet [Thu, 20 Oct 2016 16:39:40 +0000 (09:39 -0700)]
udp: must lock the socket in udp_disconnect()

Baozeng Ding reported KASAN traces showing uses after free in
udp_lib_get_port() and other related UDP functions.

A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.

I could write a reproducer with two threads doing :

static int sock_fd;
static void *thr1(void *arg)
{
for (;;) {
connect(sock_fd, (const struct sockaddr *)arg,
sizeof(struct sockaddr_in));
}
}

static void *thr2(void *arg)
{
struct sockaddr_in unspec;

for (;;) {
memset(&unspec, 0, sizeof(unspec));
        connect(sock_fd, (const struct sockaddr *)&unspec,
sizeof(unspec));
        }
}

Problem is that udp_disconnect() could run without holding socket lock,
and this was causing list corruptions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels
Florian Fainelli [Thu, 20 Oct 2016 16:32:19 +0000 (09:32 -0700)]
net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels

For a kernel that is being kexec'd we re-enable the integrated GPHY in
order for the subsequent MDIO bus scan to succeed and properly bind to
the bcm7xxx PHY driver. If we did not do that, the GPHY would be shut
down by the time the MDIO driver is probing the bus, and it would fail
to read the correct PHY OUI and therefore bind to an appropriate PHY
driver. Later on, this would cause DSA not to be able to successfully
attach to the PHY, and the interface would not be created at all.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, test: fix ld_abs + vlan push/pop stress test
Daniel Borkmann [Thu, 20 Oct 2016 15:13:53 +0000 (17:13 +0200)]
bpf, test: fix ld_abs + vlan push/pop stress test

After commit 636c2628086e ("net: skbuff: Remove errornous length
validation in skb_vlan_pop()") mentioned test case stopped working,
throwing a -12 (ENOMEM) return code. The issue however is not due to
636c2628086e, but rather due to a buggy test case that got uncovered
from the change in behaviour in 636c2628086e.

The data_size of that test case for the skb was set to 1. In the
bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
loop with: reading skb data, pushing 68 tags, reading skb data,
popping 68 tags, reading skb data, etc, in order to force a skb
expansion and thus trigger that JITs recache skb->data. Problem is
that initial data_size is too small.

While before 636c2628086e, the test silently bailed out due to the
skb->len < VLAN_ETH_HLEN check with returning 0, and now throwing an
error from failing skb_ensure_writable(). Set at least minimum of
ETH_HLEN as an initial length so that on first push of data, equivalent
pop will succeed.

Fixes: 4d9c5c53ac99 ("test_bpf: add bpf_skb_vlan_push/pop() tests")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: add recursion limit to GRO
Sabrina Dubroca [Thu, 20 Oct 2016 13:58:02 +0000 (15:58 +0200)]
net: add recursion limit to GRO

Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.

Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: properly prevent temp_prefered_lft sysctl race
Jiri Bohac [Thu, 20 Oct 2016 10:29:26 +0000 (12:29 +0200)]
ipv6: properly prevent temp_prefered_lft sysctl race

The check for an underflow of tmp_prefered_lft is always false
because tmp_prefered_lft is unsigned. The intention of the check
was to guard against racing with an update of the
temp_prefered_lft sysctl, potentially resulting in an underflow.

As suggested by David Miller, the best way to prevent the race is
by reading the sysctl variable using READ_ONCE.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetfilter: fix nf_queue handling
Pablo Neira Ayuso [Mon, 17 Oct 2016 17:05:32 +0000 (18:05 +0100)]
netfilter: fix nf_queue handling

nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
list_head with single linked list") for two reasons:

1) If the bypass flag is set on, there are no userspace listeners and
   we still have more hook entries to iterate over, then jump to the
   next hook. Otherwise accept the packet. On nf_reinject() path, the
   okfn() needs to be invoked.

2) We should not re-enter the same hook on packet reinjection. If the
   packet is accepted, we have to skip the current hook from where the
   packet was enqueued, otherwise the packets gets enqueued over and
   over again.

This restores the previous list_for_each_entry_continue() behaviour
happening from nf_iterate() that was dealing with these two cases.
This patch introduces a new nf_queue() wrapper function so this fix
becomes simpler.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agonetfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reached
Nicolas Dichtel [Tue, 18 Oct 2016 12:37:32 +0000 (14:37 +0200)]
netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reached

When the maximum evictions number is reached, do not wait 5 seconds before
the next run.

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agostmmac: display the descriptors if DES0 = 0
Giuseppe CAVALLARO [Thu, 20 Oct 2016 08:01:28 +0000 (10:01 +0200)]
stmmac: display the descriptors if DES0 = 0

It makes sense to display the descriptors even if
DES0 is zero. This helps for example in case of it
is needed to dump rx write-back descriptors to get
timestamp status.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'ncsi-fixes'
David S. Miller [Thu, 20 Oct 2016 15:23:08 +0000 (11:23 -0400)]
Merge branch 'ncsi-fixes'

Gavin Shan says:

====================
net/ncsi: More bug fixes

This series fixes 2 issues that were found during NCSI's availability
testing on BCM5718 and improves HNCDSC AEN handler:

   * PATCH[1] refactors the code so that minimal code change is put
     to PATCH[2].
   * PATCH[2] fixes the NCSI channel's stale link state before doing
     failover.
   * PATCH[3] chooses the hot channel, which was ever chosen as active
     channel, when the available channels are all in link-down state.
   * PATCH[4] improves Host Network Controller Driver Status Change
     (HNCDSC) AEN handler

Changelog
=========
v2:
   * Merged PATCH[v1 1/2] to PATCH[v2 1].
   * Avoid if/else statements in ncsi_suspend_channel() as Joel suggested.
   * Added comments to explain why we need retrieve last link states in
     ncsi_suspend_channel().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ncsi: Improve HNCDSC AEN handler
Gavin Shan [Thu, 20 Oct 2016 00:45:52 +0000 (11:45 +1100)]
net/ncsi: Improve HNCDSC AEN handler

This improves AEN handler for Host Network Controller Driver Status
Change (HNCDSC):

   * The channel's lock should be hold when accessing its state.
   * Do failover when host driver isn't ready.
   * Configure channel when host driver becomes ready.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ncsi: Choose hot channel as active one if necessary
Gavin Shan [Thu, 20 Oct 2016 00:45:51 +0000 (11:45 +1100)]
net/ncsi: Choose hot channel as active one if necessary

The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. C0 is in link-up state while C1 is in link-down
state. C0 is chosen as active channel until unplugging and plugging
C0's cable:  On unplugging C0's cable, LSC (Link State Change) AEN
packet received on C0 to report link-down event. After that, C1 is
chosen as active channel. LSC AEN for link-up event is lost on C0
when plugging C0's cable back. We lose the network even C0 is usable.

This resolves the issue by recording the (hot) channel that was ever
chosen as active one. The hot channel is chosen to be active one
if none of available channels in link-up state. With this, C0 is still
the active one after unplugging C0's cable. LSC AEN packet received
on C0 when plugging its cable back.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ncsi: Fix stale link state of inactive channels on failover
Gavin Shan [Thu, 20 Oct 2016 00:45:50 +0000 (11:45 +1100)]
net/ncsi: Fix stale link state of inactive channels on failover

The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. Both of them are connected to different LANs,
means they are in link-up state and C0 is chosen as the active one
until resetting BCM5718 happens as below.

Resetting BCM5718 results in LSC (Link State Change) AEN packet
received on C0, meaning LSC AEN is missed on C1. When LSC AEN packet
received on C0 to report link-down, it fails over to C1 because C1
is in link-up state as software can see. However, C1 is in link-down
state in hardware. It means the link state is out of synchronization
between hardware and software, resulting in inappropriate channel (C1)
selected as active one.

This resolves the issue by sending separate GLS (Get Link Status)
commands to all channels in the package before trying to do failover.
The last link states of all channels in the package are retrieved.
With it, C0 (not C1) is selected as active one as expected.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ncsi: Avoid if statements in ncsi_suspend_channel()
Gavin Shan [Thu, 20 Oct 2016 00:45:49 +0000 (11:45 +1100)]
net/ncsi: Avoid if statements in ncsi_suspend_channel()

There are several if/else statements in the state machine implemented
by switch/case in ncsi_suspend_channel() to avoid duplicated code. It
makes the code a bit hard to be understood.

This drops if/else statements in ncsi_suspend_channel() to improve the
code readability as Joel Stanley suggested. Also, it becomes easy to
add more states in the state machine without affecting current code.
No logical changes introduced by this.

Suggested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/sched: act_mirred: Use passed lastuse argument
Paul Blakey [Wed, 19 Oct 2016 14:42:39 +0000 (17:42 +0300)]
net/sched: act_mirred: Use passed lastuse argument

stats_update callback is called by NIC drivers doing hardware
offloading of the mirred action. Lastuse is passed as argument
to specify when the stats was actually last updated and is not
always the current time.

Fixes: 9798e6fe4f9b ('net: act_mirred: allow statistic updates from offloaded actions')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: pci: Fix reset wait for SwitchX2
Jiri Pirko [Thu, 20 Oct 2016 14:05:45 +0000 (16:05 +0200)]
mlxsw: pci: Fix reset wait for SwitchX2

SwitchX2 firmware does not implement reset done yet. Moreover, when
busy-polled for ready magic, that slows down firmware and reset takes
longer than the defined timeout, causing initialization to fail.
So restore the previous behaviour and just sleep in this case.

Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: switchx2: Fix ethernet port initialization
Elad Raz [Thu, 20 Oct 2016 14:05:44 +0000 (16:05 +0200)]
mlxsw: switchx2: Fix ethernet port initialization

When creating an ethernet port fails, we must move the port to disable,
otherwise putting the port in switch partition 0 (ETH) or 1 (IB) will
always fails.

Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_router: Make mlxsw_sp_router_fib4_del return void and remove warn
Jiri Pirko [Thu, 20 Oct 2016 14:05:43 +0000 (16:05 +0200)]
mlxsw: spectrum_router: Make mlxsw_sp_router_fib4_del return void and remove warn

The function return value is not checked anywhere. Also, the warning
causes huge slowdown when removing large number of FIB entries which
were not offloaded, because of ordering issue. Ido's preparing
a patchset to fix the ordering issue, but that is definitelly not
net tree material.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_router: Use correct tree index for binding
Jiri Pirko [Thu, 20 Oct 2016 14:05:42 +0000 (16:05 +0200)]
mlxsw: spectrum_router: Use correct tree index for binding

By a mistake, there is tree index 0 passed to RALTB. Should be
MLXSW_SP_LPM_TREE_MIN.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Reported-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoBluetooth: btwilink: Fix probe return value
Jacob Siverskog [Thu, 20 Oct 2016 07:05:09 +0000 (09:05 +0200)]
Bluetooth: btwilink: Fix probe return value

Probe functions should return 0 on success. This driver's probe
returns the value returned by hci_register_dev(), which is the hci
index. This works for systems with only one hci device (id = 0) but
for systems where the btwilink device ends up with an id larger than
0, things will start to fall apart.

Make the probe function return 0 on success.

Signed-off-by: Jacob Siverskog <jacob@teenage.engineering>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
8 years agostmmac: fix and review the ptp registration.
Giuseppe CAVALLARO [Wed, 19 Oct 2016 07:06:41 +0000 (09:06 +0200)]
stmmac: fix and review the ptp registration.

The commit commit 7086605a6ab5 ("stmmac: fix error check when init ptp")
breaks the procedure added by the
commit efee95f42b5d ("ptp_clock: future-proofing drivers against PTP
subsystem becoming optional")

So this patch tries to re-import the logic added by the latest
commit above: it makes sense to have the stmmac_ptp_register
as void function and, inside the main, the stmmac_init_ptp can fails
in case of the capability cannot be supported by the HW.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Cc: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoBluetooth: Fix append max 11 bytes of name to scan rsp data
Michał Narajowski [Wed, 19 Oct 2016 08:20:27 +0000 (10:20 +0200)]
Bluetooth: Fix append max 11 bytes of name to scan rsp data

Append maximum of 10 + 1 bytes of name to scan response data.
Complete name is appended only if exists and is <= 10 characters.
Else append short name if exists or shorten complete name if not.
This makes sure name is consistent across multiple advertising
instances.

Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
8 years agonetfilter: x_tables: suppress kmemcheck warning
Florian Westphal [Mon, 17 Oct 2016 19:50:23 +0000 (21:50 +0200)]
netfilter: x_tables: suppress kmemcheck warning

Markus Trippelsdorf reports:

WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
 ^
|RIP: 0010:[<ffffffff8166e561>]  [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
[..]
 [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
 [<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0
 [<ffffffff816d6715>] ipt_register_table+0xe5/0x110
[..]

This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but thats -next material.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agotcp: do not export sysctl_tcp_low_latency
Eric Dumazet [Tue, 18 Oct 2016 20:24:07 +0000 (13:24 -0700)]
tcp: do not export sysctl_tcp_low_latency

Since commit b2fb4f54ecd4 ("tcp: uninline tcp_prequeue()") we no longer
access sysctl_tcp_low_latency from a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agortnetlink: Add rtnexthop offload flag to compare mask
Jiri Pirko [Tue, 18 Oct 2016 16:59:34 +0000 (18:59 +0200)]
rtnetlink: Add rtnexthop offload flag to compare mask

The offload flag is a status flag and should not be used by
FIB semantics for comparison.

Fixes: 37ed9493699c ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoswitchdev: Execute bridge ndos only for bridge ports
Ido Schimmel [Tue, 18 Oct 2016 16:50:23 +0000 (18:50 +0200)]
switchdev: Execute bridge ndos only for bridge ports

We recently got the following warning after setting up a vlan device on
top of an offloaded bridge and executing 'bridge link':

WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
[...]
 CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
  0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
  0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
  0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
 Call Trace:
  [<ffffffff8135eaa3>] dump_stack+0x63/0x90
  [<ffffffff8108c43b>] __warn+0xcb/0xf0
  [<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20
  [<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
  [<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
  [<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140
  [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
  [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
  [<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0
  [<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90
  [<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190
  [<ffffffff8161a1b2>] netlink_dump+0x122/0x290
  [<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190
  [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
  [<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220
  [<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0
  [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
  [<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890
  [<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0
  [<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30
  [<ffffffff8161c92c>] netlink_unicast+0x18c/0x240
  [<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0
  [<ffffffff815c5a48>] sock_sendmsg+0x38/0x50
  [<ffffffff815c6031>] SYSC_sendto+0x101/0x190
  [<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90
  [<ffffffff815c6b6e>] SyS_sendto+0xe/0x10
  [<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4

The problem is that the 8021q module propagates the call to
ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
recognize the netdev, as it's not offloaded.

While we can ignore calls being made to non-bridge ports inside the
driver, a better fix would be to push this check up to the switchdev
layer.

Note that these ndos can be called for non-bridged netdev, but this only
happens in certain PF drivers which don't call the corresponding
switchdev functions anyway.

Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: core: Correctly iterate over lower adjacency list
Ido Schimmel [Wed, 19 Oct 2016 13:57:08 +0000 (16:57 +0300)]
net: core: Correctly iterate over lower adjacency list

Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
 task: ffff88017edfea40 task.stack: ffff88017ee10000
 RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
 Call Trace:
  <IRQ>
  [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
  [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
  [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
  [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
  [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
  [<ffffffff815f2eb6>] neigh_update+0x306/0x740
  [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
  [<ffffffff8165ea3f>] arp_process+0x66f/0x700
  [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
  [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
  [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
  [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
  [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
  [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
  [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
  [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
  [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
  [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
  [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
  [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
  [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
  [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
  [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
  [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
  [<ffffffff81374815>] ? find_next_bit+0x15/0x20
  [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
  [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
  [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
  [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
  [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
  [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
  [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
  [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
  [<ffffffff81091986>] tasklet_action+0xf6/0x110
  [<ffffffff81704556>] __do_softirq+0xf6/0x280
  [<ffffffff8109213f>] irq_exit+0xdf/0xf0
  [<ffffffff817042b4>] do_IRQ+0x54/0xd0
  [<ffffffff8170214c>] common_interrupt+0x8c/0x8c

The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.

Fix this by advancing the iterator and avoid the infinite loop.

Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoflow_dissector: Check skb for VLAN only if skb specified.
Eric Garver [Mon, 17 Oct 2016 20:30:12 +0000 (16:30 -0400)]
flow_dissector: Check skb for VLAN only if skb specified.

Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.

Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Eric Garver <e@erig.me>
Reviewed-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoqed: Use list_move_tail instead of list_del/list_add_tail
Wei Yongjun [Mon, 17 Oct 2016 15:17:51 +0000 (15:17 +0000)]
qed: Use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agorocker: fix maybe-uninitialized warning
Arnd Bergmann [Mon, 17 Oct 2016 22:16:15 +0000 (00:16 +0200)]
rocker: fix maybe-uninitialized warning

In some rare configurations, we get a warning about the 'index' variable
being used without an initialization:

drivers/net/ethernet/rocker/rocker_ofdpa.c: In function ‘ofdpa_port_fib_ipv4.isra.16.constprop’:
drivers/net/ethernet/rocker/rocker_ofdpa.c:2425:92: warning: ‘index’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This is a false positive, the logic is just a bit too complex for gcc
to follow here. Moving the intialization of 'index' a little further
down makes it clear to gcc that the function always returns an error
if it is not initialized.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/hyperv: avoid uninitialized variable
Arnd Bergmann [Mon, 17 Oct 2016 22:16:09 +0000 (00:16 +0200)]
net/hyperv: avoid uninitialized variable

The hdr_offset variable is only if we deal with a TCP or UDP packet,
but as the check surrounding its usage tests for skb_is_gso()
instead, the compiler has no idea if the variable is initialized
or not at that point:

drivers/net/hyperv/netvsc_drv.c: In function ‘netvsc_start_xmit’:
drivers/net/hyperv/netvsc_drv.c:494:42: error: ‘hdr_offset’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

This adds an additional check for the transport type, which
tells the compiler that this path cannot happen. Since the
get_net_transport_info() function should always be inlined
here, I don't expect this to result in additional runtime
checks.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bcm63xx: avoid referencing uninitialized variable
Arnd Bergmann [Mon, 17 Oct 2016 22:16:08 +0000 (00:16 +0200)]
net: bcm63xx: avoid referencing uninitialized variable

gcc found a reference to an uninitialized variable in the error handling
of bcm_enet_open, introduced by a recent cleanup:

drivers/net/ethernet/broadcom/bcm63xx_enet.c: In function 'bcm_enet_open'
drivers/net/ethernet/broadcom/bcm63xx_enet.c:1129:2: warning: 'phydev' may be used uninitialized in this function [-Wmaybe-uninitialized]

This makes the use of that variable conditional, so we only reference it
here after it has been used before. Unlike my normal patches, I have not
build-tested this one, as I don't currently have mips test in my
randconfig setup.

Fixes: 625eb8667d6f ("net: ethernet: broadcom: bcm63xx: use phydev from struct net_device")
Cc: Philippe Reynes <tremyfr@gmail.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosoreuseport: do not export reuseport_add_sock()
Eric Dumazet [Mon, 17 Oct 2016 21:22:48 +0000 (14:22 -0700)]
soreuseport: do not export reuseport_add_sock()

reuseport_add_sock() is not used from a module,
no need to export it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoibmvnic: Update MTU after device initialization
Thomas Falcon [Mon, 17 Oct 2016 20:28:10 +0000 (15:28 -0500)]
ibmvnic: Update MTU after device initialization

It is possible for the MTU to be changed during the initialization
process with the VNIC Server.  Ensure that the net device is updated
to reflect the new MTU.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoibmvnic: Fix GFP_KERNEL allocation in interrupt context
Thomas Falcon [Mon, 17 Oct 2016 20:28:09 +0000 (15:28 -0500)]
ibmvnic: Fix GFP_KERNEL allocation in interrupt context

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoibmvnic: Driver Version 1.0.1
Thomas Falcon [Mon, 17 Oct 2016 20:56:29 +0000 (15:56 -0500)]
ibmvnic: Driver Version 1.0.1

Increment driver version to reflect features that have
been added since release.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobridge: multicast: restore perm router ports on multicast enable
Nikolay Aleksandrov [Tue, 18 Oct 2016 16:09:48 +0000 (18:09 +0200)]
bridge: multicast: restore perm router ports on multicast enable

Satish reported a problem with the perm multicast router ports not getting
reenabled after some series of events, in particular if it happens that the
multicast snooping has been disabled and the port goes to disabled state
then it will be deleted from the router port list, but if it moves into
non-disabled state it will not be re-added because the mcast snooping is
still disabled, and enabling snooping later does nothing.

Here are the steps to reproduce, setup br0 with snooping enabled and eth1
added as a perm router (multicast_router = 2):
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
<empty>

At this point we have mcast enabled and eth1 as a perm router (value = 2)
but it is not in the router list which is incorrect.

After this change:
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
router ports on br0: eth1

Note: we can directly do br_multicast_enable_port for all because the
querier timer already has checks for the port state and will simply
expire if it's in blocking/disabled. See the comment added by
commit 9aa66382163e7 ("bridge: multicast: add a comment to
br_port_state_selection about blocking state")

Fixes: 561f1103a2b7 ("bridge: Add multicast_snooping sysfs toggle")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetfilter: nf_tables: avoid uninitialized variable warning
Arnd Bergmann [Mon, 17 Oct 2016 22:05:30 +0000 (00:05 +0200)]
netfilter: nf_tables: avoid uninitialized variable warning

The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:

net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.

Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
8 years agotcp: Remove unused but set variable
Tobias Klauser [Tue, 18 Oct 2016 09:22:54 +0000 (11:22 +0200)]
tcp: Remove unused but set variable

Remove the unused but set variable icsk in listening_get_next to fix the
following GCC warning when building with 'W=1':

  net/ipv4/tcp_ipv4.c: In function ‘listening_get_next’:
  net/ipv4/tcp_ipv4.c:1890:31: warning: variable ‘icsk’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4: Fix number of queue sets corssing the limit
Ganesh Goudar [Tue, 18 Oct 2016 08:51:25 +0000 (14:21 +0530)]
cxgb4: Fix number of queue sets corssing the limit

Do not let number of offload queue sets to go more than
MAX_OFLD_QSETS, which would otherwise crash the driver
on machines with cores more than MAX_OFLD_QSETS.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv4: Remove unused but set variable
Tobias Klauser [Tue, 18 Oct 2016 07:40:20 +0000 (09:40 +0200)]
ipv4: Remove unused but set variable

Remove the unused but set variable dev in ip_do_fragment to fix the
following GCC warning when building with 'W=1':

  net/ipv4/ip_output.c: In function ‘ip_do_fragment’:
  net/ipv4/ip_output.c:541:21: warning: variable ‘dev’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>