Ido Schimmel [Thu, 2 Nov 2017 16:14:08 +0000 (17:14 +0100)]
mlxsw: reg: Add Router ECMP Configuration Register Version 2
The RECRv2 register is used for setting up the router's ECMP hash
configuration.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 2 Nov 2017 16:14:07 +0000 (17:14 +0100)]
mlxsw: spectrum_router: Properly name netevent work struct
The struct containing the work item queued from the netevent handler is
named after the only event it is currently used for, which is neighbour
updates.
Use a more appropriate name for the struct, as we are going to use it
for more events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 2 Nov 2017 16:14:06 +0000 (17:14 +0100)]
mlxsw: spectrum_router: Embed netevent notifier block in router struct
We are going to need to respond to netevents notifying us about
multipath hash updates by configuring the device's hash parameters.
Embed the netevent notifier in the router struct so that we could
retrieve it upon notifications and use it to configure the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 2 Nov 2017 16:14:05 +0000 (17:14 +0100)]
ipv4: Send a netevent whenever multipath hash policy is changed
Devices performing IPv4 forwarding need to update their multipath hash
policy whenever it is changed.
Inform these devices by generating a netevent.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 2 Nov 2017 11:15:07 +0000 (11:15 +0000)]
cxgb4: fix error return code in cxgb4_set_hash_filter()
Fix to return a negative error code from thecxgb4_alloc_atid()
error handling case instead of 0.
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-By: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 2 Nov 2017 11:05:52 +0000 (12:05 +0100)]
bpf: fix out-of-bounds access warning in bpf_check
The bpf_verifer_ops array is generated dynamically and may be
empty depending on configuration, which then causes an out
of bounds access:
kernel/bpf/verifier.c: In function 'bpf_check':
kernel/bpf/verifier.c:4320:29: error: array subscript is above array bounds [-Werror=array-bounds]
This adds a check to the start of the function as a workaround.
I would assume that the function is never called in that configuration,
so the warning is probably harmless.
Fixes: 00176a34d9e2 ("bpf: remove the verifier ops from program structure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 2 Nov 2017 11:05:51 +0000 (12:05 +0100)]
bpf: fix link error without CONFIG_NET
I ran into this link error with the latest net-next plus linux-next
trees when networking is disabled:
kernel/bpf/verifier.o:(.rodata+0x2958): undefined reference to `tc_cls_act_analyzer_ops'
kernel/bpf/verifier.o:(.rodata+0x2970): undefined reference to `xdp_analyzer_ops'
It seems that the code was written to deal with varying contents of
the arrray, but the actual #ifdef was missing. Both tc_cls_act_analyzer_ops
and xdp_analyzer_ops are defined in the core networking code, so adding
a check for CONFIG_NET seems appropriate here, and I've verified this with
many randconfig builds
Fixes: 4f9218aaf8a4 ("bpf: move knowledge about post-translation offsets out of verifier")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Thu, 2 Nov 2017 09:36:48 +0000 (10:36 +0100)]
net: Define eth_stp_addr in linux/etherdevice.h
The lan9303 driver defines eth_stp_addr as a synonym to
eth_reserved_addr_base to get the STP ethernet address 01:80:c2:00:00:00.
eth_reserved_addr_base is also used to define the start of Bridge Reserved
ethernet address range, which happen to be the STP address.
br_dev_setup refer to eth_reserved_addr_base as a definition of STP
address.
Clean up by:
- Move the eth_stp_addr definition to linux/etherdevice.h
- Use eth_stp_addr instead of eth_reserved_addr_base in br_dev_setup.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Thu, 2 Nov 2017 01:14:49 +0000 (18:14 -0700)]
liquidio: bump up driver version to 1.7.0 to match newer NIC firmware
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 30 Oct 2017 21:41:35 +0000 (14:41 -0700)]
tcp: add tracepoint trace_tcp_retransmit_synack()
This tracepoint can be used to trace synack retransmits. It maintains
pointer to struct request_sock.
We cannot simply reuse trace_tcp_retransmit_skb() here, because the
sk here is the LISTEN socket. The IP addresses and ports should be
extracted from struct request_sock.
Note that, like many other tracepoints, this patch uses IS_ENABLED
in TP_fast_assign macro, which triggers sparse warning like:
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list
However, there is no good solution to avoid these warnings. To the
best of our knowledge, these warnings are harmless.
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 30 Oct 2017 21:16:00 +0000 (14:16 -0700)]
ipv6: Implement limits on Hop-by-Hop and Destination options
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).
By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.
25.38% [kernel] [k] __fib6_clean_all
21.63% [kernel] [k] ip6_parse_tlv
4.21% [kernel] [k] __local_bh_enable_ip
2.18% [kernel] [k] ip6_pol_route.isra.39
1.98% [kernel] [k] fib6_walk_continue
1.88% [kernel] [k] _raw_write_lock_bh
1.65% [kernel] [k] dst_release
This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
- Limit the number of options in a Hop-by-Hop or Destination options
extension header.
- Limit the byte length of a Hop-by-Hop or Destination options
extension header.
- Disallow unrecognized options in a Hop-by-Hop or Destination
options extension header.
The limits are set in corresponding sysctls:
ipv6.sysctl.max_dst_opts_cnt
ipv6.sysctl.max_hbh_opts_cnt
ipv6.sysctl.max_dst_opts_len
ipv6.sysctl.max_hbh_opts_len
If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.
If a limit is exceeded when processing an extension header the packet is
dropped.
Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).
These limits have being proposed in draft-ietf-6man-rfc6434-bis.
Tested (by Martin Lau)
I tested out 1 thread (i.e. one raw_udp process).
I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.
v2;
- Code and documention cleanup.
- Change references of RFC2460 to be RFC8200.
- Add reference to RFC6434-bis where the limits will be in standard.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 12:28:36 +0000 (21:28 +0900)]
Merge branch 'hns3-add-support-for-reset'
Lipeng says:
====================
net: hns3: add support for reset
There are 4 reset types for HNS3 PF driver, include global reset,
core reset, IMP reset, PF reset.The core reset will reset all datapath
of all functions except IMP, MAC and PCI interface. Global reset is equal
with the core reset plus all MAC reset. IMP reset is caused by watchdog
timer expiration, the same range with core reset. PF reset will reset
whole physical function.
This patchset adds reset support for hns3 driver and fix some related bugs.
---
Change log:
V1 -> V2:
1, fix some comments from Yunsheng Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
qumingguang [Thu, 2 Nov 2017 12:45:23 +0000 (20:45 +0800)]
net: hns3: hns3:fix a bug about statistic counter in reset process
All member of Struct hdev->hw_stats is initialized to 0 as hdev is
allocated by devm_kzalloc. But in reset process, hdev will not be
allocated again, so need clear hdev->hw_stats in reset process, otherwise
the statistic will be wrong after reset. This patch set all of the
statistic counters to zero after reset.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qumingguang [Thu, 2 Nov 2017 12:45:22 +0000 (20:45 +0800)]
net: hns3: Fix a misuse to devm_free_irq
we should use free_irq to free the nic irq during the unloading time.
because we use request_irq to apply it when nic up. It will crash if
up net device after reset the port. This patch fixes the issue.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Thu, 2 Nov 2017 12:45:21 +0000 (20:45 +0800)]
net: hns3: Add reset interface implementation in client
This patch implement the interface of reset notification in hns3_enet,
it will do resetting business which include shutdown nic device,
free and initialize client side resource.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Thu, 2 Nov 2017 12:45:20 +0000 (20:45 +0800)]
net: hns3: Add timeout process in hns3_enet
This patch add timeout handler in hns3_enet.c to handle
TX side timeout event, when TX timeout event occur, it will triger
NIC driver into reset process.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Thu, 2 Nov 2017 12:45:19 +0000 (20:45 +0800)]
net: hns3: Add reset process in hclge_main
This patch adds reset support for PF,it include : global reset, core reset,
IMP reset, PF reset.The core reset will Reset all datapath of all functions
except IMP, MAC and PCI interface. Global reset is equal with the core
reset plus all MAC reset. IMP reset is caused by watchdog timer expiration,
the same with core reset in the reset flow. PF reset will reset whole
physical function.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Thu, 2 Nov 2017 12:45:18 +0000 (20:45 +0800)]
net: hns3: Add support for misc interrupt
This patch adds initialization and deinitialization for misc interrupt.
This interrupt will be used to handle reset message(IRQ).
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Thu, 2 Nov 2017 12:45:17 +0000 (20:45 +0800)]
net: hns3: Refactor the initialization of command queue
There is no necessary to reallocate the descriptor and remap the descriptor
memory in reset process, But there is still some other action exist in both
reset process and initialization process.
To reuse the common interface in reset process and initialization process,
This patch moves out the descriptor allocate and memory maping from
interface cmdq_init.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qumingguang [Thu, 2 Nov 2017 12:45:16 +0000 (20:45 +0800)]
net: hns3: Refactor mac_init function
It needs initialize mdio in initialization process, but reset process
does not reset mdio, so do not initialize mdio in reset process.
This patch move out the mdio configuration function from the mac_init.
So mac_init can be used both in reset process and initialization process.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Thu, 2 Nov 2017 12:45:15 +0000 (20:45 +0800)]
net: hns3: Refactor the mapping of tqp to vport
This patch refactor the mapping of tqp to vport, making the maping function
can be used both in the reset process and initialization process.
Signed-off-by: qumingguang <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 2 Nov 2017 11:13:36 +0000 (12:13 +0100)]
net: seeq: fix timer conversion
One of the timer conversion patches evidently escaped build testing
until I ran into in on ARM randconfig builds:
drivers/net/ethernet/seeq/ether3.c: In function 'ether3_ledoff':
drivers/net/ethernet/seeq/ether3.c:175:40: error: 'priv' undeclared (first use in this function); did you mean 'pid'?
drivers/net/ethernet/seeq/ether3.c:176:27: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
This fixes the two small typos that caused the problems.
Fixes: 6fd9c53f7186 ("net: seeq: Convert timers to use timer_setup()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Thu, 2 Nov 2017 09:20:58 +0000 (10:20 +0100)]
net: dsa: lan9303: Added Documentation/networking/dsa/lan9303.txt
Provide a rough overview of the state of the driver. And explain that the
driver operates in two modes: bridged and port-separated.
Signed-off-by: Egil Hjelmeland <egil.hjelmeland@zenitel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 11:27:12 +0000 (20:27 +0900)]
Merge branch 'nfp-TC-block-fixes-app-fallback-and-dev_alloc'
Jakub Kicinski says:
====================
nfp: TC block fixes, app fallback and dev_alloc()
This series has three parts. First of all John and I fix some
fallout from the TC block conversion. John also fixes sleeping
in the neigh notifier.
Secondly I reorganise the nfp_app table to make it easier to
deal with excluding apps which have unmet Kconfig dependencies.
Last but not least after the fixes which went into -net some time
ago I refactor the page allocation, add a ethtool counter for
failed allocations and clean the ethtool stat code while at it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Nov 2017 08:31:36 +0000 (01:31 -0700)]
nfp: improve defines for constants in ethtool
We split rvector stats into two categories - per queue and
stats which are added up into one total counter. Improve
the defines denoting their number.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Nov 2017 08:31:35 +0000 (01:31 -0700)]
nfp: use a counter instead of log message for allocation failures
Add a counter incremented when allocation of replacement
RX page fails.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Nov 2017 08:31:34 +0000 (01:31 -0700)]
nfp: switch to dev_alloc_page()
Use the dev_alloc_page() networking helper to allocate pages
for RX packets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Nov 2017 08:31:33 +0000 (01:31 -0700)]
nfp: bpf: fall back to core NIC app if BPF not selected
If kernel config does not include BPF just replace the BPF
app handler with the handler for basic NIC. The BPF app
will now be built only if BPF infrastructure is selected
in kernel config.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Nov 2017 08:31:32 +0000 (01:31 -0700)]
nfp: reorganize the app table
The app table is an unordered array right now. We have to search
apps by ID. It also makes it harder to fall back to core NIC if
advanced functions are not compiled into the kernel (e.g. eBPF).
Make the table keyed by app id.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Nov 2017 08:31:31 +0000 (01:31 -0700)]
nfp: bpf: reject TC offload if XDP loaded
Recent TC changes dropped the check protecting us from trying
to offload a TC program if XDP programs are already loaded.
Fixes: 90d97315b3e7 ("nfp: bpf: Convert ndo_setup_tc offloads to block callbacks")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Thu, 2 Nov 2017 08:31:30 +0000 (01:31 -0700)]
nfp: flower: vxlan - ensure no sleep in atomic context
Functions called by the netevent notifier must be in atomic context.
Change the mutex to spinlock and ensure mem allocations are done with the
atomic flag.
Also, remove unnecessary locking after notifiers are unregistered.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Thu, 2 Nov 2017 08:31:29 +0000 (01:31 -0700)]
nfp: flower: app should use struct nfp_repr
Ensure priv netdev data in flower app is cast to nfp_repr and not nfp_net
as in other apps.
Fixes: 363fc53b8b58 ("nfp: flower: Convert ndo_setup_tc offloads to block callbacks")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Thu, 2 Nov 2017 08:09:45 +0000 (17:09 +0900)]
tools: bpf: handle long path in jit disasm
Use PATH_MAX instead of hardcoded array size 256
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vijaya Mohan Guvva [Wed, 1 Nov 2017 23:19:49 +0000 (16:19 -0700)]
liquidio: synchronize VF representor names with NIC firmware
LiquidIO firmware supports a vswitch that needs to know the names of the
VF representors in the host to maintain compatibility for direct
programming using external Openflow agents. So, for each VF representor,
send its name to the firmware when it gets registered and when its name
changes.
Signed-off-by: Vijaya Mohan Guvva <vijaya.guvva@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 08:01:39 +0000 (17:01 +0900)]
Merge branch 'BPF-range-marking-improvements-for-meta-data'
Daniel Borkmann says:
====================
BPF range marking improvements for meta data
The set contains improvements for direct packet access range
markings related to data_meta pointer and test cases for all
such access patterns that the verifier matches on.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 1 Nov 2017 22:58:11 +0000 (23:58 +0100)]
bpf: add test cases to bpf selftests to cover all meta tests
Lets also add test cases to cover all possible data_meta access tests
for good/bad access cases so we keep tracking them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 1 Nov 2017 22:58:10 +0000 (23:58 +0100)]
bpf: also improve pattern matches for meta access
Follow-up to
0fd4759c5515 ("bpf: fix pattern matches for direct
packet access") to cover also the remaining data_meta/data matches
in the verifier. The matches are also refactored a bit to simplify
handling of all the cases.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 1 Nov 2017 22:58:09 +0000 (23:58 +0100)]
bpf: minor cleanups after merge
Two minor cleanups after Dave's recent merge in
f8ddadc4db6c
("Merge git://git.kernel.org...") of net into net-next in
order to get the code in line with what was done originally
in the net tree: i) use max() instead of max_t() since both
ranges are u16, ii) don't split the direct access test cases
in the middle with bpf_exit test cases from
390ee7e29fc
("bpf: enforce return code for cgroup-bpf programs").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 1 Nov 2017 18:48:00 +0000 (11:48 -0700)]
security: bpf: replace include of linux/bpf.h with forward declarations
Touching linux/bpf.h makes us rebuild a surprisingly large
portion of the kernel. Remove the unnecessary dependency
from security.h, it only needs forward declarations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 1 Nov 2017 18:29:47 +0000 (11:29 -0700)]
net: systemport: Only inspect valid switch port & queues
Hesoteric board configurations where port 0 is not available would still
make SYSTEMPORT inspect the switch port 0, queue 0, which, not being
enabled, would cause transmit timeouts over time. Just ignore those
unconfigured rings instead.
Fixes: 84ff33eeb23d ("net: systemport: Establish DSA network device queue mapping")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 07:47:30 +0000 (16:47 +0900)]
Merge branch 'nfp-bpf-rename-ALU_OP_NEG-and-support-BPF_NEG'
Jakub Kicinski says:
====================
nfp: bpf: rename ALU_OP_NEG and support BPF_NEG
Jiong says:
Compilers are starting to use BPF_NEG, for example LLVM. However, NFP
does not support JITing it. This patch set adds this. Unit test is added
as well.
Meanwhile, the current NFP_ALU_NEG is actually doing bitwise NOT (one's
complement) operation, so the name is misleading. This patch set corrects
this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiong Wang [Wed, 1 Nov 2017 17:38:25 +0000 (10:38 -0700)]
nfp: bpf: support [BPF_ALU | BPF_ALU64] | BPF_NEG
This patch supports BPF_NEG under both BPF_ALU64 and BPF_ALU. LLVM recently
starts to generate it.
NOTE: BPF_NEG takes single operand which is an register and serve as both
input and output.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiong Wang [Wed, 1 Nov 2017 17:38:24 +0000 (10:38 -0700)]
nfp: bpf: rename ALU_OP_NEG to ALU_OP_NOT
The current ALU_OP_NEG is Op encoding 0x4 for NPF ALU instruction. It is
actually performing "~B" operation which is bitwise NOT.
The using naming ALU_OP_NEG is misleading as NEG is -B which is not the
same as ~B.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 07:15:26 +0000 (16:15 +0900)]
Merge branch 'dpaa-cleanups'
yuan linyu says:
====================
net: dpaa: two minor cleanup
original i try to remove duplicate code which clean allocated per-cpu area,
thanks to David S. Miller, there are two build warning as errors.
path 1: fix old code maybe-uninitialized warning.
path 2: remove duplicate code and fix unused var warning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
yuan linyu [Wed, 1 Nov 2017 13:11:11 +0000 (21:11 +0800)]
net: dpaa: remove init which already done in per-cpu allocation
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
yuan linyu [Wed, 1 Nov 2017 13:10:32 +0000 (21:10 +0800)]
net: dpaa: fix maybe uninitialized var in dpaa_open()
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 1 Nov 2017 11:44:45 +0000 (12:44 +0100)]
bpf: cpumap micro-optimization in cpu_map_enqueue
Discovered that the compiler laid-out asm code in suboptimal way
when studying perf report during benchmarking of cpumap. Help
the compiler by the marking unlikely code paths.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 07:10:40 +0000 (16:10 +0900)]
Merge branch 'net-sched-block-callbacks-follow-up'
Jiri Pirko says:
====================
net: sched: block callbacks follow-up
This patchset does a bit of cleanup of leftovers after block callbacks
patchset. The main part is patch 2, which restores the original handling
of tc offload feature flag.
---
v1->v2:
- rebased on top of current net-next (bnxt changes)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 1 Nov 2017 10:47:41 +0000 (11:47 +0100)]
net: sched: remove ndo_setup_tc check from tc_can_offload
Since tc_can_offload is always called from block callback or egdev
callback, no need to check if ndo_setup_tc exists.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 1 Nov 2017 10:47:40 +0000 (11:47 +0100)]
net: sched: remove tc_can_offload check from egdev call
Since the only user, mlx5 driver does the check in
mlx5e_setup_tc_block_cb, no need to check here.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 1 Nov 2017 10:47:39 +0000 (11:47 +0100)]
net: sched: move the can_offload check from binding phase to rule insertion phase
This restores the original behaviour before the block callbacks were
introduced. Allow the drivers to do binding of block always, no matter
if the NETIF_F_HW_TC feature is on or off. Move the check to the block
callback which is called for rule insertion.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 1 Nov 2017 10:47:38 +0000 (11:47 +0100)]
net: sched: remove unused tc_should_offload helper
tc_should_offload is no longer used, remove it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Wed, 1 Nov 2017 10:18:13 +0000 (12:18 +0200)]
net: bridge: add notifications for the bridge dev on vlan change
Currently the bridge device doesn't generate any notifications upon vlan
modifications on itself because it doesn't use the generic bridge
notifications.
With the recent changes we know if anything was modified in the vlan config
thus we can generate a notification when necessary for the bridge device
so add support to br_ifinfo_notify() similar to how other combined
functions are done - if port is present it takes precedence, otherwise
notify about the bridge. I've explicitly marked the locations where the
notification should be always for the port by setting bridge to NULL.
I've also taken the liberty to rearrange each modified function's local
variables in reverse xmas tree as well.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 1 Nov 2017 10:17:15 +0000 (10:17 +0000)]
net: hns3: remove a couple of redundant assignments
The assignment to kinfo is redundant as this is a duplicate of
the initialiation of kinfo a few lines earlier, so it can be
removed. The assignment to v_tc_info is never read, so this
variable is redundant and can be removed completely. Cleans
up two clang warnings:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c:433:34:
warning: Value stored to 'kinfo' during its initialization is never read
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c:775:3:
warning: Value stored to 'v_tc_info' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 1 Nov 2017 09:09:13 +0000 (09:09 +0000)]
liquidio: remove redundant setting of inst_processed to zero
The zero value assigned to inst_processed at the end of each
iteration of the do-while loop is overwritten on the next iteration
and hence it is a redundant assignment and can be removed. Cleans
up clang warning:
drivers/net/ethernet/cavium/liquidio/request_manager.c:480:3:
warning: Value stored to 'inst_processed' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 1 Nov 2017 08:57:37 +0000 (08:57 +0000)]
net: dl2k: remove redundant re-assignment to np
The pointer np is initialized and then re-assigned the same value
a few lines later. Remove the redundant duplicated assignment. Cleans
up clang warning:
drivers/net/ethernet/dlink/dl2k.c:314:25: warning: Value stored to
'np' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 1 Nov 2017 08:49:45 +0000 (08:49 +0000)]
wan: wanxl: remove redundant assignment to stat
stat set to zero and the value is never read, instead stat is
set again in the do-loop. Hence the setting to zero is redundant
and can be removed. Cleans up clang warning:
drivers/net/wan/wanxl.c:737:2: warning: Value stored to 'stat'
is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 05:59:52 +0000 (14:59 +0900)]
Merge git://git./linux/kernel/git/davem/net
Smooth Cong Wang's bug fix into 'net-next'. Basically put
the bulk of the tcf_block_put() logic from 'net' into
tcf_block_put_ext(), but after the offload unbind.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 05:19:53 +0000 (14:19 +0900)]
Merge branch 'samples-pktgen-updates'
Jesper Dangaard Brouer says:
====================
Updates for samples/pktgen
This patchset updates samples/pktgen and synchronize with changes
maintained in https://github.com/netoptimizer/network-testing/
Features wise Robert Hoo <robert.hu@intel.com> added support for
detecting and determining dev NUMA node IRQs, and added a new script
named pktgen_sample06_numa_awared_queue_irq_affinity.sh that use these
features.
Cleanup remove last of the old sample files, as IPv6 is covered by
existing sample code.
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 1 Nov 2017 10:41:24 +0000 (11:41 +0100)]
samples/pktgen: remove remaining old pktgen sample scripts
Since commit
0f06a6787e05 ("samples: Add an IPv6 '-6' option to the
pktgen scripts") the newer pktgen_sampleXX script does show howto use
IPv6 with pktgen.
Thus, there is no longer a reason to keep the older sample scripts around.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Wed, 1 Nov 2017 10:41:19 +0000 (11:41 +0100)]
samples/pktgen: update sample03, no need for clones when bursting
Like sample05, don't use pktgen clone_skb feature when using 'burst' feature,
it is not really needed. This brings the burst users in sync.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hoo [Wed, 1 Nov 2017 10:41:14 +0000 (11:41 +0100)]
samples/pktgen: add script pktgen_sample06_numa_awared_queue_irq_affinity.sh
This script simply does:
* Detect $DEV's NUMA node belonging.
* Bind each thread (processor of NUMA locality) with each $DEV queue's
irq affinity, 1:1 mapping.
* How many '-t' threads input determines how many queues will be utilized.
If '-f' designates first cpu id, then offset in the NUMA node's cpu list.
(Changes by Jesper: allow changing count from cmdline via '-n')
Signed-off-by: Robert Hoo <robert.hu@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hoo [Wed, 1 Nov 2017 10:41:09 +0000 (11:41 +0100)]
samples/pktgen: Add some helper functions
1. given a device, get its NUMA belongings
2. given a device, get its queues' irq numbers.
3. given a NUMA node, get its cpu id list.
Signed-off-by: Robert Hoo <robert.hu@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Nov 2017 05:17:12 +0000 (14:17 +0900)]
Merge branch 'enic-Additional-ethtool-support'
Parvi Kaustubhi says:
====================
enic: Additional ethtool support
This patch set allows the user to show or modify rx/tx ring sizes using
ethtool.
v2:
- remove unused variable to fix build warning.
- update list of maintainers for cisco vic ethernet nic driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Parvi Kaustubhi [Wed, 1 Nov 2017 15:44:48 +0000 (08:44 -0700)]
MAINTAINERS: update MAINTAINERS for cisco vic
Add myself to list of maintainers for cisco vic ethernet nic driver. Remove
Neel as he left.
Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parvi Kaustubhi [Wed, 1 Nov 2017 15:44:47 +0000 (08:44 -0700)]
enic: Add support for 'ethtool -g/-G'
Add support for displaying and modifying rx and tx ring sizes using
ethtool.
Also, increasing version to 2.3.0.45
Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parvi Kaustubhi [Wed, 1 Nov 2017 15:44:46 +0000 (08:44 -0700)]
enic: reset fetch index
Since we are allowing rx ring size modification, reset fetch index
everytime. Otherwise it could have a stale value that can lead to a null
pointer dereference.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 1 Nov 2017 23:04:27 +0000 (16:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull signal bugfix from Eric Biederman:
"When making the generic support for SIGEMT conditional on the presence
of SIGEMT I made a typo that causes it to fail to activate. It was
noticed comparatively quickly but the bug report just made it to me
today"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Fix name of SIGEMT in #if defined() check
Andrew Clayton [Wed, 1 Nov 2017 15:49:59 +0000 (15:49 +0000)]
signal: Fix name of SIGEMT in #if defined() check
Commit
cc731525f26a ("signal: Remove kernel interal si_code magic")
added a check for SIGMET and NSIGEMT being defined. That SIGMET should
in fact be SIGEMT, with SIGEMT being defined in
arch/{alpha,mips,sparc}/include/uapi/asm/signal.h
This was actually pointed out by BenHutchings in a lwn.net comment
here https://lwn.net/Comments/734608/
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Signed-off-by: Andrew Clayton <andrew@digital-domain.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Linus Torvalds [Wed, 1 Nov 2017 21:46:38 +0000 (14:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes that should go into this series:
- Regression fix for ide-cd, ensuring that a request is fully
initialized. From Hongxu.
- Ditto fix for virtio_blk, from Bart.
- NVMe fix from Keith, ensuring that we set the right block size on
revalidation. If the block size changed, we'd be in trouble without
it.
- NVMe rdma fix from Sagi, fixing a potential hang while the
controller is being removed"
* 'for-linus' of git://git.kernel.dk/linux-block:
ide:ide-cd: fix kernel panic resulting from missing scsi_req_init
nvme: Fix setting logical block format when revalidating
virtio_blk: Fix an SG_IO regression
nvme-rdma: fix possible hang when issuing commands during ctrl removal
Linus Torvalds [Wed, 1 Nov 2017 15:29:01 +0000 (08:29 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix refcounting in xfrm_bundle_lookup() when using a dummy bundle,
from Steffen Klassert.
2) Fix crypto header handling in rx data frames in ath10k driver, from
Vasanthakumar Thiagarajan.
3) Fix use after free of qdisc when we defer tcp_chain_flush() to a
workqueue. From Cong Wang.
4) Fix double free in lapbether driver, from Pan Bian.
5) Sanitize TUNSETSNDBUF values, from Craig Gallek.
6) Fix refcounting when addrconf_permanent_addr() calls
ipv6_del_addr(). From Eric Dumazet.
7) Fix MTU probing bug in TCP that goes back to 2007, from Eric
Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
tcp: fix tcp_mtu_probe() vs highest_sack
ipv6: addrconf: increment ifp refcount before ipv6_del_addr()
tun/tap: sanitize TUNSETSNDBUF input
mlxsw: i2c: Fix buffer increment counter for write transaction
mlxsw: reg: Add high and low temperature thresholds
MAINTAINERS: Remove Yotam from mlxfw
MAINTAINERS: Update Yotam's E-mail
net: hns: set correct return value
net: lapbether: fix double free
bpf: remove SK_REDIRECT from UAPI
net: phy: marvell: Only configure RGMII delays when using RGMII
xfrm: Fix GSO for IPsec with GRE tunnel.
tc-testing: fix arg to ip command: -s -> -n
net_sched: remove tcf_block_put_deferred()
l2tp: hold tunnel in pppol2tp_connect()
Revert "ath10k: fix napi_poll budget overflow"
ath10k: rebuild crypto header in rx data frames
wcn36xx: Remove unnecessary rcu_read_unlock in wcn36xx_bss_info_changed
xfrm: Clear sk_dst_cache when applying per-socket policy.
xfrm: Fix xfrm_dst_cache memleak
Vlastimil Babka [Wed, 1 Nov 2017 07:21:25 +0000 (08:21 +0100)]
x86/mm: fix use-after-free of vma during userfaultfd fault
Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
__do_page_fault() with the following reproducer:
mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
r0 = userfaultfd(0x0)
ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
r1 = gettid()
syz_open_dev$evdev(&(0x7f0000013000-0x12)="
2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
tkill(r1, 0x7)
The vma should be pinned by mmap_sem, but handle_userfault() might (in a
return to userspace scenario) release it and then acquire again, so when
we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
the vma might be gone.
Specifically, per Andrea the scenario is
"A return to userland to repeat the page fault later with a
VM_FAULT_NOPAGE retval (potentially after handling any pending signal
during the return to userland). The return to userland is identified
whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
vmf->flags"
However, since commit
a3c4fb7c9c2e ("x86/mm: Fix fault error path using
unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
that point, which can thus become use-after-free. Fix this by moving
the read before calling handle_mm_fault().
Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Fixes: 3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer")
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 1 Nov 2017 14:59:39 +0000 (07:59 -0700)]
Merge tag 'smb3-file-name-too-long-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French:
"smb3 file name too long fix"
* tag 'smb3-file-name-too-long-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: check MaxPathNameComponentLength != 0 before using it
Hongxu Jia [Tue, 31 Oct 2017 07:39:40 +0000 (15:39 +0800)]
ide:ide-cd: fix kernel panic resulting from missing scsi_req_init
Since we split the scsi_request out of struct request, while the
standard prep_rq_fn builds 10 byte cmds, it missed to invoke
scsi_req_init() to initialize certain fields of a scsi_request
structure (.__cmd[], .cmd, .cmd_len and .sense_len but no other
members of struct scsi_request).
An example panic on virtual machines (qemu/virtualbox) to boot
from IDE cdrom:
...
[ 8.754381] Call Trace:
[ 8.755419] blk_peek_request+0x182/0x2e0
[ 8.755863] blk_fetch_request+0x1c/0x40
[ 8.756148] ? ktime_get+0x40/0xa0
[ 8.756385] do_ide_request+0x37d/0x660
[ 8.756704] ? cfq_group_service_tree_add+0x98/0xc0
[ 8.757011] ? cfq_service_tree_add+0x1e5/0x2c0
[ 8.757313] ? ktime_get+0x40/0xa0
[ 8.757544] __blk_run_queue+0x3d/0x60
[ 8.757837] queue_unplugged+0x2f/0xc0
[ 8.758088] blk_flush_plug_list+0x1f4/0x240
[ 8.758362] blk_finish_plug+0x2c/0x40
...
[ 8.770906] RIP: ide_cdrom_prep_fn+0x63/0x180 RSP:
ffff92aec018bae8
[ 8.772329] ---[ end trace
6408481e551a85c9 ]---
...
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David S. Miller [Wed, 1 Nov 2017 13:08:48 +0000 (22:08 +0900)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2017-11-01
Here's one more bluetooth-next pull request for the 4.15 kernel.
- New NFA344A device entry for btusb drvier
- Fix race conditions in hci_ldisc
- Fix for isochronous interface assignments in btusb driver
- A few other smaller fixes & improvements
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Wed, 1 Nov 2017 07:08:04 +0000 (00:08 -0700)]
bpf: fix verifier memory leaks
fix verifier memory leaks
Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption")
Signed-off-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Nov 2017 13:06:04 +0000 (22:06 +0900)]
Merge branch 'cxgb4-add-hash-filter-support-to-tc-flower-offload'
Rahul Lakkireddy says:
====================
cxgb4: add hash-filter support to tc-flower offload
This series of patches add support to create hash-filters; a.k.a
exact-match filters, to tc-flower offload. T6 supports creating
~500K hash-filters in hw and can theoretically be expanded up to
~1 million.
Patch 1 fetches and saves the configured hw filter tuple field shifts
and filter mask.
Patch 2 initializes the driver to use hash-filter configuration.
Patch 3 adds support to create hash filters in hw.
Patch 4 adds support to delete hash filters in hw.
Patch 5 adds support to retrieve filter stats for hash filters.
Patch 6 converts the flower table to use rhashtable instead of
static hlist.
Patch 7 finally adds support to create hash filters via tc-flower
offload.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:23:05 +0000 (08:53 +0530)]
cxgb4: add support to create hash-filters via tc-flower offload
Determine whether the flow classifies as exact-match with respect to
4-tuple and configured tuple mask in hw. If successfully classified
as exact-match, offload the flow as hash-filter in hw.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:23:04 +0000 (08:53 +0530)]
cxgb4: convert flower table to use rhashtable
T6 supports ~500K hash filters and can theoretically climb up to
~1 million hash filters. Preallocated hash table is not efficient
in terms of memory usage. So, use rhashtable instead which gives
the flexibility to grow based on usage.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:23:03 +0000 (08:53 +0530)]
cxgb4: add support to retrieve stats for hash filters
Add support to retrieve packet-count and byte-count for hash-filters
by retrieving filter-entry appropriately based on whether the
request is for hash-filter or not.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:23:02 +0000 (08:53 +0530)]
cxgb4: add support to delete hash filter
Use a combined ulptx work-request to send hash filter deletion
request to hw. Hash filter deletion reply is processed on
getting cpl_abort_rpl_rss.
Release any L2T/SMT/CLIP entries on filter deletion.
Also, free up the corresponding filter entry.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:23:01 +0000 (08:53 +0530)]
cxgb4: add support to create hash filters
Add support to create hash (exact-match) filters based on the value
of 'hash' field in ch_filter_specification.
Allocate SMT/L2T entries if DMAC-rewrite/SMAC-rewrite is requested.
Allocate CLIP entry in case of IPv6 filter.
Use cpl_act_open_req[6] to send hash filter create request to hw.
Also, the filter tuple is calculated as part of sending this request.
Hash-filter reply is processed on getting cpl_act_open_rpl.
In case of success, various bits/fields in filter-tcb are set per
filter requirement, such as enabling filter hitcnts, and/or various
header rewrite operations, such as VLAN-rewrite, NAT or
(L3/L4)-rewrite, and SMAC/DMAC-rewrite. In case of failure, clear the
filter entry and release any hw resources occupied by it.
The patch also moves the functions set_tcb_field, set_tcb_tflag and
configure_filter_smac towards beginning of file.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:23:00 +0000 (08:53 +0530)]
cxgb4: initialize hash-filter configuration
Add support for hash-filter configuration on T6. Also, do basic
checks for the related initialization.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Wed, 1 Nov 2017 03:22:59 +0000 (08:52 +0530)]
cxgb4: save additional filter tuple field shifts in tp_params
Save additional filter tuple field shifts in tp_params based on
configured filter tuple fields.
Also, save the combined filter tuple mask based on configured
filter tuple fields.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Nov 2017 12:30:24 +0000 (21:30 +0900)]
Merge branch 'lan9303-Fix-STP-and-flooding-issues'
Egil Hjelmeland says:
====================
net: dsa: lan9303: Fix STP and flooding issues
This patch set finishes the STP support, and fixes flooding issues.
Patch 1 fixes a flooding issue in the previous patch set.
Patch 2 finishes STP support by adding a ALR entry.
Patch 3 prevent duplicate flooding in HW and SW bridge.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Tue, 31 Oct 2017 14:48:02 +0000 (15:48 +0100)]
net: dsa: lan9303: lan9303_rcv set skb->offload_fwd_mark
The chip flood broadcast and unknown multicast frames.
On receive set skb->offload_fwd_mark to prevent the SW from flooding to the
same ports.
One exception: Because the ALR is set up to forward STP BPDUs only to CPU,
the SW bridge should flood STP BPDUs if local STP is not enabled.
This is archived by not setting skb->offload_fwd_mark on STP BPDUs.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Tue, 31 Oct 2017 14:48:01 +0000 (15:48 +0100)]
net: dsa: lan9303: Add STP ALR entry on port 0
STP BPDUs arriving on user ports must sent to CPU port only,
for processing by the SW bridge.
Add an ALR entry with STP state override to fix that.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Tue, 31 Oct 2017 14:48:00 +0000 (15:48 +0100)]
net: dsa: lan9303: Transmit using ALR when unicast
lan9303_xmit_use_arl() introduced in previous patch set is wrong.
The chip flood broadcast and unknown multicast frames. The effect is that
broadcasts and multicasts are duplicated on egress. It is not possible to
configure the chip to direct unknown multicasts to CPU port only.
This means that only unicast frames can be transmitted using ALR lookup.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 31 Oct 2017 14:37:55 +0000 (14:37 +0000)]
net: thunderx: remove a couple of redundant assignments
The assignment to pointer msg is redundant as it is never read, so
remove msg. Also remove the first assignment to qset as this is not
read before the next re-assignment of a new value to qset in the
for-loop. Cleans up two clang warnings:
drivers/net/ethernet/cavium/thunder/nic_main.c:589:2: warning: Value
stored to 'msg' is never read
drivers/net/ethernet/cavium/thunder/nic_main.c:611:2: warning: Value
stored to 'qset' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Tue, 31 Oct 2017 14:37:18 +0000 (16:37 +0200)]
MAINTAINERS: Add lib/net_utils.c to NETWORKING (general)
It looks like the best place in MAINTAINERS data base to cover this
orphaned module.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 31 Oct 2017 14:29:47 +0000 (14:29 +0000)]
sfc: support rx-fcs and rx-all
Ethernet FCS inclusion (rx-fcs) is supported on EF10 NICs, conditional on
a firmware capability bit (MC_CMD_GET_CAPABILITIES_OUT_RX_INCLUDE_FCS).
To receive frames with bad FCS (rx-all) we just don't return the discard
flag EFX_RX_PKT_DISCARD from efx_ef10_handle_rx_event_errors() or
efx_farch_handle_rx_not_ok().
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 31 Oct 2017 14:23:24 +0000 (14:23 +0000)]
net: macb: remove redundant assignment to variable work_done
Variable work_done is set to zero and this value is never read, instead
it is set to another value a few statements later. Remove the redundant
assignment. Cleans up clang warning:
drivers/net/ethernet/cadence/macb_main.c:1221:2: warning: Value stored
to 'work_done' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 31 Oct 2017 13:32:38 +0000 (14:32 +0100)]
ipv4: fix validate_source for VRF setup
David reported breakages of VRF scenarios due to the
commit
6e617de84e87 ("net: avoid a full fib lookup when rp_filter is
disabled."): the local addresses based test is too strict when VRFs
are in place.
With this change we fall-back to a full lookup when custom fib rules
are in place; so that we address the VRF use case and possibly other
similar issues in non trivial setups.
v1 -> v2:
- fix build breakage when CONFIG_IP_MULTIPLE_TABLES is not defined,
reported by the kbuild test robot
Reported-by: David Ahern <dsahern@gmail.com>
Fixes: 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 31 Oct 2017 13:28:16 +0000 (13:28 +0000)]
sctp: fix error return code in sctp_send_add_streams()
Fix to returnerror code -ENOMEM from the sctp_make_strreset_addstrm()
error handling case instead of 0. 'retval' can be overwritten to 0 after
call sctp_stream_alloc_out().
Fixes: e090abd0d81c ("sctp: factor out stream->out allocation")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 31 Oct 2017 12:01:47 +0000 (12:01 +0000)]
net: hso: remove redundant unused variable dev
The pointer dev is being assigned but is never used, hence it is
redundant and can be removed. Cleans up clang warning:
drivers/net/usb/hso.c:2280:2: warning: Value stored to 'dev' is
never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Tue, 31 Oct 2017 10:25:37 +0000 (18:25 +0800)]
ppp: Destroy the mutex when cleanup
The mutex_destroy only makes sense when enable DEBUG_MUTEX. For the
good readbility, it's better to invoke it in exit func when the init
func invokes mutex_init.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 31 Oct 2017 10:08:23 +0000 (10:08 +0000)]
net: ethernet: slicoss: remove redundant initialization of idx
Variable idx is being initialized and later on over-written by
a new value in a do-loop without the initial value ever being
read. Hence the initializion is redundant and can be removed.
Cleans up clang warning:
drivers/net/ethernet/alacritech/slicoss.c:358:15: warning: Value
stored to 'idx' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 31 Oct 2017 06:08:20 +0000 (23:08 -0700)]
tcp: fix tcp_mtu_probe() vs highest_sack
Based on SNMP values provided by Roman, Yuchung made the observation
that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
Looking at tcp_mtu_probe(), I found that when a new skb was placed
in front of the write queue, we were not updating tcp highest sack.
If one skb is freed because all its content was copied to the new skb
(for MTU probing), then tp->highest_sack could point to a now freed skb.
Bad things would then happen, including infinite loops.
This patch renames tcp_highest_sack_combine() and uses it
from tcp_mtu_probe() to fix the bug.
Note that I also removed one test against tp->sacked_out,
since we want to replace tp->highest_sack regardless of whatever
condition, since keeping a stale pointer to freed skb is a recipe
for disaster.
Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 31 Oct 2017 05:47:09 +0000 (22:47 -0700)]
ipv6: addrconf: increment ifp refcount before ipv6_del_addr()
In the (unlikely) event fixup_permanent_addr() returns a failure,
addrconf_permanent_addr() calls ipv6_del_addr() without the
mandatory call to in6_ifa_hold(), leading to a refcount error,
spotted by syzkaller :
WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50
lib/refcount.c:227
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-
20171009+ #33
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x41c kernel/panic.c:181
__warn+0x1c4/0x1e0 kernel/panic.c:544
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227
RSP: 0018:
ffff8801ca49e680 EFLAGS:
00010286
RAX:
000000000000002c RBX:
ffff8801d07cfcdc RCX:
0000000000000000
RDX:
000000000000002c RSI:
1ffff10039493c90 RDI:
ffffed0039493cc4
RBP:
ffff8801ca49e688 R08:
ffff8801ca49dd70 R09:
0000000000000000
R10:
ffff8801ca49df58 R11:
0000000000000000 R12:
1ffff10039493cd9
R13:
ffff8801ca49e6e8 R14:
ffff8801ca49e7e8 R15:
ffff8801d07cfcdc
__in6_ifa_put include/net/addrconf.h:369 [inline]
ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208
addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline]
addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393
notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394 [inline]
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697
call_netdevice_notifiers net/core/dev.c:1715 [inline]
__dev_notify_flags+0x15d/0x430 net/core/dev.c:6843
dev_change_flags+0xf5/0x140 net/core/dev.c:6879
do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113
rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661
rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313
netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299
netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049
__sys_sendmsg+0xe5/0x210 net/socket.c:2083
SYSC_sendmsg net/socket.c:2094 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2090
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x7fa9174d3320
RSP: 002b:
00007ffe302ae9e8 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
00007ffe302b2ae0 RCX:
00007fa9174d3320
RDX:
0000000000000000 RSI:
00007ffe302aea20 RDI:
0000000000000016
RBP:
0000000000000082 R08:
0000000000000000 R09:
000000000000000f
R10:
0000000000000000 R11:
0000000000000246 R12:
00007ffe302b32a0
R13:
0000000000000000 R14:
00007ffe302b2ab8 R15:
00007ffe302b32b8
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Nov 2017 12:15:09 +0000 (21:15 +0900)]
Merge branch 'PHYLINK-cosmetic-and-build-fixes'
Florian Fainelli says:
====================
PHYLINK cosmetic and build fixes
Please find two small "fixes" one that corrects some stylistic changes and
another one that fixes an actual build failure in sfp.c. Since PHYLINK is
not directly visible to user, and there are no in-tree users yet (coming)
this is not targeted at "net" but "net-next" instead.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>