Eric Dumazet [Sat, 4 Feb 2017 15:49:21 +0000 (07:49 -0800)]
virtio_net: exploit napi_complete_done() return value
Since commit
364b6055738b ("net: busy-poll: return busypolling status to
drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.
This patch changes virtio_net to use this boolean to avoid a bit of
overhead for busy-poll users.
Jason reports about a 1.1% improvement for 1 byte TCP_RR (burst 100).
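A minimal user-space model of this pattern follows; it is not the actual virtio_net code. fake_napi, fake_napi_complete_done() and fake_poll() are hypothetical stand-ins, and the busy_poll_owned flag is an assumed simplification of the NAPI state that makes napi_complete_done() return false. The poll routine only rearms the interrupt when completion says it should.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct napi_struct: only what this model needs. */
struct fake_napi {
	bool busy_poll_owned;  /* a busy-polling socket will poll again soon */
	bool scheduled;        /* NAPI instance is still scheduled */
	int irq_rearms;        /* how many times interrupts were re-enabled */
};

/* Models napi_complete_done(): returns false when the driver must not rearm
 * interrupts because busy polling keeps servicing the queue. */
static bool fake_napi_complete_done(struct fake_napi *n, int work_done)
{
	(void)work_done;
	if (n->busy_poll_owned)
		return false;
	n->scheduled = false;
	return true;
}

/* Models a driver poll routine: process up to @budget packets and only
 * rearm the interrupt if NAPI really completed. */
static int fake_poll(struct fake_napi *n, int pending, int budget)
{
	int done = pending < budget ? pending : budget;

	if (done < budget && fake_napi_complete_done(n, done))
		n->irq_rearms++;  /* real driver: re-enable the virtqueue callback */
	return done;
}
```

When a busy-polling socket owns the queue, the model skips the rearm entirely, which is the overhead the patch avoids.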
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 4 Feb 2017 17:13:27 +0000 (12:13 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-02-03
This series contains updates to i40e/i40evf only.
Jake fixes up the driver to not call i40e_vsi_kill_vlan() or
i40e_vsi_add_vlan() when the PVID is set or when the VID is less than 1.
He also removes a check which is not really needed, since there is no real
reason why we cannot just call i40e_del_mac_all_vlan() directly, and renames
functions to reflect their actual purpose and behavior more clearly.
Bimmy cleans up unused/deprecated macros.
Mitch cleans up unused device ids which were intended for use when
running Linux VF drivers under Hyper-V, but were found not to be needed.
He then removes a function that is no longer needed since the client
open and close functions were refactored, and adds a sleep without timeout
until the reply from the PF driver has been received, since the iWARP
client cannot continue until the operation has completed.
Tushar Dave fixes an issue seen on SPARC where the use of the 'packed'
directive was causing kernel unaligned errors.
Alex does a refactor to pull some data off of the stack and store it
in the transmit buffer info section of the transmit ring.
Alan fixes a bug caused by passing a bad register value to the
firmware, by refactoring the macro INTRL_USEC_TO_REG into a static
inline function. He also adds feedback to the user about the actual
interrupt rate limit being used when it differs from the requested limit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 22:29:42 +0000 (14:29 -0800)]
net: skb_needs_check() accepts CHECKSUM_NONE for tx
My recent change missed the fact that UFO performs a complete
UDP checksum before segmenting into frags.
In this case skb->ip_summed is set to CHECKSUM_NONE.
We need to add this valid case to skb_needs_check().
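The tx-path rule can be sketched as a small predicate. This is a user-space model, assuming the post-fix logic treats both CHECKSUM_PARTIAL and CHECKSUM_NONE as valid on transmit; the enum and needs_check() are hypothetical stand-ins for ip_summed and skb_needs_check().

```c
#include <stdbool.h>

/* Local copy of the checksum states; names only mirror the kernel's values. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

/* Models skb_needs_check(): true means the packet is in an unexpected state
 * and should take the slow path (checksum fixup / warning). */
static bool needs_check(enum csum_state ip_summed, bool tx_path)
{
	if (tx_path)
		/* On TX, PARTIAL is normal, and NONE is valid after UFO
		 * computed the full UDP checksum before segmenting. */
		return ip_summed != CSUM_PARTIAL && ip_summed != CSUM_NONE;

	return ip_summed == CSUM_NONE;
}
```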
Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 02:43:28 +0000 (18:43 -0800)]
net: remove support for per driver ndo_busy_poll()
We added generic support for busy polling in the NAPI layer in linux-4.5.
No network driver uses ndo_busy_poll() anymore, so we can get rid
of the pointer in struct net_device_ops, and of its use in sk_busy_loop().
This also frees the NETIF_F_BUSY_POLL feature bit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 22:28:21 +0000 (17:28 -0500)]
enic: Remove local ndo_busy_poll() implementation.
We do polling generically these days.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 00:59:18 +0000 (16:59 -0800)]
ixgbevf: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core
NAPI stack, meaning that all custom implementations can
be removed from drivers.
Not only do we remove a lot of code, we also remove one lock
operation in the fast path, and allow GRO to do its job.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 00:26:39 +0000 (16:26 -0800)]
ixgbe: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core
NAPI stack, meaning that all custom implementations can
be removed from drivers.
Not only do we remove a lot of code, we also remove one lock
operation in the fast path, and allow GRO to do its job.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 21:58:20 +0000 (16:58 -0500)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
sk_buff so we only access one single cacheline in the conntrack
hotpath. Patchset from Florian Westphal.
2) Don't leak pointer to internal structures when exporting x_tables
ruleset back to userspace, from Willem DeBruijn. This includes new
helper functions to copy data to userspace such as xt_data_to_user()
as well as conversions of our ip_tables, ip6_tables and arp_tables
clients to use it. Not surprisingly, ebtables requires an ad-hoc
update. There is also a new field in x_tables extensions to indicate
the number of bytes that we copy to userspace.
3) Add nf_log_all_netns sysctl: This new knob allows you to enable
logging via nf_log infrastructure for all existing network namespaces.
Given the effort to provide pernet syslog has been discontinued,
let's provide a way to restore logging using netfilter kernel logging
facilities in trusted environments. Patch from Michal Kubecek.
4) Validate SCTP checksum from conntrack helper, from Davide Caratti.
5) Merge UDPlite conntrack and NAT helpers into UDP; this was mostly
a copy&paste from the original helper, from Florian Westphal.
6) Reset netfilter state when duplicating packets, also from Florian.
7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
nft_meta, from Liping Zhang.
8) Add missing code to deal with loopback packets from nft_meta when
used by the netdev family, also from Liping.
9) Several cleanups on nf_tables: one removes an unnecessary check from
the netlink control plane path to add table, set and stateful objects,
another consolidates code when unregistering chain hooks, from Gao Feng.
10) Fix harmless reference counter underflow in IPVS that, however,
results in problems with the introduction of the new refcount_t
type, from David Windsor.
11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
from Davide Caratti.
12) Missing documentation on nf_tables uapi header, from Liping Zhang.
13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
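Item 1 relies on the conntrack object being aligned so that its low address bits are free. A rough user-space sketch of that pointer packing follows; fake_conn, ct_pack() and ct_unpack() are hypothetical names, and the 8-byte alignment is an assumption made so that a 3-bit field fits.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct nf_conn; 8-byte alignment guarantees the
 * three low bits of its address are always zero. */
struct fake_conn {
	_Alignas(8) int id;
};

#define CTINFO_MASK ((uintptr_t)7)   /* 3-bit ctinfo field */
#define CT_PTR_MASK (~CTINFO_MASK)

/* Pack the conntrack pointer and its 3-bit ctinfo into one word, so a
 * lookup touches a single field instead of two. */
static uintptr_t ct_pack(struct fake_conn *ct, unsigned int ctinfo)
{
	return (uintptr_t)ct | (ctinfo & CTINFO_MASK);
}

/* Split the word back into pointer and ctinfo. */
static struct fake_conn *ct_unpack(uintptr_t word, unsigned int *ctinfo)
{
	*ctinfo = (unsigned int)(word & CTINFO_MASK);
	return (struct fake_conn *)(word & CT_PTR_MASK);
}
```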
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 21:35:44 +0000 (16:35 -0500)]
Merge branch 'mlxsw-Introduce-TC-Flower-offload-using-TCAM'
Jiri Pirko says:
====================
mlxsw: Introduce TC Flower offload using TCAM
This patchset introduces support for offloading TC cls_flower and actions
to the Spectrum TCAM-based policy engine.
The patchset contains patches that allow working with the flexible keys and
actions used in the Spectrum TCAM.
It also contains in-driver infrastructure for offloading TC rules to TCAM HW.
The TCAM management code is simple and limited for now. It is going to be
extended as a follow-up work.
The last patch uses the previously introduced infrastructure to implement
cls_flower offloading. Initially, only a limited set of match keys and only
the drop and forward actions are supported.
As a dependency, this patchset introduces parman - priority array
area manager - as a library.
v1->v2:
- patch11:
  - use __set_bit and __test_and_clear_bit as suggested by DaveM
- patch16:
  - added documentation to the API functions as suggested by Tom Herbert
- patch17:
  - use __set_bit and __clear_bit as suggested by DaveM
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:09 +0000 (10:29 +0100)]
mlxsw: spectrum: Implement TC flower offload
Extend the existing setup_tc ndo call to allow offloading cls_flower
rules. Only a limited set of dissector keys and actions is supported now.
Use previously introduced ACL infrastructure to offload cls_flower rules
to be processed in the HW.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:08 +0000 (10:29 +0100)]
sched: cls_flower: expose priority to offloading netdevice
The driver that offloads flower rules needs to know the priority with which
the user inserted the rules, so add this information to the offload struct.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:07 +0000 (10:29 +0100)]
mlxsw: spectrum: Introduce ACL core with simple TCAM implementation
Add ACL core infrastructure for Spectrum ASIC. This infra provides an
abstraction layer over specific HW implementations. There are two basic
objects used. One is "rule" and the second is "ruleset" which serves as a
container of multiple rules. In general, within one ruleset the rules are
allowed to have multiple priorities and masks. Each ruleset is bound to
either the ingress or the egress of a port netdevice.
The initial TCAM implementation is very simple and limited. It utilizes
parman lsort manager to take care of TCAM region layout.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:06 +0000 (10:29 +0100)]
lib: Introduce priority array area manager
This introduces an infrastructure for managing linear priority
areas. Priority order in an array matters; however, the order of items
inside a priority group does not.
As an initial implementation, the L-sort algorithm is used. It is quite
trivial. A more advanced algorithm called P-sort will be introduced as a
follow-up. The infrastructure is prepared for other algorithms.
Alongside this, a testing module is introduced as well.
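The L-sort idea can be modeled in a few lines. This sketch is not the parman API: lsort_area and lsort_insert() are hypothetical, and the model assumes groups are stored contiguously in ascending priority order. Because order inside a group is irrelevant, inserting into one group costs at most one move per later non-empty group instead of shifting every later item.

```c
#define LSORT_MAX_ITEMS 64
#define LSORT_MAX_PRIOS 8

/* Simplified model: each occupied slot stores the priority of its item. */
struct lsort_area {
	int slot[LSORT_MAX_ITEMS];   /* priority of the item in each slot */
	int count[LSORT_MAX_PRIOS];  /* items per priority group */
	int total;                   /* occupied slots, always a prefix */
	int moves;                   /* single-item moves performed so far */
};

/* First slot of group @prio: the sum of the sizes of all earlier groups. */
static int lsort_group_start(const struct lsort_area *a, int prio)
{
	int start = 0;

	for (int p = 0; p < prio; p++)
		start += a->count[p];
	return start;
}

/* Insert an item of priority @prio and return its slot, or -1 on error. */
static int lsort_insert(struct lsort_area *a, int prio)
{
	int pos;

	if (prio < 0 || prio >= LSORT_MAX_PRIOS || a->total >= LSORT_MAX_ITEMS)
		return -1;

	/* Walk the later groups from the tail backwards. Each non-empty group
	 * copies its first item into the free slot just past its last item,
	 * so the free slot travels one group to the left per move. */
	for (int q = LSORT_MAX_PRIOS - 1; q > prio; q--) {
		int start = lsort_group_start(a, q);

		if (a->count[q] == 0)
			continue;
		a->slot[start + a->count[q]] = a->slot[start];
		a->moves++;
	}

	/* The slot right after group @prio is now free. */
	pos = lsort_group_start(a, prio) + a->count[prio];
	a->slot[pos] = prio;
	a->count[prio]++;
	a->total++;
	return pos;
}
```

For example, with items of priority 0 and 2 present, inserting a priority-1 item moves only the first priority-2 item to the tail of its group.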
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:05 +0000 (10:29 +0100)]
list: introduce list_for_each_entry_from_reverse helper
Similar to list_for_each_entry_continue and its reverse variant
list_for_each_entry_continue_reverse, introduce a reverse helper for
list_for_each_entry_from.
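A user-space sketch of how such a helper iterates, using minimal hand-written versions of list_head, container_of() and list_prev_entry() modeled on include/linux/list.h (assumed here, not copied from it). Unlike the _continue_reverse variant, iteration starts at pos itself rather than at its predecessor. The __typeof__ extension is assumed to be available (GCC/Clang).

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_prev_entry(pos, member) \
	list_entry((pos)->member.prev, __typeof__(*(pos)), member)

/* Walk backwards starting at @pos itself, stopping when we reach @head. */
#define list_for_each_entry_from_reverse(pos, head, member) \
	for (; &(pos)->member != (head); pos = list_prev_entry(pos, member))

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct item {
	int value;
	struct list_head node;
};

/* Sum @pos and every entry before it in the list. */
static int sum_from_reverse(struct list_head *head, struct item *pos)
{
	int sum = 0;

	list_for_each_entry_from_reverse(pos, head, node)
		sum += pos->value;
	return sum;
}
```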
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:04 +0000 (10:29 +0100)]
mlxsw: resources: Add ACL related resources
Add a couple of resource limits related to ACL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:03 +0000 (10:29 +0100)]
mlxsw: spectrum: Introduce basic set of flexible key blocks
Introduce a basic set of Spectrum flexible key blocks. It contains the
blocks needed to carry all elements defined so far.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:02 +0000 (10:29 +0100)]
mlxsw: core: Introduce flexible actions support
Each entry which is matched during ACL lookup points to an action set.
This action set contains up to three separate actions. If more actions
need to be chained, an extended set is created to hold them in the
KVD linear area.
This patch implements handling of sets and encoding of actions.
Currently, only two actions are supported: drop and forward. The forward
action uses a PBS pointer into the KVD linear area, so the action code
needs to take care of this as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:01 +0000 (10:29 +0100)]
mlxsw: core: Introduce flexible keys support
Hardware supports matching on so-called "flexible keys". The idea is to
assemble an optimal key to use for matching according to the packet fields
(elements) requested by the user. Certain sets of elements are combined
into pre-defined blocks, and a picker finds the needed blocks.
Keys consist of 1..n blocks.
Alongside that, an initial set of elements is introduced in order
to be able to offload basic cls_flower rules.
Picked keys are cached so multiple rules can share them.
An encode function is provided that takes care of encoding key and
mask values according to a given key.
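To illustrate what a picker has to do, the sketch below treats each element as a bit and each block as a bitmask, then greedily picks the block covering the most still-missing elements. This greedy strategy and the pick_blocks() name are assumptions for illustration; the commit does not describe the mlxsw picker's actual algorithm.

```c
#include <stdint.h>

/* Count set bits (number of elements carried in a mask). */
static int popcount32(uint32_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;  /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Pick blocks until every requested element is covered. Writes the chosen
 * block indexes to @picked and returns how many were used, or -1 if some
 * element is carried by no block (or @max_picked is too small). */
static int pick_blocks(uint32_t requested, const uint32_t *blocks, int nblocks,
		       int *picked, int max_picked)
{
	uint32_t missing = requested;
	int used = 0;

	while (missing) {
		int best = -1, best_gain = 0;

		for (int i = 0; i < nblocks; i++) {
			int gain = popcount32(blocks[i] & missing);

			if (gain > best_gain) {
				best = i;
				best_gain = gain;
			}
		}
		if (best < 0 || used == max_picked)
			return -1;
		picked[used++] = best;
		missing &= ~blocks[best];
	}
	return used;
}
```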
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:29:00 +0000 (10:29 +0100)]
mlxsw: reg: Add Policy-Engine Extended Flexible Action Register
PEFA register is used for accessing an extended flexible action entry
in the central KVD Linear Database.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:59 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine Policy Based Switching Register
The PPBS register retrieves and sets Policy Based Switching Table entries.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:58 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine Rules Copy Register
The PRCR register is used for accessing rules within a TCAM region.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:57 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine Port Binding Table
The PPBT is used for configuration of the Port Binding Table.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:56 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine TCAM Entry Register Version 2
The PTCE-V2 register is used for accessing rules within a TCAM region.
It is a new version of PTCE in order to support wider key, mask and
action within a TCAM region.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:55 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine TCAM Allocation Register
The PTAR register is used for allocation of regions in the TCAM.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:54 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine ACL Group Table register
The PAGT register is used for configuration of the ACL Group Table.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:53 +0000 (10:28 +0100)]
mlxsw: reg: Add Policy-Engine ACL Register
The PACL register is used for configuration of the ACL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:52 +0000 (10:28 +0100)]
mlxsw: item: Add helpers for getting pointer into payload for char buffer item
Sometimes it is handy to get a pointer to a char buffer item and use it
directly to read/write data, so add these helpers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 3 Feb 2017 09:28:51 +0000 (10:28 +0100)]
mlxsw: item: Add 8bit item helpers
Item helpers for 8-bit values are needed, so add them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Fri, 3 Feb 2017 04:46:21 +0000 (23:46 -0500)]
bonding: Remove unnecessary returned value check
The function bond_info_query always returns 0, so bond_do_ioctl does not
need to check its return value. Change the return type of
bond_info_query to void and remove the redundant check.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 04:40:08 +0000 (20:40 -0800)]
tcp: clear pfmemalloc on outgoing skb
Josef Bacik diagnosed the following problem:
I was seeing random disconnects while testing NBD over loopback.
This turned out to be because NBD sets pfmemalloc on its socket,
however the receiving side is a user space application so it does not
have pfmemalloc set on its socket. This means that
sk_filter_trim_cap will simply drop this packet, under the
assumption that the other side will simply retransmit. Well we do
retransmit, and then the packet is just dropped again for the same
reason.
It seems the better way to address this problem is to clear pfmemalloc
in the TCP transmit path. Strict pfmemalloc control really only makes
sense on the receive path.
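The failure mode and the fix can be modeled with two flags. This is a simplified user-space sketch: fake_skb, fake_sock, rx_accepts() and tcp_tx_prepare() are hypothetical, with rx_accepts() standing in for the pfmemalloc test in sk_filter_trim_cap().

```c
#include <stdbool.h>

/* Hypothetical model: only the flags relevant to the NBD-over-loopback loop. */
struct fake_skb  { bool pfmemalloc; };  /* buffer came from emergency reserves */
struct fake_sock { bool memalloc; };    /* socket may use emergency reserves */

/* Receive-side rule: a pfmemalloc skb is only accepted by a socket that is
 * itself allowed to use the memory reserves. */
static bool rx_accepts(const struct fake_sock *rx, const struct fake_skb *skb)
{
	return !(skb->pfmemalloc && !rx->memalloc);
}

/* Transmit-side change: clear the flag before the skb leaves TCP, so a
 * loopback receiver never inherits the sender's reserve status. */
static void tcp_tx_prepare(struct fake_skb *skb)
{
	skb->pfmemalloc = false;
}
```

Without the transmit-side clear, every retransmission would carry the same flag and be dropped again, matching the loop described above.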
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 19:44:27 +0000 (11:44 -0800)]
cxgb4: get rid of custom busy poll code
In linux-4.5, busy polling was implemented in the core
NAPI stack, meaning that all custom implementations can
be removed from drivers.
Not only do we remove a lot of code, we also remove one spin_lock()
from the driver fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 18:50:48 +0000 (10:50 -0800)]
myri10ge: get rid of custom busy poll code
Compared to custom busy_poll, the generic NAPI one is simpler and
removes a lot of code. It removes one atomic operation from the fast path
(when busy poll is not in action) since we no longer need an extra
spinlock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 18:16:00 +0000 (10:16 -0800)]
be2net: get rid of custom busy poll code
Compared to custom busy_poll, the generic NAPI one is better, since
it allows GRO to be used, and it removes a lot of code and extra locked
operations in the fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 2 Feb 2017 16:52:21 +0000 (08:52 -0800)]
net: ipv6: Set protocol to kernel for local routes
The IPv6 stack does not set the protocol for local routes, so those routes
show up with proto "none":
$ ip -6 ro ls table local
local ::1 dev lo proto none metric 0 pref medium
local 2100:3:: dev lo proto none metric 0 pref medium
local 2100:3::4 dev lo proto none metric 0 pref medium
local fe80:: dev lo proto none metric 0 pref medium
...
Set rt6i_protocol to RTPROT_KERNEL for consistency with IPv4. Now routes
show up with proto "kernel":
$ ip -6 ro ls table local
local ::1 dev lo proto kernel metric 0 pref medium
local 2100:3:: dev lo proto kernel metric 0 pref medium
local 2100:3::4 dev lo proto kernel metric 0 pref medium
local fe80:: dev lo proto kernel metric 0 pref medium
...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 2 Feb 2017 16:09:54 +0000 (17:09 +0100)]
trace: rename trace_print_hex_seq arg and add kdoc
Steven suggested improving trace_print_hex_seq() a bit after commit
2acae0d5b0f7 ("trace: add variant without spacing in trace_print_hex_seq")
in two ways: i) by adding a kdoc comment for the helper function
itself and ii) by renaming the 'spacing' argument to 'concatenate'
to better denote that we don't add spaces between hex bytes.
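A user-space model of the formatting the renamed argument controls. print_hex_seq() is a hypothetical stand-in that writes to a char buffer instead of a struct trace_seq; the "%s%2.2x" format with a separator of "" or " " is assumed to match the kernel helper's behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Print each byte as two hex digits; a single space precedes every byte
 * except the first, unless @concatenate is set. Output is truncated to
 * @outlen bytes including the terminating NUL. */
static void print_hex_seq(char *out, size_t outlen, const unsigned char *buf,
			  int buf_len, bool concatenate)
{
	size_t used = 0;

	if (outlen == 0)
		return;
	out[0] = '\0';
	for (int i = 0; i < buf_len && used < outlen; i++) {
		int n = snprintf(out + used, outlen - used, "%s%2.2x",
				 concatenate || i == 0 ? "" : " ", buf[i]);

		if (n < 0)
			break;
		used += (size_t)n;
	}
}
```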
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 2 Feb 2017 16:05:04 +0000 (17:05 +0100)]
MAINTAINERS: add Ivan as a switchdev maintainer
Ivan will be taking care of switchdev code from now on.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 20:21:23 +0000 (15:21 -0500)]
Merge branch 'bridge-per-vlan-dst_metadata-support'
Roopa Prabhu says:
====================
bridge: per vlan dst_metadata support
High level summary:
lwt and dst_metadata have enabled vxlan l3 deployments to use a single
vxlan netdev for multiple vnis, eliminating the scalability problem of
using a separate vxlan netdev per vni. This series tries to do the same
for vxlan netdevs in pure l2 bridged networks.
Use-case/deployment and details are below.
Deployment scenario details:
As we know, VXLAN is used to build layer 2 virtual networks across the
underlay layer 3 infrastructure. A VXLAN tunnel endpoint (VTEP)
originates and terminates VXLAN tunnels, and a VTEP can be a TOR switch
or a vswitch in the hypervisor. This patch series mainly
focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
along with vlan id is used to identify layer 2 segments in a vxlan
overlay network. Vxlan bridging is the function provided by Vteps to terminate
vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
To provide the vxlan bridging function, a vtep has to map vlan to a vni. The RFC
says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
the original Layer 2 packet if there is one before encapsulating the packet
into the VXLAN format to transmit it through the underlay network. The remote
VTEP devices have information about the VLAN in which the packet will be
placed based on their own VLAN-to-VXLAN VNI mapping configurations.
Existing solution:
Without this patch series one can deploy such a vtep configuration by
adding the local ports and vxlan netdevs into a vlan filtering bridge.
The local ports are configured as trunk ports carrying all vlans.
A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
The vxlan netdev only receives traffic corresponding to the vlan it is mapped
to. This configuration maps traffic belonging to a vlan to the corresponding
vxlan segment.
     -----------------------------------
    |              bridge               |
    |                                   |
     -----------------------------------
       |100,200       |100 (pvid)    |200 (pvid)
       |              |              |
      swp1        vxlan1000      vxlan2000
This provides the required vxlan bridging function but poses a
scalability problem with using a separate vxlan netdev for each vni.
Solution in this patch series:
The goal is to use a single vxlan device to carry all vnis, similar
to the vxlan collect metadata mode, but additionally allowing the bridge
and vxlan driver to carry all the forwarding information and also learn.
This implementation uses the existing dst_metadata infrastructure to map
vlan to a tunnel id.
- vxlan driver changes:
  - enable collect metadata mode to be used with learning,
    replication and fdb
  - A single fdb table hashed by (mac, vni)
  - rx path already has the vni
  - tx path expects a vni in the packet with dst_metadata and relies
    on learnt or static forwarding information table to forward the packet
- Bridge driver changes: per vlan dst_metadata support:
  - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
    kept the api generic for any tunnel info
  - Uapi to configure/unconfigure/dump per vlan tunnel data
  - new bridge port flag to turn this feature on/off. off by default
  - ingress hook:
    - if port is a tunnel port, use tunnel info in
      attached dst_metadata to map it to a local vlan
  - egress hook:
    - if port is a tunnel port, use tunnel info attached to vlan
      to set dst_metadata on the skb
Other approaches tried and vetoed:
- tc vlan push/pop and tunnel metadata dst:
  - though tc can be used to do part of this, these patches address a
    deployment case where bridge driver vlan filtering and forwarding
    information database, along with the vxlan driver forwarding
    information table and learning, are required.
- making vxlan driver understand vlan-vni mapping:
  - I had a series almost ready with this approach but soon realized
    it duplicated a lot of vlan handling code in the vxlan driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 1 Feb 2017 06:59:55 +0000 (22:59 -0800)]
bridge: vlan dst_metadata hooks in ingress and egress paths
- ingress hook:
  - if port is a tunnel port, use tunnel info in
    attached dst_metadata to map it to a local vlan
- egress hook:
  - if port is a tunnel port, use tunnel info attached to
    vlan to set dst_metadata on the skb
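A rough model of the two hooks as table lookups. vlan_tun, ingress_vid() and egress_tun_id() are hypothetical; the real bridge stores the tunnel info per vlan entry and attaches it via dst_metadata, and using 0 as "no mapping" is a simplification of this sketch.

```c
#include <stdint.h>

/* Hypothetical per-port table: one local VLAN <-> one tunnel id (vni). */
struct vlan_tun {
	uint16_t vid;
	uint32_t tun_id;
};

/* Ingress on a tunnel port: tunnel id from the packet's metadata ->
 * local VLAN. Returns 0 when no mapping exists. */
static uint16_t ingress_vid(const struct vlan_tun *map, int n, uint32_t tun_id)
{
	for (int i = 0; i < n; i++)
		if (map[i].tun_id == tun_id)
			return map[i].vid;
	return 0;
}

/* Egress on a tunnel port: VLAN -> tunnel id to attach as metadata.
 * Returns 0 when the VLAN has no tunnel mapping. */
static uint32_t egress_tun_id(const struct vlan_tun *map, int n, uint16_t vid)
{
	for (int i = 0; i < n; i++)
		if (map[i].vid == vid)
			return map[i].tun_id;
	return 0;
}
```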
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 1 Feb 2017 06:59:54 +0000 (22:59 -0800)]
bridge: per vlan dst_metadata netlink support
This patch adds support for attaching per vlan tunnel info dst
metadata. This enables the bridge driver to map vlan to tunnel_info
at ingress and egress. It uses the kernel dst_metadata infrastructure.
The initial use case is vlan to vni bridging, but the api is generic
enough to extend to any tunnel_info in the future:
- Uapi to configure/unconfigure/dump per vlan tunnel data
- netlink functions to configure vlan and tunnel_info mapping
- Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
  dst_metadata to bridged packets on ports. off by default.
- changes to existing code are mainly a refactor of some existing vlan
  handling netlink code + hooks for new vlan tunnel code
- I have kept the vlan tunnel code isolated in separate files.
- most of the netlink vlan tunnel code is handling of vlan-tunid
  ranges (follows the vlan range handling code). To conserve space,
  vlan-tunid mappings are by default always dumped as ranges where applicable.
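The range folding mentioned in the last item can be sketched as follows, assuming a range continues only while both the vlan id and the tunnel id increase by one. vt_entry and count_ranges() are hypothetical names, not the bridge netlink code.

```c
#include <stdint.h>

/* Hypothetical vlan -> tunnel id entry, sorted by vid. */
struct vt_entry {
	uint16_t vid;
	uint32_t tun_id;
};

/* Return the number of ranges a dump would emit for @n sorted entries. */
static int count_ranges(const struct vt_entry *e, int n)
{
	int ranges = 0;

	for (int i = 0; i < n; i++) {
		/* Start a new range unless this entry extends the previous one. */
		if (i == 0 || e[i].vid != e[i - 1].vid + 1 ||
		    e[i].tun_id != e[i - 1].tun_id + 1)
			ranges++;
	}
	return ranges;
}
```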
Use case:
An example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis).
iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
The example shows the same host mac learnt on two vnis, with
vlan 100 mapped to vni 1000 and vlan 101 mapped to vni 1001.
before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 1 Feb 2017 06:59:53 +0000 (22:59 -0800)]
bridge: uapi: add per vlan tunnel info
New nested netlink attribute to associate tunnel info per vlan.
This is used by bridge driver to send tunnel metadata to
bridge ports in vlan tunnel mode. This patch also adds a new per
port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode,
off by default.
One example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis). User can configure
per-vlan tunnel information which the bridge driver can use
to bridge vlan into the corresponding vn-segment.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 1 Feb 2017 06:59:52 +0000 (22:59 -0800)]
vxlan: support fdb and learning in COLLECT_METADATA mode
Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information database to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (e.g. where the vxlan device is part of a bridge).
For all this to work, the vxlan driver now needs to support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.
use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will look up the forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.
Changes introduced by this patch:
- allow learning and forwarding database state in vxlan netdev in
  COLLECT_METADATA mode. Current behaviour is not changed
  by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
  to support the new bridge friendly mode.
- A single fdb table hashed by (mac, vni) to allow fdb entries with
  multiple vnis in the same fdb table
- rx path already has the vni
- tx path expects a vni in the packet with dst_metadata
- prior to this series, fdb remote_dsts carried remote vni and
  the vxlan device carrying the fdb table represented the
  source vni. With the vxlan device now representing multiple vnis,
  this patch adds a src vni attribute to the fdb entry. The remote
  vni already uses NDA_VNI attribute. This patch introduces
  NDA_SRC_VNI netlink attribute to represent the src vni in a multi
  vni fdb table.
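A minimal sketch of a single fdb table keyed by (mac, vni), showing why the same mac learnt on two vnis produces two entries. The chained-hash layout, the FNV-1a hash and all names (fdb_table, fdb_learn(), fdb_find()) are assumptions for illustration and do not reflect the vxlan driver's actual hashing.

```c
#include <stdint.h>
#include <string.h>

#define FDB_BUCKETS 16
#define FDB_MAX 32

/* Key is (mac, src_vni): same mac on two vnis -> two entries. */
struct fdb_entry {
	uint8_t mac[6];
	uint32_t src_vni;
	uint32_t remote_ip;  /* stand-in for the remote destination */
	int next;            /* next entry in bucket chain, -1 = end */
};

struct fdb_table {
	int head[FDB_BUCKETS];  /* first entry per bucket, -1 = empty */
	struct fdb_entry e[FDB_MAX];
	int used;
};

/* FNV-1a over the mac bytes and the vni; illustrative hash only. */
static unsigned int fdb_hash(const uint8_t mac[6], uint32_t vni)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < 6; i++)
		h = (h ^ mac[i]) * 16777619u;
	for (int i = 0; i < 4; i++)
		h = (h ^ ((vni >> (8 * i)) & 0xff)) * 16777619u;
	return h % FDB_BUCKETS;
}

static void fdb_init(struct fdb_table *t)
{
	for (int i = 0; i < FDB_BUCKETS; i++)
		t->head[i] = -1;
	t->used = 0;
}

static struct fdb_entry *fdb_find(struct fdb_table *t, const uint8_t mac[6],
				  uint32_t vni)
{
	for (int i = t->head[fdb_hash(mac, vni)]; i >= 0; i = t->e[i].next)
		if (t->e[i].src_vni == vni && memcmp(t->e[i].mac, mac, 6) == 0)
			return &t->e[i];
	return NULL;
}

/* Learn or update an entry; returns 0 on success, -1 if the table is full. */
static int fdb_learn(struct fdb_table *t, const uint8_t mac[6], uint32_t vni,
		     uint32_t remote_ip)
{
	struct fdb_entry *f = fdb_find(t, mac, vni);
	unsigned int b;

	if (f) {
		f->remote_ip = remote_ip;
		return 0;
	}
	if (t->used == FDB_MAX)
		return -1;
	b = fdb_hash(mac, vni);
	f = &t->e[t->used];
	memcpy(f->mac, mac, 6);
	f->src_vni = vni;
	f->remote_ip = remote_ip;
	f->next = t->head[b];
	t->head[b] = t->used++;
	return 0;
}
```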
iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
The example shows the same host mac learnt on two vnis.
before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 1 Feb 2017 06:59:51 +0000 (22:59 -0800)]
ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info mode
New ip_tunnel_info flag to represent bridged tunnel metadata.
Used by bridge driver later in the series to pass per vlan dst
metadata to bridge ports.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 20:16:46 +0000 (15:16 -0500)]
Merge branch 'ife-to-module'
Yotam Gigi says:
====================
Extract IFE logic to module
Extract ife logic from the tc_ife action into an independent module, and
make the tc_ife action use it. This way, the ife encapsulation can be used
by modules other than the tc_ife action.
v1->v2:
Fix duplicate symbol error by introducing a new patch that makes the
original symbol static.
The symbol ife_tlv_meta_encode is exported in act_ife, though it is not
used by any other module. As the symbol is being moved to the new ife
module, introducing the new module creates a duplicate symbol. To fix it,
add a new patch (1/3) that makes the ife_tlv_meta_encode symbol static in
act_ife, so the symbols do not collide.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Wed, 1 Feb 2017 13:30:03 +0000 (15:30 +0200)]
net/sched: act_ife: Change to use ife module
Use the encode/decode functionality from the ife module instead of the
implementation inside act_ife.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Wed, 1 Feb 2017 13:30:02 +0000 (15:30 +0200)]
net: Introduce ife encapsulation module
This module is responsible for the ife encapsulation protocol
encode/decode logic. The module provides:
- ife_encode: encode skb and reserve space for the ife meta header
- ife_decode: decode skb and extract the meta header size
- ife_tlv_meta_encode: encode one tlv entry into the reserved ife
header space
- ife_tlv_meta_decode: decode one tlv entry from the packet
- ife_tlv_meta_next: advance to the next tlv
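To make the encode/decode/next trio concrete, here is a simplified userspace sketch of a TLV walker over a flat buffer. It assumes a 16-bit type and a 16-bit length (header included), both big-endian, with data padded to 4 bytes. This is a stand-in layout for illustration; tlv_encode, tlv_decode and tlv_next are hypothetical names, not the ife module's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDRLEN 4u
#define TLV_ALIGN(len) (((len) + 3u) & ~3u)

/* Write one TLV at buf: type and total length (header + data) in network
 * byte order, then the data zero-padded to a 4-byte boundary.
 * Returns the number of bytes consumed, padding included. */
static size_t tlv_encode(uint8_t *buf, uint16_t type, uint16_t dlen, const void *data)
{
	uint16_t len = (uint16_t)(TLV_HDRLEN + dlen);
	size_t total = TLV_ALIGN(len);

	buf[0] = type >> 8;
	buf[1] = type & 0xff;
	buf[2] = len >> 8;
	buf[3] = len & 0xff;
	memset(buf + TLV_HDRLEN, 0, total - TLV_HDRLEN);
	memcpy(buf + TLV_HDRLEN, data, dlen);
	return total;
}

/* Read the TLV at buf: fill *type and *dlen, return a pointer to the data. */
static const uint8_t *tlv_decode(const uint8_t *buf, uint16_t *type, uint16_t *dlen)
{
	uint16_t len = (uint16_t)(buf[2] << 8 | buf[3]);

	assert(len >= TLV_HDRLEN);
	*type = (uint16_t)(buf[0] << 8 | buf[1]);
	*dlen = (uint16_t)(len - TLV_HDRLEN);
	return buf + TLV_HDRLEN;
}

/* Skip past the TLV at buf, including its padding. */
static const uint8_t *tlv_next(const uint8_t *buf)
{
	uint16_t len = (uint16_t)(buf[2] << 8 | buf[3]);

	return buf + TLV_ALIGN(len);
}
```

A 2-byte value still occupies 8 bytes on the wire (4 header + 2 data + 2 padding), which is why a separate "next" step is useful instead of advancing by the raw length.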
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Wed, 1 Feb 2017 13:30:01 +0000 (15:30 +0200)]
net/sched: act_ife: Unexport ife_tlv_meta_encode
As the function ife_tlv_meta_encode is not used by any other module,
unexport it and make it static for the act_ife module.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 16:04:56 +0000 (08:04 -0800)]
tcp: add tcp_mss_clamp() helper
Small cleanup factorizing code doing the TCP_MAXSEG clamping.
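The clamping being factored out amounts to "use the user's TCP_MAXSEG value if one is set and it is smaller". A minimal sketch, assuming a user_mss of 0 means "not set"; struct sock_opts and mss_clamp are hypothetical stand-ins, not the kernel helper's exact signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical socket state: user_mss is what setsockopt(TCP_MAXSEG)
 * stored, with 0 meaning the user never set it. */
struct sock_opts {
	uint16_t user_mss;
};

/* Clamp a computed MSS to the user's TCP_MAXSEG value, if any. */
static uint16_t mss_clamp(const struct sock_opts *o, uint16_t mss)
{
	uint16_t user_mss = o->user_mss;

	return (user_mss && user_mss < mss) ? user_mss : mss;
}
```

Keeping this in one helper means every call site applies the same "only ever lower, never raise" rule.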
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 2 Feb 2017 14:49:24 +0000 (15:49 +0100)]
hns_enet: use cpumask_var_t for on-stack mask
On large SMP builds, we can run into a build warning:
drivers/net/ethernet/hisilicon/hns/hns_enet.c: In function 'hns_set_irq_affinity.isra.27':
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1242:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
The solution here is to use cpumask_var_t, which can use dynamic
allocation when CONFIG_CPUMASK_OFFSTACK is enabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 14:35:36 +0000 (06:35 -0800)]
virtio_net: remove custom busy_poll
Generic NAPI busy polling allows us to remove custom implementations
found in drivers.
Further optimization may be possible by testing the
napi_complete_done() return value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 14:09:14 +0000 (06:09 -0800)]
atl1e: add GRO support
It is time to add GRO support to this driver.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun V [Thu, 2 Feb 2017 07:13:29 +0000 (12:43 +0530)]
cxgb4: Fix uld_send() for ctrl pkts
Without any ULD loaded, uld_txq_info[] will be NULL. uld_send()
is also used for sending control work requests (e.g. setting a filter)
that don't require any ULD to be loaded. Hence move the uld_txq_info[]
assignment after ctrl_xmit().
Also added a NULL check for uld_txq_info[].
Fixes: 94cdb8bb993a ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Arjun V <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 02:22:28 +0000 (18:22 -0800)]
sfc-falcon: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core
NAPI stack, meaning that all custom implementations can
be removed from drivers.
Not only do we remove lots of tricky code, we also remove
one lock operation in the fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 3 Feb 2017 01:13:19 +0000 (17:13 -0800)]
sfc: get rid of custom busy polling code
In linux-4.5, busy polling was implemented in the core
NAPI stack, meaning that all custom implementations can
be removed from drivers.
Not only do we remove lots of tricky code, we also remove
one lock operation in the fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alan Brady [Tue, 29 Nov 2016 00:06:03 +0000 (16:06 -0800)]
i40e: add interrupt rate limit verbosity
Due to the resolution of the register controlling interrupt rate
limiting, setting certain values for the interrupt rate limit makes it
appear as though the limiting is not completely accurate. The problem
is that the interrupt rate limit is getting rounded down to the nearest
multiple of 4. This patch fixes the problem by adding some feedback to
the user as to the actual interrupt rate limit being used when it
differs from the requested limit. Without this patch, setting interrupt
rate limits may appear to behave inaccurately.
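The feedback described above can be sketched as: round the request down to the register's 4 usec resolution and tell the user when the result differs. The constant, function name and message text below are illustrative assumptions, not the driver's code.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed register resolution: limits are rounded down to a multiple of 4. */
#define INTRL_RESOLUTION_USEC 4

/* Return the limit the hardware will actually apply, and report it when
 * it differs from what the user asked for. */
static int effective_intrl(int requested_usec)
{
	int actual;

	assert(requested_usec >= 0);
	actual = requested_usec - requested_usec % INTRL_RESOLUTION_USEC;
	if (actual != requested_usec)
		fprintf(stderr, "Interrupt rate limit rounded down to %d\n", actual);
	return actual;
}
```

A request of 10 is applied as 8, and the message makes that visible instead of letting the limit look inaccurate.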
Change-ID: I3093cf3f2d437d35a4c4f4bb5af5ce1b85ab21b7
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alan Brady [Tue, 29 Nov 2016 00:06:02 +0000 (16:06 -0800)]
i40e: refactor macro INTRL_USEC_TO_REG
This patch refactors the macro INTRL_USEC_TO_REG into a static inline
function and fixes a couple of subtle bugs caused by the macro.
This patch fixes a bug which was caused by passing a bad register value
to the firmware. If enabling interrupt rate limiting, a non-zero value
for the rate limit must be used. Otherwise the firmware sets the
interrupt rate limit to the maximum value. Due to the limited
resolution of the register, attempting to set a value of 1, 2, or 3
would be rounded down to 0 and limiting was left enabled, causing
unexpected behavior.
This patch also fixes a possible bug in which using the macro itself can
introduce unintended side effects because the macro argument is used
more than once in the macro definition (e.g. a post-increment argument
would increment the variable twice).
Without this patch, attempting to set interrupt rate limits of 1, 2, or
3 results in unexpected behavior and future use of this macro could
cause subtle bugs.
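Both bugs can be reproduced in a small userspace sketch. The enable bit value and the macro body below are assumptions modelled on the description (shift right by 2 for the 4 usec resolution, OR in an enable bit); intrl_usec_to_reg is a hypothetical name for the inline replacement.

```c
#include <assert.h>
#include <stdint.h>

#define INTRL_ENA (1u << 6)	/* assumed enable bit, illustrative */

/* Macro form: `set` is evaluated twice when non-zero, and 1..3 usec
 * produce "enabled with a zero rate" (just INTRL_ENA). */
#define INTRL_USEC_TO_REG(set) ((set) ? ((set) >> 2) | INTRL_ENA : 0)

/* Inline form: the argument is evaluated once, and values that would
 * round down to a zero rate disable limiting instead. */
static inline uint16_t intrl_usec_to_reg(int intrl)
{
	if (intrl >> 2)
		return (uint16_t)((intrl >> 2) | INTRL_ENA);
	return 0;
}
```

With `x = 8`, `INTRL_USEC_TO_REG(x++)` leaves x at 10, while `intrl_usec_to_reg(x++)` increments it once.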
Change-Id: I83ac842de0ca9c86761923d6e3a4d7b1b95f2b3f
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Tue, 29 Nov 2016 00:06:01 +0000 (16:06 -0800)]
i40e: remove unused function
After refactoring the client open and close code, this is no longer
needed. Remove it.
Change-ID: If8e6e32baa354d857c2fd8b2f19404f1786011c4
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jayaprakash Shanmugam [Tue, 29 Nov 2016 00:06:00 +0000 (16:06 -0800)]
i40e: Remove FPK HyperV VF device ID
The requirement for VFs to use the VMBus has been removed, so
remove the Hyper-V VF device ID.
Change-ID: I84f0964f443ee0db3e5e444b5ace996eb71b8280
Signed-off-by: Jayaprakash Shanmugam <jayaprakash.shanmugam@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 29 Nov 2016 00:05:59 +0000 (16:05 -0800)]
i40e: Quick refactor to start moving data off stack and into Tx buffer info
This patch does some quick work to pull some of the data off of the stack
and hopefully start storing it in the Tx buffer info section of the Tx
ring. Ideally we should be moving away from having to store much of
anything on the stack and can just maintain it all in the descriptor rings.
Change-ID: I4b4715ea1920e122502482b3f9e56a9a6cb1e9fe
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tushar Dave [Sat, 19 Nov 2016 21:53:58 +0000 (13:53 -0800)]
i40e: remove unnecessary __packed
'struct i40e_dma_mem' is defined with the 'packed' directive, which causes
kernel unaligned access errors on SPARC, e.g.:
i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.6.16-k
i40e: Copyright (c) 2013 - 2014 Intel Corporation.
Kernel unaligned access at TPC[44894c] dma_4v_alloc_coherent+0x1ac/0x300
Kernel unaligned access at TPC[44894c] dma_4v_alloc_coherent+0x1ac/0x300
Kernel unaligned access at TPC[44894c] dma_4v_alloc_coherent+0x1ac/0x300
Kernel unaligned access at TPC[44894c] dma_4v_alloc_coherent+0x1ac/0x300
Kernel unaligned access at TPC[44894c] dma_4v_alloc_coherent+0x1ac/0x300
i40e 0000:03:00.0: fw 5.1.40981 api 1.5 nvm 5.04 0x80002548 0.0.0
This can be fixed with get_unaligned()/put_unaligned(). However, nothing
in the driver shows that 'struct i40e_dma_mem' is shoved directly into the
NIC hardware; instead, fields of the struct are read and used for the
hardware. Therefore, __packed is unnecessary for 'struct i40e_dma_mem'.
In addition, although 'struct i40e_virt_mem' doesn't cause any
unaligned access, keeping it packed is unnecessary as well, for the
same reason.
This change makes 'struct i40e_dma_mem' and 'struct i40e_virt_mem'
unpacked.
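The sketch below shows how packing can misalign a 64-bit field. It uses stand-in structs with an assumed layout (pointer, 64-bit DMA address, 32-bit size) and the GCC/Clang `packed` attribute, not the driver's definitions. It is a plausible mechanism for the trace above, not a confirmed analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; the field layout is an assumption. */
struct dma_mem_packed {
	void     *va;
	uint64_t  pa;
	uint32_t  size;
} __attribute__((packed));

struct dma_mem {
	void     *va;
	uint64_t  pa;
	uint32_t  size;
};

/* In an array of packed structs, element i starts at i * sizeof(struct),
 * so the 64-bit `pa` of later elements can sit at an address that is not
 * suitably aligned. Code writing through a plain uint64_t pointer to such
 * a field faults on strict-alignment CPUs such as SPARC. */
static int pa_misaligned_in_array(size_t index)
{
	size_t off = index * sizeof(struct dma_mem_packed) +
		     offsetof(struct dma_mem_packed, pa);

	return off % _Alignof(uint64_t) != 0;
}
```

On LP64, the packed struct is 20 bytes, so element 1's `pa` lands at offset 28. The unpacked struct is padded to 24 bytes and keeps every `pa` 8-byte aligned.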
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 11 Nov 2016 20:39:39 +0000 (12:39 -0800)]
i40evf: remove unused device ID
This device ID was intended for use when running Linux VF drivers under
Hyper-V, but we have determined that it is not necessary. Since it is
unused, and will never be used, remove it.
Change-ID: I74998ab4237db043cd400547bb54a0a5e2a37ea5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bimmy Pujari [Fri, 11 Nov 2016 20:39:38 +0000 (12:39 -0800)]
i40e: Deprecating unused macro
I40E_MAC_X710 was supposed to be for 10G and I40E_MAC_XL710
was supposed to be for 40G. But the function i40e_is_mac_710
sets I40E_MAC_XL710 for all device IDs, so I40E_MAC_X710 is not
used at all. As there is nothing to compare, there is no need
for this function. Thus, deprecate this extra macro, remove
the function entirely, and replace it with a direct
check.
Change-ID: I7d1769954dccd574a290ac04adb836ebd156730e
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 11 Nov 2016 20:39:37 +0000 (12:39 -0800)]
i40e: when adding or removing MAC filters, correctly handle VLANs
Instead of using i40e_add_filter or i40e_del_filter directly, when
adding a MAC address, we should normally be using i40e_add_mac_filter or
i40e_del_mac_filter. These functions correctly handle the various cases
of VLAN mode or PVID settings. This ensures consistency and avoids the
issues that can occur with the recent addition of a WARN_ON() in
i40e_sync_vsi_filters.
Change-ID: I7fe62db063391fdd1180b2d6a6a3c5ab4307eeee
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 11 Nov 2016 20:39:36 +0000 (12:39 -0800)]
i40e: avoid O(n^2) loop when deleting all filters
Use __i40e_del_filter() instead of i40e_del_filter(), which avoids
doing an additional search to delete a filter we already have the
pointer for.
Change-ID: Iea5a7e3cafbf8c682ed9d3b6c69cf5ff53f44daf
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 11 Nov 2016 20:39:35 +0000 (12:39 -0800)]
i40e: rename i40e_put_mac_in_vlan and i40e_del_mac_all_vlan
The purpose of these functions is to add a new MAC filter correctly, whether
we're using VLANs or not. Their goal is to ensure that all active VLANs
get the new MAC filter. Rename them so that their intent is clear. They
function correctly regardless of whether we have any active VLANs or
only have I40E_VLAN_ANY filters. The new names convey how they function
more clearly.
Change-ID: Iec1961f968c0223a7132724a74e26a665750b107
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 11 Nov 2016 20:39:34 +0000 (12:39 -0800)]
i40e: no need to check is_vsi_in_vlan before calling i40e_del_mac_all_vlan
This function won't be appreciably slower when in VLAN mode, so there is
no real reason to not just call it directly. In either case, we still
must search the full table for a MAC/VLAN pair. We do get to stop
searching a tiny bit early in the case of knowing we are not in VLAN
mode, but this is a minor savings and we can avoid the code complexity
by not having to worry about the check.
Change-ID: I533412195b3a42f51cf629e3675dd5145aea8625
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 11 Nov 2016 20:39:33 +0000 (12:39 -0800)]
i40e: fold the i40e_is_vsi_in_vlan check into i40e_put_mac_in_vlan
Fold the check for determining when to call i40e_put_mac_in_vlan directly
into the function so that we don't need to decide which function to use
ahead of time. This allows us to just call i40e_put_mac_in_vlan directly
without having to check ahead of time.
Change-ID: Ifff526940748ac14b8418be5df5a149502eed137
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 7 Dec 2016 22:05:34 +0000 (14:05 -0800)]
i40e: don't allow i40e_vsi_(add|kill)_vlan to operate when VID<1
Now that we have the separate i40e_(add|rm)_vlan_all_mac functions, we
should not be using the i40e_vsi_kill_vlan or i40e_vsi_add_vlan
functions when PVID is set or when VID is less than 1. This allows us to
remove some checks in i40e_vsi_add_vlan and ensures that callers which
need to handle VID=0 or VID=-1 don't accidentally invoke the VLAN mode
handling used to convert filters when entering VLAN mode. We also update
the functions to take u16 instead of s16, since they no longer
expect to be called with VID=I40E_VLAN_ANY.
Change-ID: Ibddf44a8bb840dde8ceef2a4fdb92fd953b05a57
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Thu, 2 Feb 2017 04:47:59 +0000 (20:47 -0800)]
net: add LINUX_MIB_PFMEMALLOCDROP counter
Debugging issues caused by pfmemalloc is often tedious.
Add a new SNMP counter to more easily diagnose these problems.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 2 Feb 2017 02:41:25 +0000 (18:41 -0800)]
net: ipv4: remove fib_lookup.h from devinet.c include list
Nothing in devinet.c relies on fib_lookup.h; remove it from the includes.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Feb 2017 02:13:23 +0000 (18:13 -0800)]
net: remove useless pfmemalloc setting
When __alloc_skb() allocates an skb from fast clone cache,
setting pfmemalloc on the clone is not needed.
The clone will be properly initialized later at skb_clone() time,
including pfmemalloc field, as it is included in the
headers_start/headers_end section which is fully copied.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 03:06:45 +0000 (22:06 -0500)]
Merge branch 'MV88E6390-fixes'
Andrew Lunn says:
====================
MV88E6390 fixes
Two patches, which have been posted before. Fix simple issues in the
mv88e6390 support. These don't need to go to stable, since the
mv88e6390 support in stable is insufficient to be usable.
To apply cleanly, these patches rely on "net: dsa: mv88e6xxx:
Workaround missing PHY".
v2: Added Reviewed-by.
Removed a redundant "the"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 1 Feb 2017 23:46:16 +0000 (00:46 +0100)]
net: dsa: mv88e6xxx: Fix typ0 when configuring 2.5Gbps
In order to enable 2.5Gbps mode, we need the base speed of 10G, plus
the Alt bit setting. Fix a typ0 that used 1Gb base speed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 1 Feb 2017 23:46:15 +0000 (00:46 +0100)]
net: dsa: mv88e6xxx: Fix ATU age timer for MV88E6390
The MV88E6390 family uses a different ATU age timer coefficient.
Fix the info structures.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 1 Feb 2017 23:35:03 +0000 (00:35 +0100)]
net: phy: marvell: Add support for 88e1545 PHY
The 88e1545 PHYs are discrete Marvell PHYs, found in a quad package on
the zii-devel-b board. Add support for it to the Marvell PHY driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 2 Feb 2017 07:20:12 +0000 (08:20 +0100)]
net: stmmac: Fix wrong message in stmmac_probe_config_dt
Most likely a copy & paste error in referenced commit.
Restore the debug message to what it was before.
Fixes: f573c0b9c4e0 ("stmmac: move stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to platform structure")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 1 Feb 2017 21:02:02 +0000 (22:02 +0100)]
net: stmmac: add separate warning for PTP not being supported by HW
Chips like Amlogic S905GXBB are supported by this driver but don't
have support for PTP. Add a separate warning for missing HW support
to differentiate it from other actual failures.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 1 Feb 2017 19:19:25 +0000 (20:19 +0100)]
net: stmmac: don't set tx delay in RGMII_ID and RGMII_TXID mode
As documented in Documentation/devicetree/bindings/net/ethernet.txt,
in RGMII_ID and RGMII_TXID mode the MAC should not add a tx delay.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Wed, 1 Feb 2017 19:00:45 +0000 (11:00 -0800)]
unix: add ioctl to open a unix socket file with O_PATH
This ioctl opens a file to which a socket is bound and
returns a file descriptor. The caller has to have CAP_NET_ADMIN
in the socket network namespace.
Currently it is impossible to get a path and a mount point
for a socket file. socket_diag reports address, device ID and inode
number for unix sockets. An address can contain a relative path or
a file may be moved somewhere. And these properties say nothing about
a mount namespace and a mount point of a socket file.
With the introduced ioctl, we can get a path by reading
/proc/self/fd/X and get mnt_id from /proc/self/fdinfo/X.
In CRIU we are going to use this ioctl to dump and restore unix sockets.
Here is an example of how it can be used:
$ strace -e socket,bind,ioctl ./test /tmp/test_sock
socket(AF_UNIX, SOCK_STREAM, 0) = 3
bind(3, {sa_family=AF_UNIX, sun_path="test_sock"}, 11) = 0
ioctl(3, SIOCUNIXFILE, 0) = 4
^Z
$ ss -a | grep test_sock
u_str LISTEN 0 1 test_sock 17798 * 0
$ ls -l /proc/760/fd/{3,4}
lrwx------ 1 root root 64 Feb 1 09:41 3 -> 'socket:[17798]'
l--------- 1 root root 64 Feb 1 09:41 4 -> /tmp/test_sock
$ cat /proc/760/fdinfo/4
pos: 0
flags: 012000000
mnt_id: 40
$ cat /proc/self/mountinfo | grep "^40\s"
40 19 0:37 / /tmp rw shared:23 - tmpfs tmpfs rw
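The same steps the strace session shows (ioctl, then read the /proc/self/fd link) can be wrapped in a small userspace helper. This is a sketch: unix_socket_path is a hypothetical name, and the fallback constant definitions are assumptions that mirror the kernel headers, because <linux/un.h> clashes with <sys/un.h>. The call needs CAP_NET_ADMIN and a kernel that has the ioctl; otherwise it returns -1.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Assumed fallbacks mirroring the kernel headers. */
#ifndef SIOCPROTOPRIVATE
#define SIOCPROTOPRIVATE 0x89E0
#endif
#ifndef SIOCUNIXFILE
#define SIOCUNIXFILE (SIOCPROTOPRIVATE + 0)
#endif

/* Write the path of the file that unix socket `sock` is bound to into buf.
 * Returns the path length, or -1 if the ioctl or readlink() fails. */
static ssize_t unix_socket_path(int sock, char *buf, size_t len)
{
	char link[64];
	ssize_t n;
	int fd;

	if (len == 0)
		return -1;
	fd = ioctl(sock, SIOCUNIXFILE, 0);	/* O_PATH fd for the socket file */
	if (fd < 0)
		return -1;
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, len - 1);
	close(fd);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}
```

For the mount point, the same O_PATH descriptor could be kept open and its mnt_id read from /proc/self/fdinfo, as in the session above.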
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Feb 2017 02:50:52 +0000 (21:50 -0500)]
Merge branch 'mv88e6390-missing-phy-ID'
Andrew Lunn says:
====================
Work around missing PHY product ID in mv88e6390
The internal PHYs of the MV88E6390 have a Marvell OUI, but the product
ID is zero. Work around this by trapping reads to the ID, and if it is
zero, return the MV88E6390 family ID.
v2: Move the workaround into the central mdio read function.
Enable the temperature sensor: even though it does not work on the 6390,
it does work on the 6341, which has the same ID problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 1 Feb 2017 02:40:06 +0000 (03:40 +0100)]
net: phy: Marvell: Add mv88e6390 internal PHY
The mv88e6390 Ethernet switch has internal PHYs. These PHYs don't have
a model ID in the ID2 register. So the MDIO driver in the switch
intercepts reads to this register, and returns the switch family ID.
Extend the Marvell PHY driver by including this ID, and treat the PHY
as an 88E1540.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 1 Feb 2017 02:40:05 +0000 (03:40 +0100)]
net: dsa: mv88e6xxx: Workaround missing PHY ID on mv88e6390
The internal PHYs of the mv88e6390 do not have a model ID. Trap any
calls to the ID register, and if it is zero, return the ID for the
mv88e6390. The Marvell PHY driver can then bind to this ID.
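The trap amounts to post-processing one register read. A minimal sketch follows, assuming the standard ID2 layout (OUI low bits in 15:10, model in 9:4, revision in 3:0) and an assumed family ID value of 0x390. fixup_phy_read is a hypothetical name, not the switch driver's function.

```c
#include <assert.h>
#include <stdint.h>

#define MII_PHYSID2        0x03		/* standard PHY identifier register 2 */
#define PHYSID2_MODEL_MASK 0x03f0	/* bits 9:4 carry the model number */
#define FAMILY_ID_6390     0x0390	/* assumed family ID, illustrative */

/* If ID2 comes back without a model number, OR in the switch family ID
 * so that a PHY driver can match on it; pass other reads through. */
static uint16_t fixup_phy_read(int reg, uint16_t val)
{
	if (reg == MII_PHYSID2 && !(val & PHYSID2_MODEL_MASK))
		val |= FAMILY_ID_6390;
	return val;
}
```

Doing this in the central MDIO read path (as the v2 note above says) means every consumer of the ID, including PHY driver matching, sees the patched value.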
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 1 Feb 2017 16:46:02 +0000 (17:46 +0100)]
xgene_enet: remove bogus forward declarations
The device match tables for both the xgene_enet driver and its phy driver
have forward declarations that declare an array without a length, leading
to a clang warning when they are not followed by an actual definition:
drivers/net/ethernet/apm/xgene/../../../phy/mdio-xgene.h:135:34: warning: tentative array definition assumed to have one element
drivers/net/ethernet/apm/xgene/xgene_enet_main.c:33:36: warning: tentative array definition assumed to have one element
The declarations for the mdio driver are even in a header file, so they
cause duplicate definitions of the tables for each file that includes
them.
This removes all four forward declarations and moves the actual
definitions up a little, so they are in front of their first user. For
the OF match tables, this means having to remove the #ifdef around them,
and passing the actual structure into of_match_device(). This has no
effect on the generated object code though, as the of_match_device
function has an empty stub that does not evaluate its argument, and
the symbol gets dropped either way.
Fixes: 43b3cf6634a4 ("drivers: net: phy: xgene: Add MDIO driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 2 Feb 2017 21:54:00 +0000 (16:54 -0500)]
Merge git://git./linux/kernel/git/davem/net
All merge conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Tue, 31 Jan 2017 09:30:06 +0000 (10:30 +0100)]
netfilter: allow logging from non-init namespaces
Commit 69b34fb996b2 ("netfilter: xt_LOG: add net namespace support for
xt_LOG") disabled logging packets using the LOG target from non-init
namespaces. The motivation was to prevent containers from flooding
kernel log of the host. The plan was to keep it that way until syslog
namespace implementation allows containers to log in a safe way.
However, the work on syslog namespace seems to have hit a dead end
somewhere in 2013 and there are users who want to use xt_LOG in all
network namespaces. This patch allows them to do so by setting
/proc/sys/net/netfilter/nf_log_all_netns
to a nonzero value. This sysctl is only accessible from init_net so that
one cannot switch the behaviour from inside a container.
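The policy reduces to one host-wide flag that gates logging outside init_net and can only be changed from init_net. A minimal model follows, with hypothetical names (struct netns, may_log, set_nf_log_all_netns). In the kernel, the sysctl is simply not exposed inside other namespaces, rather than rejected as this sketch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a network namespace. */
struct netns {
	bool is_init;
};

/* Models /proc/sys/net/netfilter/nf_log_all_netns (default 0). */
static int nf_log_all_netns;

/* init_net may always log; other namespaces only when the flag is set. */
static bool may_log(const struct netns *net)
{
	return net->is_init || nf_log_all_netns != 0;
}

/* Only init_net may change the flag, so a container cannot flip it. */
static int set_nf_log_all_netns(const struct netns *writer, int val)
{
	if (!writer->is_init)
		return -1;
	nf_log_all_netns = val;
	return 0;
}
```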
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David Windsor [Tue, 24 Jan 2017 03:24:29 +0000 (22:24 -0500)]
ipvs: free ip_vs_dest structs when refcnt=0
Currently, the ip_vs_dest cache frees ip_vs_dest objects when their
reference count becomes < 0. Aside from not being semantically sound,
this is problematic for the new type refcount_t, which will be introduced
shortly in a separate patch. refcount_t is the new kernel type for
holding reference counts, and provides overflow protection and a
constrained interface relative to atomic_t (the type currently being
used for kernel reference counts).
Per Julian Anastasov: "The problem is that dest_trash currently holds
deleted dests (unlinked from RCU lists) with refcnt=0." Changing
dest_trash to hold dest with refcnt=1 will allow us to free ip_vs_dest
structs when their refcnt=0, in ip_vs_dest_put_and_free().
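The "free at zero" rule is the usual decrement-and-test pattern, which refcount_t expects. Below is a sketch using C11 atomics. It assumes, per the text, that the trash list holds its own reference. struct dest and dest_put_and_free are hypothetical userspace stand-ins, not the IPVS code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical destination object; a dest sitting on the trash list is
 * assumed to hold one reference of its own (refcnt >= 1). */
struct dest {
	atomic_int refcnt;
};

/* Drop one reference and free the object when the count reaches 0,
 * instead of waiting for it to go negative. Returns true if freed. */
static bool dest_put_and_free(struct dest *d)
{
	if (atomic_fetch_sub(&d->refcnt, 1) == 1) {
		free(d);
		return true;
	}
	return false;
}
```

Because the count never has to drop below zero, an overflow-checking type can treat any negative value as a bug.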
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:59 +0000 (18:21 +0100)]
netfilter: merge ctinfo into nfct pointer storage area
After this change conntrack operations (lookup, creation, matching from
ruleset) only access one instead of two sk_buff cache lines.
This works for normal conntracks because those are allocated from a slab
that guarantees hw cacheline or 8-byte alignment (whichever is larger),
so the 3 bits needed for ctinfo won't overlap with nf_conn addresses.
Template allocation now does manual address alignment (see previous change)
on arches that don't have sufficient kmalloc min alignment.
Some spots intentionally use skb->_nfct instead of skb_nfct() helpers;
this is to avoid undoing the skb_nfct() use when we remove the untracked
conntrack object in the future.
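Storing ctinfo in the low bits of the conntrack pointer is ordinary pointer tagging. The sketch below reuses the NFCT_INFOMASK name from this series with the 3-bit width stated in the text. struct conn and the ct_* helpers are hypothetical stand-ins, not the skbuff accessors.

```c
#include <assert.h>
#include <stdint.h>

#define NFCT_INFOMASK 7UL		/* low 3 bits carry ctinfo */
#define NFCT_PTRMASK  (~NFCT_INFOMASK)

struct conn { int dummy; };	/* stands in for struct nf_conn */

/* Pack pointer and ctinfo into one word; the pointer must be at least
 * 8-byte aligned so that its low 3 bits are known to be zero. */
static unsigned long ct_pack(struct conn *ct, unsigned int info)
{
	assert(((uintptr_t)ct & NFCT_INFOMASK) == 0);
	assert(info <= NFCT_INFOMASK);
	return (unsigned long)ct | info;
}

static struct conn *ct_ptr(unsigned long nfct)
{
	return (struct conn *)(nfct & NFCT_PTRMASK);
}

static unsigned int ct_info(unsigned long nfct)
{
	return (unsigned int)(nfct & NFCT_INFOMASK);
}
```

One load of a single word now yields both values, which is where the "one cache line instead of two" saving comes from.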
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:58 +0000 (18:21 +0100)]
netfilter: guarantee 8 byte minalign for template addresses
The next change will merge skb->nfct pointer and skb->nfctinfo
status bits into single skb->_nfct (unsigned long) area.
For this to work, nf_conn addresses must always be aligned at least on
an 8-byte boundary, since we will need the lower 3 bits to store nfctinfo.
Conntrack templates are allocated via kmalloc.
kbuild test robot reported
BUILD_BUG_ON failed: NFCT_INFOMASK >= ARCH_KMALLOC_MINALIGN
on v1 of this patchset, so not all platforms meet this requirement.
Do manual alignment if needed; the alignment offset is stored in the
nf_conn entry protocol area. This works because templates are not
handed off to L4 protocol trackers.
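Manual alignment means over-allocating, rounding the address up, and remembering the offset so the original pointer can be freed later. In this sketch, the offset lives in a hypothetical header field that plays the role of the nf_conn protocol area. The sketch always aligns, whereas the patch does so only where kmalloc's minimum alignment is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TMPL_ALIGN 8u

/* Hypothetical template object; `offset` records how far the object was
 * moved from the start of the raw allocation. */
struct tmpl {
	uint8_t offset;
	char    data[32];
};

/* Over-allocate by TMPL_ALIGN - 1 bytes and round the address up, so the
 * object is 8-byte aligned even if malloc guarantees less. */
static struct tmpl *tmpl_alloc(void)
{
	char *raw = malloc(sizeof(struct tmpl) + TMPL_ALIGN - 1);
	uintptr_t p;
	struct tmpl *t;

	if (!raw)
		return NULL;
	p = ((uintptr_t)raw + TMPL_ALIGN - 1) & ~(uintptr_t)(TMPL_ALIGN - 1);
	t = (struct tmpl *)p;
	t->offset = (uint8_t)(p - (uintptr_t)raw);
	return t;
}

/* Undo the adjustment to recover the pointer malloc returned. */
static void tmpl_free(struct tmpl *t)
{
	if (t)
		free((char *)t - t->offset);
}
```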
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:57 +0000 (18:21 +0100)]
netfilter: add and use nf_ct_set helper
Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:56 +0000 (18:21 +0100)]
skbuff: add and use skb_nfct helper
A followup patch renames skb->nfct and changes its type, so add a helper
now to avoid an intrusive rename later.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:55 +0000 (18:21 +0100)]
netfilter: reduce direct skb->nfct usage
The next patch makes direct skb->nfct access illegal; reduce noise
in that patch by using accessors we already have.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:54 +0000 (18:21 +0100)]
netfilter: reset netfilter state when duplicating packet
We should also toss nf_bridge_info, if any: the packet is leaving via
ip_local_out, and this skb isn't bridged; it is a locally generated
copy. This also avoids the need to touch this later when skb->nfct is
replaced with 'unsigned long _nfct' in a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 23 Jan 2017 17:21:53 +0000 (18:21 +0100)]
netfilter: conntrack: no need to pass ctinfo to error handler
It is never accessed for reading and the only places that write to it
are the icmp(6) handlers, which also set skb->nfct (and skb->nfctinfo).
The conntrack core specifically checks for attached skb->nfct after
->error() invocation and returns early in this case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Feng [Fri, 20 Jan 2017 13:40:43 +0000 (21:40 +0800)]
netfilter: nf_tables: Eliminate duplicated code in nf_tables_table_enable()
If something fails in nf_tables_table_enable(), it unregisters the
chains. But the rollback code is almost the same as
nf_tables_table_disable(), except for one counter check. Create a wrapper
function to eliminate the duplicated code.
Signed-off-by: Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Wed, 1 Feb 2017 19:52:27 +0000 (11:52 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix handling of interrupt status in stmmac driver. Just because we
have masked the event from generating interrupts, doesn't mean the
bit won't still be set in the interrupt status register. From Alexey
Brodkin.
2) Fix DMA API debugging splats in gianfar driver, from Arseny Solokha.
3) Fix off-by-one error in __ip6_append_data(), from Vlad Yasevich.
4) cls_flower does not match on ICMPv6 codes properly, from Simon Horman.
5) Initial MAC address can be set incorrectly in some scenarios, from
Ivan Vecera.
6) Packet header pointer arithmetic fix in ip6_tnl_parse_tlv_enc_lim(),
from Dan Carpenter.
7) Fix divide by zero in __tcp_select_window(), from Eric Dumazet.
8) Fix crash in iwlwifi when unregistering thermal zone, from Jens
Axboe.
9) Check for DMA mapping errors in starfire driver, from Alexey
Khoroshilov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
tcp: fix 0 divide in __tcp_select_window()
ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
net: fix ndo_features_check/ndo_fix_features comment ordering
net/sched: matchall: Fix configuration race
be2net: fix initial MAC setting
ipv6: fix flow labels when the traffic class is non-0
net: thunderx: avoid dereferencing xcv when NULL
net/sched: cls_flower: Correct matching on ICMPv6 code
ipv6: Partially checksum full MTU frames
net/mlx4_core: Avoid command timeouts during VF driver device shutdown
gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
net: ethtool: add support for 2500BaseT and 5000BaseT link modes
can: bcm: fix hrtimer/tasklet termination in bcm op removal
net: adaptec: starfire: add checks for dma mapping errors
net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause
can: Fix kernel panic at security_sock_rcv_skb
net: macb: Fix 64 bit addressing support for GEM
stmmac: Discard masked flags in interrupt status register
net/mlx5e: Check ets capability before ets query FW command
net/mlx5e: Fix update of hash function/key via ethtool
...
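Item 1 rests on a general rule for interrupt handlers. Masking an event only stops it from raising an interrupt, so its bit can still be set in the status register. The handler must discard masked bits before acting on the status. The sketch below uses a hypothetical register layout. The bit names, the convention that a set mask bit disables an event, and filter_irq_status() are all assumptions, not stmmac's definitions.

```c
#include <stdint.h>

/* Hypothetical status/mask bits; a set mask bit disables the event. */
#define IRQ_RX_DONE  (1u << 0)
#define IRQ_TX_DONE  (1u << 1)
#define IRQ_LINK_CHG (1u << 2)

/* Keep only events that are pending AND not masked, so a masked
 * event whose status bit happens to be set is ignored. */
static uint32_t filter_irq_status(uint32_t status, uint32_t intr_mask)
{
	return status & ~intr_mask;
}
```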
Rafał Miłecki [Tue, 31 Jan 2017 21:54:54 +0000 (22:54 +0100)]
net: phy: broadcom: rehook BCM54612E specific init
This extra BCM54612E code in the PHY driver isn't really aneg specific.
Even without it aneg works OK, but no packets pass through the PHY.
Moreover, putting this code inside the config_aneg callback didn't allow
resuming the PHY correctly. When the driver called phy_stop and phy_start,
the PHY state machine was put into the RESUMING state. After that the
machine switched into AN and NOLINK without ever calling phy_start_aneg.
This prevented the extra setup from being called, so the PHY didn't work.
This change has been verified to fix network on BCM47186B0 SoC device
with BCM54612E.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 Feb 2017 19:10:04 +0000 (14:10 -0500)]
Merge branch 'act_sample-Little-fixes'
Yotam Gigi says:
====================
net/sched: act_sample: Little fixes
Little fixes in sample tc action.
====================
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 31 Jan 2017 09:33:54 +0000 (11:33 +0200)]
net/sched: act_psample: Remove unnecessary ASSERT_RTNL
The ASSERT_RTNL is not necessary in the init function, as it does not
touch any RTNL-protected structures, unlike the mirred action, which
does have to hold a net device.
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 31 Jan 2017 09:33:53 +0000 (11:33 +0200)]
net/sched: act_sample: Fix error path in init
Fix the error path in sample init by releasing the tc hash when
psample_group creation fails.
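The fix follows the usual kernel pattern of unwinding acquired resources in reverse order on failure. The sketch below runs in user space. hash_insert_model(), group_get_model() and the counters are hypothetical stand-ins for the tc hash entry and the psample group, not the real act_sample code.

```c
#include <stdbool.h>

/* Counters standing in for resources held by the action. */
static int hash_entries;
static int groups_held;

static int hash_insert_model(void)   { hash_entries++; return 0; }
static void hash_release_model(void) { hash_entries--; }

/* Simulates psample group creation, which may fail. */
static int group_get_model(bool fail)
{
	if (fail)
		return -1;
	groups_held++;
	return 0;
}

/* Init: take the hash entry first, then the group. If the second
 * step fails, release the first before returning the error. */
static int sample_init_model(bool group_fails)
{
	int err;

	err = hash_insert_model();
	if (err)
		return err;

	err = group_get_model(group_fails);
	if (err)
		goto err_release_hash;

	return 0;

err_release_hash:
	hash_release_model();  /* the release the old error path lacked */
	return err;
}
```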
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 1 Feb 2017 18:30:56 +0000 (10:30 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull fscache fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fscache: Fix dead object requeue
fscache: Clear outstanding writes when disabling a cookie
FS-Cache: Initialise stores_lock in netfs cookie
Eric Dumazet [Wed, 1 Feb 2017 16:33:53 +0000 (08:33 -0800)]
tcp: fix 0 divide in __tcp_select_window()
The syzkaller fuzzer was able to trigger a divide by zero when
TCP window scaling is not enabled.
SO_RCVBUF can be used not only to increase sk_rcvbuf, but also
to decrease it below the current receive buffer utilization.
If mss is negative or 0, just return a zero TCP window.
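The guard can be illustrated with a heavily simplified window calculation. This is a sketch, not the real __tcp_select_window(). The parameter names and the rounding logic are assumptions. They show how a shrunk SO_RCVBUF can clamp mss to 0, which would then be used as a divisor.

```c
/* Hypothetical model: mss is the peer's estimated segment size,
 * full_space the usable receive buffer (0 or negative once SO_RCVBUF
 * is shrunk below current use), free_space the currently free part. */
static int select_window_model(int mss, int full_space, int free_space)
{
	/* Never assume segments larger than the buffer can hold. */
	if (mss > full_space)
		mss = full_space;

	/* Models the patch's guard: with no usable mss, advertise a
	 * zero window instead of dividing by it below. */
	if (mss <= 0)
		return 0;

	/* Not enough room for even one full segment. */
	if (free_space < mss)
		return 0;

	/* Without window scaling, round down to a multiple of mss;
	 * this is the division that trapped when mss was 0. */
	return (free_space / mss) * mss;
}
```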
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>