Kirill Tkhai [Mon, 5 Mar 2018 11:31:37 +0000 (14:31 +0300)]
net: Convert dccp_v6_ops
These pernet_operations looks similar to dccp_v4_ops,
and they are also safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:28 +0000 (14:31 +0300)]
net: Convert dccp_v4_ops
These pernet_operations create and destroy net::dccp::v4_ctl_sk.
It looks like another pernet_operations don't want to send
dccp packets to dying or creating net. Batch method similar
to ipv4/ipv6 sockets and it has to be safe to be executed
in parallel with anything else. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:19 +0000 (14:31 +0300)]
net: Convert cangw_pernet_ops
These pernet_operations have a deal with cgw_list,
and the rest of accesses are made under rtnl_lock().
The only exception is cgw_dump_jobs(), which is
accessed under rcu_read_lock(). cgw_dump_jobs() is
called on netlink request, and it does not seem,
foreign pernet_operations want to send a net such
the messages. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:10 +0000 (14:31 +0300)]
net: Convert caif_net_ops
Init method just allocates memory for new cfg, and
assigns net_generic(net, caif_net_id). Despite there is
synchronize_rcu() on error path in cfcnfg_create(),
in real this function does not use global lists,
so it looks like this synchronize_rcu() is some legacy
inheritance. Exit method removes caif devices under
rtnl_lock().
There could be a problem, if someone from foreign net
pernet_operations dereference caif_net_id of this net.
It's dereferenced in get_cfcnfg() and caif_device_list().
get_cfcnfg() is used from netdevice notifiers, where
they are called under rtnl_lock(). The notifiers can't
be called from foreign nets pernet_operations. Also,
it's used from caif_disconnect_client() and from
caif_connect_client(). The both of the functions work
with caif socket, and there is the only possibility
to have a socket, when the net is dead. This may happen
only of the socket was created as kern using sk_alloc().
Grep by PF_CAIF shows we do not create kern caif sockets,
so get_cfcnfg() is safe.
caif_device_list() is used in netdevice notifiers and exit
method under rtnl lock. Also, from caif_get() used in
the netdev notifiers and in caif_flow_cb(). The last item
is skb destructor. Since there are no kernel caif sockets
nobody can send net a packet in parallel with init/exit,
so this is also safe.
So, these pernet_operations are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:00 +0000 (14:31 +0300)]
net: Convert arp_tables_net_ops and ip6_tables_net_ops
These pernet_operations call xt_proto_init() and xt_proto_fini(),
which just register and unregister /proc entries.
They are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:30:50 +0000 (14:30 +0300)]
net: Convert log pernet_operations
These pernet_operations use nf_log_set() and nf_log_unset()
in their methods:
nf_log_bridge_net_ops
nf_log_arp_net_ops
nf_log_ipv4_net_ops
nf_log_ipv6_net_ops
nf_log_netdev_net_ops
Nobody can send such a packet to a net before it's became
registered, nobody can send a packet after all netdevices
are unregistered. So, these pernet_operations are able
to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:30:41 +0000 (14:30 +0300)]
net: Convert broute_net_ops, frame_filter_net_ops and frame_nat_net_ops
These pernet_operations use ebt_register_table() and
ebt_unregister_table() to act on the tables, which
are used as argument in ebt_do_table(), called from
ebtables hooks.
Since there are no net-related bridge packets in-flight,
when the init and exit methods are called, these
pernet_operations are safe to be executed in parallel
with any other.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 5 Mar 2018 01:37:47 +0000 (17:37 -0800)]
selftests: forwarding: Add suppport to create veth interfaces
For tests using veth interfaces, the test infrastructure can create
the netdevs if they do not exist. Arguably this is a preferred approach
since the tests require p$N and p$(N+1) to be pairs.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Mendoza-Jonas [Mon, 5 Mar 2018 00:39:05 +0000 (11:39 +1100)]
net/ncsi: Add generic netlink family
Add a generic netlink family for NCSI. This supports three commands;
NCSI_CMD_PKG_INFO which returns information on packages and their
associated channels, NCSI_CMD_SET_INTERFACE which allows a specific
package or package/channel combination to be set as the preferred
choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha [Sun, 4 Mar 2018 18:38:36 +0000 (10:38 -0800)]
tcp: add ca_state stat in SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_CA_STATE stat into SCM_TIMESTAMPING_OPT_STATS.
It reports ca_state of socket, when timestamp is generated.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha [Sun, 4 Mar 2018 18:38:35 +0000 (10:38 -0800)]
tcp: add send queue size stat in SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_SENDQ_SIZE stat into SCM_TIMESTAMPING_OPT_STATS.
It reports no. of bytes present in send queue, when timestamp is
generated.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Sun, 4 Mar 2018 14:35:26 +0000 (16:35 +0200)]
selftests: Extend the tc action test for action mirror
Currently the tc action test is used only to test mirred redirect
action. This patch extends it for mirred mirror.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 4 Mar 2018 12:12:04 +0000 (14:12 +0200)]
net: Make RX-FCS and LRO mutually exclusive
LRO and RX-FCS offloads cannot be enabled at the same time since it is
not clear what should happen to the FCS of each coalesced packet.
The FCS is not really part of the TCP payload, hence cannot be merged
into one big packet. On the other hand, providing one big LRO packet
with one FCS contradicts the RX-FCS feature goal.
Use the fix features mechanism in order to prevent intersection of the
features and drop LRO in case RX-FCS is requested.
Enabling RX-FCS while LRO is enabled will result in:
$ ethtool -K ens6 rx-fcs on
Actual changes:
large-receive-offload: off [requested on]
rx-fcs: on
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Sat, 3 Mar 2018 02:29:04 +0000 (18:29 -0800)]
liquidio: Corrected Rx bytes counting
Corrected stats mismatch between Host Tx and its peer Rx stats
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Sat, 3 Mar 2018 01:52:01 +0000 (20:52 -0500)]
net sched actions: corrected extack message
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:45:39 +0000 (18:45 -0500)]
Merge tag 'batadv-next-for-davem-
20180302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- bump copyright years, by Sven Eckelmann
- fix macro indendation for checkpatch, by Sven Eckelmann
- fix comparison operator for bool returning functions,
by Sven Eckelmann
- assume 2-byte packet alignments for all packet types,
by Matthias Schiffer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Mar 2018 15:03:32 +0000 (16:03 +0100)]
ipvlan: forbid vlan devices on top of ipvlan
Currently we allow the creation of 8021q devices on top of
ipvlan, but such devices are nonfunctional, as the underlying
ipvlan rx_hanlder hook can't match the relevant traffic.
Be explicit and forbid the creation of such nonfunctional devices.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 2 Mar 2018 09:29:14 +0000 (17:29 +0800)]
virtio-net: re enable XDP_REDIRECT for mergeable buffer
XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Fri, 2 Mar 2018 02:22:20 +0000 (11:22 +0900)]
selftests: rtnetlink: remove testns on test fail
This patch removes testns after test failure so that next test can
continue with clean ns
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:35:02 +0000 (18:35 -0500)]
Merge branch 'gre-seq-collect_md'
William Tu says:
====================
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native tunnel mode.
The first patch adds sequence number support for gre collect
metadata mode, and the second patch tests it using BPF.
RFC2890 defines GRE sequence number to be specific to the traffic
flow identified by the key. However, this patch does not implement
per-key seqno. The sequence number is shared in the same tunnel
device. That is, different tunnel keys using the same collect_md
tunnel share single sequence number.
A new BFP uapi tunnel flag 'BPF_F_SEQ_NUMBER' is added.
--
v1->v2:
rename BPF_F_GRE_SEQ to BPF_F_SEQ_NUMBER suggested by Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 1 Mar 2018 21:49:58 +0000 (13:49 -0800)]
samples/bpf: add gre sequence number test.
The patch adds tests for GRE sequence number
support for metadata mode tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 1 Mar 2018 21:49:57 +0000 (13:49 -0800)]
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native
tunnel mode. This patch adds sequence number support for
gre collect metadata mode. RFC2890 defines GRE sequence
number to be specific to the traffic flow identified by the
key. However, this patch does not implement per-key seqno.
The sequence number is shared in the same tunnel device.
That is, different tunnel keys using the same collect_md
tunnel share single sequence number.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:19:26 +0000 (18:19 -0500)]
Merge branch 'enic-update'
Govindarajulu Varadarajan says:
====================
enic update
This series adds support for IPv6 vxlan offload and UDP rss along with a
bug fix in filling the rq ring.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:24 +0000 (11:07 -0800)]
enic: set IG desc cache flag in open
New adapter needs CMD_OPENF_IG_DESCCACHE flag to be set. If this flag is
not set, fw flushes the global IG desc cache. This flag is nop in older
adapter.
Also increment driver version
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:23 +0000 (11:07 -0800)]
enic: enable rq before updating rq descriptors
rq should be enabled before posting the buffers to rq desc. If not hw sees
stale value and casuses DMAR errors.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:22 +0000 (11:07 -0800)]
enic: set UDP rss flag
New hardware needs UDP flag set to enable UDP L4 rss hash. Add ethtool
get option to display supported rss flow hash.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:21 +0000 (11:07 -0800)]
enic: Check if hw supports multi wq with vxlan offload
Some adaptors do not support vxlan offload when multi wq is configured.
If hw supports multi wq, BIT(2) is set in a1.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:20 +0000 (11:07 -0800)]
enic: Add vxlan offload support for IPv6 pkts
New adaptors supports vxlan offload for inner IPv6 and outer IPv6 vxlan
pkts.
Fw sets BIT(0) & BIT(1) in a1 if hw supports ipv6 inner & outer pkt
offload.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:19 +0000 (11:07 -0800)]
enic: Check inner ip proto for pseudo header csum
To compute pseudo IP header csum, we need to check the inner header for
encap pkt, not outer IP header.
Also add pseudo csum for IPv6 inner pkt.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Mar 2018 16:42:40 +0000 (16:42 +0000)]
net: amd8111e: remove redundant assignment to 'tx_index'
The variable tx_index is being initialized with a value that is never
read and re-assigned a little later, hence the initialization is redundant
and can be removed.
Cleans up clang warning:
drivers/net/ethernet/amd/amd8111e.c:652:6: warning: Value stored to
'tx_index' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 1 Mar 2018 11:27:35 +0000 (13:27 +0200)]
r8169: switch to device-managed functions in probe (part 2)
This is a follow up to the commit
4c45d24a759d ("r8169: switch to device-managed functions in probe")
to move towards managed resources even more.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 1 Mar 2018 11:27:34 +0000 (13:27 +0200)]
r8169: Dereference MMIO address immediately before use
There is no need to dereference struct rtl8169_private to get mmio_addr
in almost every function in the driver.
Replace it by using pointer to struct rtl8169_private directly.
No functional change intended.
Next step might be a conversion of RTL_Wxx() / RTL_Rxx() macros
to inline functions for sake of type checking.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Mar 2018 10:23:03 +0000 (10:23 +0000)]
net: phy: Fix spelling mistake: "advertisment"-> "advertisement"
Trivial fix to spelling mistake in comments and error message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Thu, 1 Mar 2018 09:31:04 +0000 (15:01 +0530)]
cxgb4vf: Forcefully link up virtual interfaces
The Virtual Interfaces are connected to an internal switch on the chip
which allows VIs attached to the same port to talk to each other even
when the port link is down. As a result, we generally want to always
report a VI's link as being "up".
Based on the original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 18:34:19 +0000 (13:34 -0500)]
Merge branch 'dsa-serdes-stats'
Andrew Lunn says:
====================
Export SERDES stats via ethtool -S
The mv88e6352 family has a SERDES interface which can be used for
example to connect to SFF/SFP modules. This interface has a couple of
statistics counters. Add support for including these counters in the
output of ethtool -S.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:31 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Get mv88e6352 SERDES statistics
Add support for reading the SERDES statistics of the mv88e8352, using
the standard ethtool -S option. The SERDES interface can be mapped to
either port 4 or 5, so only return statistics on those ports, if the
SERDES interface is in use.
The counters are reset on read, so need to be accumulated. Add a per
port structure to hold the stats counters. The 6352 only has a single
SERDES interface and so only one port will using the newly added
array. However the 6390 family has as many SERDES interfaces as ports,
each with statistics counters. Also, PTP has a number of counters per
port which will also need accumulating.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:30 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Add helper to determining if port has SERDES
Refactor the existing code. This helper will be used for SERDES
statistics.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:29 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics
When gettting the number of statistics, the strings and the actual
statistics, call the SERDES ops if implemented. This means the stats
code needs to return the number of strings/stats they have placed into
the data, so that the SERDES strings/stats can follow on.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:28 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Hold mutex while doing stats operations
Until now, there has been no need to hold the reg mutex while getting
the count of statistics, or the strings, because the hardware was not
accessed. When adding support for SERDES statistics, it is necessary
to access the hardware, to determine if a port is using the SERDES
interface. So add mutex lock/unlocks.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:27 +0000 (02:02 +0100)]
dsa: Pass the port to get_sset_count()
By passing the port, we allow different ports to have different
statistics. This is useful since some ports have SERDES interfaces
with their own statistic counters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 28 Feb 2018 20:36:19 +0000 (15:36 -0500)]
tools: tc-testing: Add notap option
Add a command line arg to suppress tap output. Handy in case
all the tap output is being supplied by the plugins.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 18:04:24 +0000 (13:04 -0500)]
Merge branch 'net-ipv6-Add-support-for-path-selection-using-hash-of-5-tuple'
David Ahern says:
====================
net/ipv6: Add support for path selection using hash of 5-tuple
Hardware supports multipath selection using the standard L4 5-tuple
instead of just L3 and the flow label. In addition, some network
operators prefer IPv6 path selection to use the 5-tuple. To that end,
add support to IPv6 for multipath hash policy similar to
bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice").
The default is still L3 which covers source and destination addresses
along with flow label and IPv6 protocol. This gives users a choice in
hash algorithms if they believe L3 only and the IPv6 flow label are not
sufficient for their use case.
A separate sysctl is added for IPv6, allowing IPv4 and IPv6 to use
different algorithms if desired.
The first 3 patches modify the IPv4 variant so that at the end of the
patch set the ipv4 and ipv6 implementations are direct parallels.
Patch 4 refactors the existing rt6_multipath_hash in preparation for
adding the policy option.
Patch 5 renames the existing netevent to have IPv4 in the name so ipv4
changes can be distinguished from IPv6 if the netevent handler cares.
Patch 6 adds the skb as an argument through the FIB lookup functions
to the multipath selection. Needed for the forwarding case.
Patch 7 adds the L4 hash support.
Patch 8 adds the hook for the netevent to the spectrum driver to update
the ASIC.
Patch 9 removes no longer used code.
Patch 10 adds a testcase for IPv6 multipath with L4 hash.
v3
- comments from Ido:
- removed fib_info arg in patch 1; left by mistake on rebase to net-next
- removed __get_hash_from_flowi4 declaration
- line wrap change to spectrum_router.c to maintain 80 chars
v2
- rebased to top of tree
- added refactor of fib_multipath_hash following recent change
- plumb skb through lookup functions to multipath selection
- fix sysctl setting; was missing the data set in ipv6_sysctl_net_init
- added test case
RFC to v1:
- rebase to top of net-next
- fix addr_type in hash_keys and removed flow label as noticed by Ido
- added a comment to cover letter about choice in algorithms based on
use case per Or's comments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:21 +0000 (08:32 -0800)]
selftests: forwarding: Add multipath test for L4 hashing
Add IPv6 multipath test using L4 hashing. Created with inputs from
Ido Schimmel.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:20 +0000 (08:32 -0800)]
net: Remove unused get_hash_from_flow functions
__get_hash_from_flowi6 is still used for flowlabels, but the IPv4
variant and the wrappers to both are not used. Remove them.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:19 +0000 (08:32 -0800)]
mlxsw: spectrum_router: Add support for ipv6 hash policy update
Similar to
28678f07f127d ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:18 +0000 (08:32 -0800)]
net/ipv6: Add support for path selection using hash of 5-tuple
Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow the label. To that end
add support to IPv6 for multipath hash policy similar to
bf4e0a3db97eb
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:17 +0000 (08:32 -0800)]
net/ipv6: Pass skb to route lookup
IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:16 +0000 (08:32 -0800)]
net: Rename NETEVENT_MULTIPATH_HASH_UPDATE
Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:15 +0000 (08:32 -0800)]
net/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash
Make rt6_multipath_hash more of a direct parallel to fib_multipath_hash
and reduce stack and overhead in the process: get_hash_from_flowi6 is
just a wrapper around __get_hash_from_flowi6 with another stack
allocation for flow_keys. Move setting the addresses, protocol and
label into rt6_multipath_hash and allow it to make the call to
flow_hash_from_keys.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:14 +0000 (08:32 -0800)]
net/ipv4: Simplify fib_multipath_hash with optional flow keys
As of commit
e37b1e978bec5 ("ipv6: route: dissect flow in input path if
fib rules need it") fib_multipath_hash takes an optional flow keys. If
non-NULL it means the skb has already been dissected. If not set, then
fib_multipath_hash needs to call skb_flow_dissect_flow_keys.
Simplify the logic by setting flkeys to the local stack variable keys.
Simplifies fib_multipath_hash by only have 1 set of instructions
setting hash_keys.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:13 +0000 (08:32 -0800)]
net: Align ip_multipath_l3_keys and ip6_multipath_l3_keys
Symmetry is good and allows easy comparison that ipv4 and ipv6 are
doing the same thing. To that end, change ip_multipath_l3_keys to
set addresses at the end after the icmp compares, and move the
initialization of ipv6 flow keys to rt6_multipath_hash.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:12 +0000 (08:32 -0800)]
net/ipv4: Pass net to fib_multipath_hash instead of fib_info
fib_multipath_hash only needs net struct to check a sysctl. Make it
clear by passing net instead of fib_info. In the end this allows
alignment between the ipv4 and ipv6 versions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 18:00:58 +0000 (13:00 -0500)]
Merge branch 'sctp-clean-up-sctp_sendmsg'
Xin Long says:
====================
sctp: clean up sctp_sendmsg
This cleanup mostly does three things:
- extract some codes into functions to make sendmsg more readable.
- tidy up some codes to avoid the unnecessary checks.
- adjust some logic so that it will be easier to add the send flags
and cmsgs features that I will post after this.
To make it easy to review and to check if the code is compatible with
before, this patchset is to do it step by step in 9 patches.
NOTE:
There will be a conflict when merging
Commit
2277c7cd75e3 ("sctp: Add LSM hooks") from selinux tree,
the solution is to:
1. remove all the lines in [B]:
<<<<<<< HEAD
[A]
=======
[B]
>>>>>>>
2277c7c... sctp: Add LSM hooks
2. and apply the following diff-output:
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index
980621e..
d6803c8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1686,6 +1686,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
struct net *net = sock_net(sk);
struct sctp_association *asoc;
enum sctp_scope scope;
+ struct sctp_af *af;
int err = -EINVAL;
*tp = NULL;
@@ -1711,6 +1712,22 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
scope = sctp_scope(daddr);
+ /* Label connection socket for first association 1-to-many
+ * style for client sequence socket()->sendmsg(). This
+ * needs to be done before sctp_assoc_add_peer() as that will
+ * set up the initial packet that needs to account for any
+ * security ip options (CIPSO/CALIPSO) added to the packet.
+ */
+ af = sctp_get_af_specific(daddr->sa.sa_family);
+ if (!af)
+ return -EINVAL;
+
+ err = security_sctp_bind_connect(sk, SCTP_SENDMSG_CONNECT,
+ (struct sockaddr *)daddr,
+ af->sockaddr_len);
+ if (err < 0)
+ return err;
+
asoc = sctp_association_new(ep, sk, scope, GFP_KERNEL);
if (!asoc)
return -ENOMEM;
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:18 +0000 (23:05 +0800)]
sctp: adjust some codes in a better order in sctp_sendmsg
sctp_sendmsg_new_asoc and SCTP_ADDR_OVER check is only necessary
when daddr is set, so move them up to if (daddr) statement.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:17 +0000 (23:05 +0800)]
sctp: improve some variables in sctp_sendmsg
This patch mostly is to:
- rename sinfo_flags as sflags, to make the indents look better, and
also keep consistent with other sctp_sendmsg_xx functions.
- replace new_asoc with bool new, no need to define a pointer here,
as if new_asoc is set, it must be asoc.
- rename the 'out_nounlock:' as 'out', shorter and nicer.
- remove associd, only one place is using it now, just use
sinfo->sinfo_assoc_id directly.
- remove 'cmsgs' initialization in sctp_sendmsg, as it will be done
in sctp_sendmsg_parse.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:16 +0000 (23:05 +0800)]
sctp: remove the unnecessary transport looking up from sctp_sendmsg
Now sctp_assoc_lookup_paddr can only be called only if daddr has
been set. But if daddr has been set, sctp_endpoint_lookup_assoc
would be done, where it could already have the transport.
So this unnecessary transport looking up should be removed, but
only reset transport as NULL when SCTP_ADDR_OVER is not set for
UDP type socket.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:15 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_update_sinfo from sctp_sendmsg
This patch is to move the codes for trying to get sinfo from
asoc into sctp_sendmsg_update_sinfo.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:14 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_parse from sctp_sendmsg
This patch is to move the codes for parsing msghdr and checking
sk into sctp_sendmsg_parse.
Note that different from before, 'sinfo' in sctp_sendmsg won't
be NULL any more. It gets the value either from cmsgs->srinfo,
cmsgs->sinfo or asoc. With it, the 'sinfo' and 'fill_sinfo_ttl'
check can be removed from sctp_sendmsg.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:13 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_get_daddr from sctp_sendmsg
This patch is to move the codes for trying to get daddr from
msg->msg_name into sctp_sendmsg_get_daddr.
Note that after adding 'daddr', 'to' and 'msg_name' can be
deleted.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:12 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_check_sflags from sctp_sendmsg
This patch is to move the codes for checking sinfo_flags on one asoc
after this asoc has been found into sctp_sendmsg_check_sflags.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:11 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_new_asoc from sctp_sendmsg
This patch is to move the codes for creating a new asoc if
no asoc was found into sctp_sendmsg_new_asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 1 Mar 2018 15:05:10 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg
This patch is to move the codes for checking and sending on
one asoc after this asoc has been found or created into
sctp_sendmsg_to_asoc.
Note that 'err != -ESRCH' check is for the case that asoc is
freed when waiting for tx buffer in sctp_sendmsg_to_asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Mar 2018 02:53:11 +0000 (21:53 -0500)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-03-03
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Extend bpftool to build up CFG information of eBPF programs and add an
option to dump this in DOT format such that this can later be used with
DOT graphic tools (xdot, graphviz, etc) to visualize it. Part of the
analysis performed is sub-program detection and basic-block partitioning,
from Jiong.
2) Multiple enhancements for bpftool's batch mode, more specifically the
parser now understands comments (#), continuation lines (\), and arguments
enclosed between quotes. Also, allow to read from stdin via '-' as input
file, all from Quentin.
3) Improve BPF kselftests by i) unifying the rlimit handling into a helper
that is then used by all tests, and ii) add support for testing tail calls
to test_verifier plus add tests covering all corner cases. The latter is
especially useful for testing JITs, from Daniel.
4) Remove x64 JIT's bpf_flush_icache() since flush_icache_range() is a noop
on x64, from Daniel.
5) Fix one more occasion in BPF samples where we do not detach the BPF program
from the cgroup after completion, from Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 2 Mar 2018 13:42:39 +0000 (13:42 +0000)]
net/usb/kalmia: use ARRAY_SIZE for various array sizing calculations
Use the ARRAY_SIZE macro on a couple of arrays to determine
size of the arrays. Also fix up alignment to clean up a checkpatch
warning. Improvement suggested by Coccinelle.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 2 Mar 2018 10:27:07 +0000 (15:57 +0530)]
cxgb4: Add TP Congestion map entry for single-port
Add TP Congestion Map entry for single-port T6 cards.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Mar 2018 14:50:21 +0000 (09:50 -0500)]
Merge tag 'mac80211-next-for-davem-2018-03-02' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Only a few new things:
* hwsim net namespace stuff from Kirill Tkhai
* A-MSDU support in fast-RX
* 4-addr mode support in fast-RX
* support for a spec quirk in Add-BA negotiation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 2 Mar 2018 09:05:49 +0000 (14:35 +0530)]
cxgb4: remove dead code when allocating filter
Error code is already returned earlier if filter exists
at specified location. So, remove dead code trying to
free existing filter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Thu, 1 Mar 2018 11:30:17 +0000 (14:30 +0300)]
net: Convert hwsim_net_ops
These pernet_operations allocate and destroy IDA identifier,
and these actions are synchronized by IDA subsystem locks.
Exit method removes mac80211_hwsim_data enteries from the lists,
and this is synchronized by hwsim_radio_lock with the rest
parallel pernet_operations. Also it queues destroy_radio()
work, and these work already may be executed in parallel
with any pernet_operations (as it's a work :). So, we may
mark these pernet_operations as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Kirill Tkhai [Thu, 1 Mar 2018 11:30:09 +0000 (14:30 +0300)]
mac80211_hwsim: Make hwsim_netgroup IDA
hwsim_netgroup counter is declarated as int, and it is incremented
every time a new net is created. After sizeof(int) net are created,
it will overflow, and different net namespaces will have the same
identifier. This patch fixes the problem by introducing IDA instead
of int counter. IDA guarantees, all the net namespaces have the uniq
identifier.
Note, that after we do ida_simple_remove() in hwsim_exit_net(),
and we destroy the ID, later there may be executed destroy_radio()
from the workqueue. But destroy_radio() does not use the ID, so it's OK.
Out of bounds of this patch, just as a report to wireless subsystem
maintainer, destroy_radio() increaments hwsim_radios_generation
without hwsim_radio_lock, so this may need one more patch to fix.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Daniel Borkmann [Fri, 2 Mar 2018 08:46:41 +0000 (09:46 +0100)]
Merge branch 'bpf-bpftool-batch-improvements'
Quentin Monnet says:
====================
Several enhancements for bpftool batch mode are introduced in this series.
More specifically, input files for batch mode gain support for:
* comments (starting with '#'),
* continuation lines (after a line ending with '\'),
* arguments enclosed between quotes.
Also, make bpftool able to read from standard input when "-" is provided as
input file name.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:11 +0000 (20:20 -0800)]
tools: bpftool: add support for quotations in batch files
Improve argument parsing from batch input files in order to support
arguments enclosed between single (') or double quotes ("). For example,
this command can now be parsed in batch mode:
bpftool prog dump xlated id 1337 file "/tmp/my file with spaces"
The function responsible for parsing command arguments is copied from
its counterpart in lib/utils.c in iproute2 package.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:10 +0000 (20:20 -0800)]
tools: bpftool: read from stdin when batch file name is "-"
Make bpftool read its command list from standard input when the name if
the input file is a single dash.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:09 +0000 (20:20 -0800)]
tools: bpftool: support continuation lines in batch files
Add support for continuation lines, such as in the following example:
prog show
prog dump xlated \
id 1337 opcodes
This patch is based after the code for support for continuation lines
from file lib/utils.c from package iproute2.
"Lines" in error messages are renamed as "commands", as we count the
number of commands (but we ignore empty lines, comments, and do not add
continuation lines to the count).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Quentin Monnet [Fri, 2 Mar 2018 04:20:08 +0000 (20:20 -0800)]
tools: bpftool: support comments in batch files
Replace '#' by '\0' in commands read from batch files in order to avoid
processing the remaining part of the line, thus allowing users to use
comments in the files.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Fri, 2 Mar 2018 02:44:29 +0000 (21:44 -0500)]
Merge branch 'tcp_bbr-more-GSO-work'
Eric Dumazet says:
====================
tcp_bbr: more GSO work
Playing with r8152 USB 1Gbit NIC, on both USB2 and USB3 slots, I found
that BBR was performing poorly, because of TSO being limited to 16KB
This patch series makes sure BBR is not under estimating number of
packets that are needed to fill the pipe when a device has suboptimal
TSO limits.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 28 Feb 2018 22:40:47 +0000 (14:40 -0800)]
tcp_bbr: remove bbr->tso_segs_goal
Its value is computed then immediately used,
there is no need to store it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 28 Feb 2018 22:40:46 +0000 (14:40 -0800)]
tcp_bbr: better deal with suboptimal GSO (II)
This is second part of dealing with suboptimal device gso parameters.
In first patch (
350c9f484bde "tcp_bbr: better deal with suboptimal GSO")
we dealt with devices having low gso_max_segs
Some devices lower gso_max_size from 64KB to 16 KB (r8152 is an example)
In order to probe an optimal cwnd, we want BBR being not sensitive
to whatever GSO constraint a device can have.
This patch removes tso_segs_goal() CC callback in favor of
min_tso_segs() for CC wanting to override sysctl_tcp_min_tso_segs
Next patch will remove bbr->tso_segs_goal since it does not have
to be persistent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 2 Mar 2018 02:29:50 +0000 (18:29 -0800)]
Merge branch 'bpftool-visualization'
Jakub Kicinski says:
====================
Jiong says:
This patch set is an application of CFG information on eBPF program
visualization. It presents some initial code for building CFG information
from eBPF instruction sequences.
After we get eBPF program bytecode, we do sub-program detection and
basic-block partition. These information then are visualized into DOT
graph.
The user could use any DOT graphic tools (xdot, graphviz etc) to view it.
For example:
bpftool prog dump xlated id 2 visual &>output.dot
[xdot | dotty] output.dot
dot -Tpng -o output.png
This initial patch set hasn't tuned much on the dot description layout
nor decoration, we could improve them later once the direction of the patch
set is agreed on. We could also visualize some static analysis performance
data.
v2 (Jakub):
- update license headers and add SPDX tags.
====================
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Quentin Monnet [Fri, 2 Mar 2018 02:01:23 +0000 (18:01 -0800)]
tools: bpftool: add bash completion for CFG dump
Add bash completion for the "visual" keyword used for dumping the CFG of
eBPF programs with bpftool. Make sure we only complete with this keyword
when we dump "xlated" (and not "jited") instructions.
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:22 +0000 (18:01 -0800)]
tools: bpftool: new command-line option and documentation for 'visual'
This patch adds new command-line option for visualizing the xlated eBPF
sequence.
Documentations are updated accordingly.
Usage:
bpftool prog dump xlated id 2 visual
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:21 +0000 (18:01 -0800)]
tools: bpftool: generate .dot graph from CFG information
This patch let bpftool print .dot graph file into stdout.
This graph is generated by the following steps:
- iterate through the function list.
- generate basic-block(BB) definition for each BB in the function.
- draw out edges to connect BBs.
This patch is the initial support, the layout and decoration of the .dot
graph could be improved.
Also, it will be useful if we could visualize some performance data from
static analysis.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:20 +0000 (18:01 -0800)]
tools: bpftool: add out edges for each basic-block
This patch adds out edges for each basic-block. We will need these out
edges to finish the .dot graph drawing.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:19 +0000 (18:01 -0800)]
tools: bpftool: partition basic-block for each function in the CFG
This patch partition basic-block for each function in the CFG. The
algorithm is simple, we identify basic-block head in a first traversal,
then second traversal to identify the tail.
We could build extended basic-block (EBB) in next steps. EBB could make the
graph more readable when the eBPF sequence is big.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:18 +0000 (18:01 -0800)]
tools: bpftool: detect sub-programs from the eBPF sequence
This patch detect all sub-programs from the eBPF sequence and keep the
information in the new CFG data structure.
The detection algorithm is basically the same as the one in verifier except
we need to use insn->off instead of insn->imm to get the pc-relative call
offset. Because verifier has modified insn->off/insn->imm during finishing
the verification.
Also, we don't need to do some sanity checks as verifier has done them.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:17 +0000 (18:01 -0800)]
tools: bpftool: factor out xlated dump related code into separate file
This patch factors out those code of dumping xlated eBPF instructions into
xlated_dumper.[h|c].
They are quite independent dumper functions, so better to be kept
separately.
New dumper support will be added in later patches in this set.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiong Wang [Fri, 2 Mar 2018 02:01:16 +0000 (18:01 -0800)]
tools: bpftool: remove unnecessary 'if' to reduce indentation
It is obvious we could use 'else if' instead of start a new 'if' in the
touched code.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Soheil Hassas Yeganeh [Tue, 27 Feb 2018 23:22:40 +0000 (18:22 -0500)]
socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)
recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear error queue. Otherwise, users are not able to
drain the error queue using recvmmsg.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Mar 2018 02:23:42 +0000 (21:23 -0500)]
Merge branch 'net-phy-Reduce-duplication'
Florian Fainelli says:
====================
net: phy: Reduce duplication
This patch series reduces the duplication among 10G PHY drivers that just
essentially stub most functions, but do that while replicating what the existing
generic functions do.
Changes in v3:
- removed unused "reg" variable in teranetics.c
- fixed subject for patch 5 since we actually use gen10g_no_soft_reset()
Changes in v2:
- rename gen10g_soft_reset() to gen10g_no_soft_reset() to better illustrate
what it does (or does not)
- removed stray comment in marvell10g.c
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 2 Mar 2018 00:08:59 +0000 (16:08 -0800)]
net: phy: marvell10g: Utilize gen10g_no_soft_reset()
We do the same thing as the generic function: nothing, so utilize it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Florian Fainelli [Fri, 2 Mar 2018 00:08:58 +0000 (16:08 -0800)]
net: phy: cortina: Utilize generic functions
cortina_soft_reset() does the same thing as gen10g_soft_reset(), and
cortina_config_aneg() is actually doing what gen10g_config_init() does
for 10G capable PHYs.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Florian Fainelli [Fri, 2 Mar 2018 00:08:57 +0000 (16:08 -0800)]
net: phy: teranetics: Utilize generic functions
Update teranetics_aneg_done() to use genphy_c45_aneg_done() instead of
duplicating that code, and switch to gen10g_* functions where
appropriate instead of maintaining identical copies doing nothing.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Florian Fainelli [Fri, 2 Mar 2018 00:08:56 +0000 (16:08 -0800)]
net: phy: Export gen10g_* functions
In order to remove a fair amount of duplication in the different 10G PHY
drivers, export all gen10g_* functions to be able to make use of those.
While we are at it, rename gen10g_soft_reset() to gen10g_no_soft_reset()
to illustrate what it does.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Florian Fainelli [Fri, 2 Mar 2018 00:08:55 +0000 (16:08 -0800)]
net: phy: aquantia: Utilize genphy_c45_aneg_done()
The driver duplicates what the generic function does, so use the generic
function intead.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
David S. Miller [Fri, 2 Mar 2018 02:21:36 +0000 (21:21 -0500)]
Merge branch 'mac89x0-fixes-and-cleanups'
Finn Thain says:
====================
Fixes, cleanup and modernization for mac89x0 driver
Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Added acked-by tags from Geert Uytterhoeven.
- Omitted patches unrelated to mac89x0 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Replace custom debug logging with netif_* calls
Adopt the conventional style of debug logging because it is both
shorter and more flexible.
Remove the 'version_printed' flag as the version will be printed
only once anyway (when the module loads).
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Fix and modernize log messages
Fix log message fragments that no longer produce the desired output
since the behaviour of printk() was changed.
Add missing printk severity levels.
Drop deprecated "out of memory" message as per checkpatch advice.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Convert to platform_driver
Apparently these Dayna cards don't have a pseudoslot declaration ROM
which means they can't be probed like NuBus cards.
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Remove redundant code
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Mar 2018 02:19:03 +0000 (21:19 -0500)]
Merge branch 'forwarding-selftest-fixes'
David Ahern says:
====================
selftests: forwarding: misc bug fixes and enhancements
Bug fixes and an enhancement for the recent forwarding tests:
- only check tc version on tc tests
- handle multipath tests failing with 0 packet count
- fix ping command for IPv6 on Debian jessie
- improve summary of multipath tests
v2
- add CHECK_TC to bridge_vlan_aware.sh (Ido)
- dropped patch 2; always check for mz given its use
- fixed commit message for the last patch (Multipath: was dropped)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Mar 2018 21:49:33 +0000 (13:49 -0800)]
selftests: forwarding: Add description to the multipath tests
Add a better description to the summary for multipath tests. e.g.,
INFO: Running IPv6 multipath tests
TEST: ECMP [PASS]
INFO: Expected ratio 1.00 Measured ratio 1.02
TEST: Weighted MP 2:1 [PASS]
INFO: Expected ratio 2.00 Measured ratio 2.02
TEST: Weighted MP 11:45 [PASS]
INFO: Expected ratio 4.09 Measured ratio 4.03
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>