Andrii [Thu, 31 Aug 2017 05:28:01 +0000 (08:28 +0300)]
net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()
Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() in net/dccp/ipv6.c,
similar to the handling in net/ipv6/tcp_ipv6.c.
Signed-off-by: Andrii Vladyka <tulup@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 31 Aug 2017 05:18:13 +0000 (22:18 -0700)]
bridge: add tracepoint in br_fdb_update
This extends bridge fdb table tracepoints to also cover
learned fdb entries in the br_fdb_update path. Note that
unlike other tracepoints I have moved this to when the fdb
is modified because this is in the datapath and can generate
a lot of noise in the trace output. br_fdb_update is also called
from added_by_user context in the NTF_USE case which is already
traced, hence the !added_by_user check.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 30 Aug 2017 21:30:36 +0000 (14:30 -0700)]
net_sched: add reverse binding for tc class
TC filters when used as classifiers are bound to TC classes.
However, there is a hidden difference when adding them in different
orders:
1. If we add tc classes before their filters, everything is fine.
Logically, the classes exist before we specify their IDs in
filters, it is easy to bind them together, just as in the current
code base.
2. If we add tc filters before the tc classes they bind, we have to
do dynamic lookup in fast path. What's worse, this happens all
the time not just once, because on fast path tcf_result is passed
on stack, there is no way to propagate back to the one in tc filters.
This hidden difference hurts performance silently if we have many tc
classes in hierarchy.
This patch intends to close this gap by doing the reverse binding when
we create a new class: in this case we can actually search all the
filters in its parent, match and fix up by classid. And because
tcf_result is specific to each type of tc filter, we have to introduce
a new ops for each filter to tell how to bind the class.
Note, we still can NOT totally get rid of those class lookups in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.
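The two orderings described above can be modelled in user space. This is only a toy sketch, not the kernel code: Qdisc, FilterResult, slow_lookups and the reverse_bind switch are hypothetical names, with FilterResult standing in for tcf_result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TcClass:
    classid: int


@dataclass
class FilterResult:
    # Toy stand-in for tcf_result: the class ID named by the filter and
    # the class it is bound to (None while unbound).
    classid: int
    bound_class: Optional[TcClass] = None


@dataclass
class Qdisc:
    classes: Dict[int, TcClass] = field(default_factory=dict)
    filters: List[FilterResult] = field(default_factory=list)
    slow_lookups: int = 0  # counts per-packet dynamic lookups

    def add_filter(self, classid):
        # Forward binding: only succeeds if the class already exists.
        res = FilterResult(classid, self.classes.get(classid))
        self.filters.append(res)
        return res

    def add_class(self, classid, reverse_bind=True):
        cl = TcClass(classid)
        self.classes[classid] = cl
        if reverse_bind:
            # Reverse binding: fix up filters that named this class ID
            # before the class existed.
            for res in self.filters:
                if res.classid == classid and res.bound_class is None:
                    res.bound_class = cl
        return cl

    def classify(self, res):
        # Fast path: use the bound class, else look it up on every packet.
        if res.bound_class is not None:
            return res.bound_class
        self.slow_lookups += 1
        return self.classes.get(res.classid)
```

With reverse_bind=False the model pays one lookup per classified packet for a filter that was added before its class; with reverse binding the lookup count stays at zero.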
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 31 Aug 2017 05:14:37 +0000 (22:14 -0700)]
Merge tag 'mlx5-GRE-Offload' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-08-31 (GRE Offloads support)
This series provides support for MPLS RSS, GRE TX offloads and GRE RSS.
The first patch from Gal and Ariel provides the mlx5 driver support for
ConnectX capability to perform IP version identification and matching in
order to distinguish between IPv4 and IPv6 without the need to specify the
encapsulation type, thus perform RSS in MPLS automatically without
specifying MPLS ethertype. This patch will also serve for inner GRE IPv4/6
classification for inner GRE RSS.
The 2nd patch from Gal adds TX offload support for GRE tunneled packets
by reporting the needed netdev features.
The 3rd patch from Gal adds GRE inner RSS support by creating the needed device
resources (steering tables/rules and traffic classifiers) to match GRE traffic
and perform RSS hashing on the inner headers.
Improvement:
Testing 8 TCP streams bandwidth over GRE:
System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Before: 21.3 Gbps (Single RQ)
Now : 90.5 Gbps (RSS spread on 8 RQs)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Farrington [Wed, 30 Aug 2017 23:19:53 +0000 (16:19 -0700)]
liquidio: fix crash in presence of zeroed-out base address regs
Fix crash in linux PF driver when BARs have been cleared/de-programmed;
fail early init (prior to mapping BARs) if the BAR0 or
BAR1 registers are zero.
This situation can arise when the PF is added to a VM (PCI pass-through),
then a PF FLR is issued (in the VM). After this occurs, the BAR registers
will be zero. If we attempt to load the PF driver in the host
(after VM has been shutdown), the host can reset.
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 31 Aug 2017 00:07:30 +0000 (17:07 -0700)]
devlink: Maintain consistency in mac field name
IPv4 name uses "destination ip" as does the IPv6 patch set.
Make the mac field consistent.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Wed, 30 Aug 2017 20:37:22 +0000 (13:37 -0700)]
hv_netvsc: Fix typos in the document of UDP hashing
There are two typos in the document, netvsc.txt,
regarding UDP hashing level. This patch fixes them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 30 Aug 2017 17:32:58 +0000 (10:32 -0700)]
xen-netfront: be more drop monitor friendly
xennet_start_xmit() might copy skb with inappropriate layout
into a fresh one.
Old skb is freed, and at this point it is not a drop, but
a consume. New skb will then be either consumed or dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 13 Aug 2017 13:22:38 +0000 (16:22 +0300)]
net/mlx5e: Support RSS for GRE tunneled packets
Introduce a new flow table and indirect TIRs which are used to hash the
inner packet headers of GRE tunneled packets.
When a GRE tunneled packet is received, the TTC flow table will match
the new IPv4/6->GRE rules which will forward it to the inner TTC table.
The inner TTC is similar to its counterpart outer TTC table, but
matching the inner packet headers instead of the outer ones (and does
not include the new IPv4/6->GRE rules).
The new rules will not add steering hops since they are added to an
already existing flow group which will be matched regardless of this
patch. Non GRE traffic will not be affected.
The inner flow table will forward the packet to inner indirect TIRs
which hash the inner packet and thus result in RSS for the tunneled
packets.
Testing 8 TCP streams bandwidth over GRE:
System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Before: 21.3 Gbps (Single RQ)
Now : 90.5 Gbps (RSS spread on 8 RQs)
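Why hashing the inner headers helps can be shown with a toy model. Everything here is an assumption for illustration: toy_hash is a made-up hash (real NICs commonly use a Toeplitz hash), and the addresses and ports are hypothetical. All streams in one tunnel share the same outer headers, so only the inner headers differ between them.

```python
def toy_hash(fields):
    """Toy deterministic hash, not the hash the hardware uses."""
    h = 0
    for value in fields:
        h = (h * 31 + value) & 0xFFFFFFFF
    return h


def pick_rq(pkt, n_rqs, hash_inner):
    # Choose a receive queue from either the outer or the inner headers.
    headers = pkt["inner"] if hash_inner else pkt["outer"]
    return toy_hash(headers) % n_rqs


def make_streams(n):
    # Hypothetical tunnel endpoints: identical for every stream.
    outer = (0x0A000001, 0x0A000002)
    # Inner TCP flows differ only in their source port.
    return [
        {"outer": outer, "inner": (0xC0A80001, 0xC0A80002, 40000 + i, 5001)}
        for i in range(n)
    ]
```

In this model, hashing the outer headers sends all eight streams to one RQ, while hashing the inner headers spreads them over several RQs.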
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gal Pressman [Sun, 13 Aug 2017 10:34:42 +0000 (13:34 +0300)]
net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels
Add TX offloads support for GRE tunneled packets by reporting the needed
netdev features.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gal Pressman [Tue, 15 Aug 2017 11:18:08 +0000 (14:18 +0300)]
net/mlx5e: Use IP version matching to classify IP traffic
This change adds the ability for flow steering to classify IPv4/6
packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets and hit IPv4/6 classification steering rules.
Since IP packets with MPLS tag header have MPLS ethertype, they
missed the IPv4/6 ethertype rule and ended up hitting the default
filter forwarding all the packets to the same single RQ (No RSS).
Since our device is able to look past the MPLS tag and identify the
next protocol we introduce this solution which replaces ethertype
matching by the device's capability to perform IP version
identification and matching in order to distinguish between IPv4 and
IPv6.
Therefore, when driver is performing flow steering configuration on the
device it will use IP version matching in IP classified rules instead
of ethertype matching which will cause relevant MPLS tagged packets to
hit this rule as well.
If the device doesn't support IP version matching the driver will fall back
to use legacy ethertype matching in the steering as before.
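A rough model of the two matching modes follows. It is a sketch under assumptions: packets are plain dicts, the "ip_version" key stands for what the device parser reports after looking past the MPLS tag, and steer() and its return strings are hypothetical.

```python
ETH_P_IP, ETH_P_IPV6 = 0x0800, 0x86DD
ETH_P_MPLS_UC, ETH_P_MPLS_MC = 0x8847, 0x8848


def steer(pkt, ip_version_supported):
    """Return which steering destination a packet would hit in this model."""
    if ip_version_supported:
        # Match on the IP version identified by the device.
        hit = pkt.get("ip_version") in (4, 6)
    else:
        # Legacy fallback: match on the ethertype only.
        hit = pkt["ethertype"] in (ETH_P_IP, ETH_P_IPV6)
    return "ip_rss_rule" if hit else "default_single_rq"
```

An MPLS-tagged IPv4 packet carries ethertype 0x8847, so it only reaches the IP rule when IP version matching is used.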
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Colin Ian King [Wed, 30 Aug 2017 17:15:25 +0000 (18:15 +0100)]
bpf: test_maps: fix typos, "conenct" and "listeen"
Trivial fix to typos in printf error messages:
"conenct" -> "connect"
"listeen" -> "listen"
Thanks to Daniel Borkmann for spotting one of these mistakes.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 30 Aug 2017 11:40:12 +0000 (12:40 +0100)]
qed: fix spelling mistake: "calescing" -> "coalescing"
Trivial fix to spelling mistake in DP_NOTICE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Wed, 30 Aug 2017 11:06:03 +0000 (12:06 +0100)]
net: hns3: Fixes the wrong IS_ERR check on the returned phydev value
This patch removes the wrong check being done for the phy device being
returned by the mdiobus_get_phy() function. This function never returns
the error pointers.
Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet
Driver for hip08 SoC")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhumika Goyal [Wed, 30 Aug 2017 09:25:08 +0000 (14:55 +0530)]
net: bcm63xx_enet: make bcm_enetsw_ethtool_ops const
Make this const as it is never modified.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ahmed Abdelsalam [Wed, 30 Aug 2017 08:50:37 +0000 (10:50 +0200)]
ipv6: sr: fix get_srh() to comply with IPv6 standard "RFC 8200"
IPv6 packet may carry more than one extension header, and IPv6 nodes must
accept and attempt to process extension headers in any order and occurring
any number of times in the same packet. Hence, there should be no
assumption that Segment Routing extension header is to appear immediately
after the IPv6 header.
Moreover, section 4.1 of RFC 8200 gives a recommendation on the order of
appearance of those extension headers within an IPv6 packet. According to
this recommendation, Segment Routing extension header should appear after
Hop-by-Hop and Destination Options headers (if they are present).
This patch fixes the get_srh(), so it gets the segment routing header
regardless of its position in the chain of the extension headers in IPv6
packet, and makes sure that the IPv6 routing extension header is of Type 4.
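The idea of walking the extension header chain can be sketched in user space as below. This is an illustrative model working on raw bytes, not the kernel's get_srh(); find_srh is a hypothetical name, and only Hop-by-Hop, Routing and Destination Options headers are walked.

```python
IPV6_HDR_LEN = 40
NEXTHDR_HOP, NEXTHDR_ROUTING, NEXTHDR_DEST = 0, 43, 60
IPV6_SRCRT_TYPE_4 = 4  # Segment Routing


def find_srh(packet: bytes):
    """Return the byte offset of the SRH, or None if there is none."""
    if len(packet) < IPV6_HDR_LEN:
        return None
    nexthdr = packet[6]  # Next Header field of the fixed IPv6 header
    off = IPV6_HDR_LEN
    while nexthdr in (NEXTHDR_HOP, NEXTHDR_ROUTING, NEXTHDR_DEST):
        if off + 8 > len(packet):
            return None
        # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
        hdrlen = (packet[off + 1] + 1) * 8
        if off + hdrlen > len(packet):
            return None
        if nexthdr == NEXTHDR_ROUTING:
            # Accept the routing header only if it is of Type 4.
            return off if packet[off + 2] == IPV6_SRCRT_TYPE_4 else None
        nexthdr = packet[off]
        off += hdrlen
    return None
```

The function finds the SRH both directly after the IPv6 header and behind Hop-by-Hop and Destination Options headers, and rejects routing headers of other types.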
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Acked-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Aug 2017 22:17:46 +0000 (15:17 -0700)]
Merge branch 'mvpp2-comphy'
Antoine Tenart says:
====================
net: mvpp2: comphy configuration
This series, following up the one on the GoP/MAC configuration, aims at
stopping to depend on the firmware/bootloader configuration when using
the PPv2 engine. With this series the PPv2 driver does not need to rely
on a previous configuration, and dynamic reconfiguration while the
kernel is running can be done (i.e. switch one port from SGMII to 10G,
or the opposite). A port can now be configured in a different mode than
what's done in the firmware/bootloader as well.
The series first contains patches in the generic PHY framework to support
what is called the comphy (common PHYs), which is an h/w block providing
PHYs that can be configured in various modes ranging from SGMII, 10G
to SATA and others. As of now only the SGMII and 10G modes are
supported by the comphy driver.
Then patches are modifying the PPv2 driver to first add the comphy
initialization sequence (i.e. calls to the generic PHY framework) and to
then take advantage of this to allow dynamic reconfiguration (i.e.
configuring the mode of a port given what's connected, between sgmii and
10G). Note the use of the comphy in the PPv2 driver is kept optional
(i.e. if not described in dt the driver still works as before and relies on the
firmware/bootloader configuration).
Finally there are dt/defconfig patches to describe and take advantage of
this.
This was tested on a range of devices: 8040-db, 8040-mcbin and 7040-db.
@Dave: the dt patches should go through the mvebu tree (patches 9-13).
Thanks!
Antoine
Since v3:
- Now use of_phy_simple_xlate() to retrieve the phy.
- Added an owner in the phy_ops structure.
- Now allow the module to be selected with COMPILE_TEST.
- Removed unused parameter in the comphy set_mode functions.
- Added Kishon Acked-by in patch 1.
Since v2:
- Kept the link mode enforcement.
- Removed the netif_running() check.
- Reworded the "dynamic reconfiguration of the PHY mode" commit log.
- Added one patch not to force the GMAC autoneg parameters when using
the XLG MAC.
Since v1:
- Updated the mode settings variable name in the comphy driver to
have 'cp110' in it.
- Documented the PHY cell argument in the dt documentation.
- New patch adding comphy phandles for the 7040-db board.
- Checked if the carrier_on/off functions were needed. They are.
- s/PHY/generic PHY/ in commit log of patch 1.
- Rebased on the latest net-next/master.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:19 +0000 (10:29 +0200)]
net: mvpp2: dynamic reconfiguration of the comphy/GoP/MAC
This patch adds logic to reconfigure the comphy/GoP/MAC when the link
state is updated at runtime. This is very useful on boards where many
link speeds are supported: depending on what is negotiated the PPv2
driver will automatically reconfigure the link between the PHY and the
MAC.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:18 +0000 (10:29 +0200)]
net: mvpp2: do not set GMAC autoneg when using XLG MAC
When using the XLG MAC, it does not make sense to force the GMAC autoneg
parameters. This patch adds checks to only set the GMAC autoneg
parameters when needed (i.e. when not using the XLG MAC).
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:17 +0000 (10:29 +0200)]
net: mvpp2: improve the link management function
When the link status changes, the phylib calls the link_event function
in the mvpp2 driver. Before this patch only the egress/ingress transmit
was enabled/disabled. This patch adds more functionality to the link
status management code by enabling/disabling the port per-cpu
interrupts, and the port itself. The queues are now stopped as well, and
the netif carrier helpers are called.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:16 +0000 (10:29 +0200)]
net: mvpp2: simplify the link_event function
The link_event function is somewhat complicated. This cosmetic patch
simplifies it.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:15 +0000 (10:29 +0200)]
net: mvpp2: initialize the comphy
On some platforms, the comphy is between the MAC GoP and the PHYs. The
mvpp2 driver currently relies on the firmware/bootloader to configure
the comphy. As a comphy driver was added to the generic PHY framework,
this patch uses it in the mvpp2 driver to configure the comphy at boot
time to avoid relying on the bootloader.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:14 +0000 (10:29 +0200)]
Documentation/bindings: phy: document the Marvell comphy driver
The Marvell Armada 7K/8K SoCs contain a hardware block called COMPHY
that provides a number of shared PHYs used by various interfaces in the
SoC: network, SATA, PCIe, etc. This Device Tree binding allows
describing this COMPHY hardware block.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:13 +0000 (10:29 +0200)]
phy: add the mvebu cp110 comphy driver
On the CP110 unit, which can be found on various Marvell platforms such
as the 7k and 8k (currently), a comphy (common PHYs) hardware block can
be found. This block provides a number of PHYs which can be used in
various modes by other controllers (network, SATA ...). These common
PHYs must be configured for the controllers using them to work correctly
either at boot time, or when the system runs to switch the mode used.
This patch adds a driver for this comphy hardware block, providing
callbacks for its PHYs so that consumers can configure the modes
used.
As of this commit, two modes are supported by the comphy driver: sgmii
and 10gkr.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 30 Aug 2017 08:29:12 +0000 (10:29 +0200)]
phy: add sgmii and 10gkr modes to the phy_mode enum
This patch adds more generic PHY modes to the phy_mode enum, to
allow configuring generic PHYs to the SGMII and/or the 10GKR mode
by using the set_mode callback.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Sørensen [Wed, 30 Aug 2017 06:58:47 +0000 (08:58 +0200)]
dp83640: don't hold spinlock while calling netif_rx_ni
We should not hold a spinlock while pushing the skb into the networking
stack, so move the call to netif_rx_ni out of the critical region to where
we have dropped the spinlock.
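The locking pattern can be illustrated in user space as follows. This is a sketch only: RxQueue is a hypothetical class, threading.Lock stands in for the spinlock and the deliver callback stands in for netif_rx_ni().

```python
import threading


class RxQueue:
    def __init__(self, deliver):
        self._lock = threading.Lock()
        self._pending = []
        self._deliver = deliver  # stand-in for netif_rx_ni()

    def enqueue(self, skb):
        with self._lock:
            self._pending.append(skb)

    def flush(self):
        # Critical region: only detach the pending packets.
        with self._lock:
            ready, self._pending = self._pending, []
        # The lock is released here; now it is safe to call into the stack.
        for skb in ready:
            self._deliver(skb)
        return len(ready)
```

The delivery callback never runs with the lock held, so it may take as long as it needs without stalling other users of the lock.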
Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Aug 2017 21:38:59 +0000 (14:38 -0700)]
Merge branch 'net_sched-idr'
Chris Mi says:
====================
net/sched: Improve getting objects by indexes
Using current TC code, it is very slow to insert a lot of rules.
In order to improve the rules update rate in TC,
we introduced the following two changes:
1) changed cls_flower to use IDR to manage the filters.
2) changed all act_xxx modules to use IDR instead of
a small hash table
But IDR has a limitation that it uses int. TC handle uses u32.
To make sure there is no regression, we add several new IDR APIs
to support unsigned long.
v2
==
Addressed Hannes's comment:
express idr_alloc in terms of idr_alloc_ext and most of the other functions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mi [Wed, 30 Aug 2017 06:31:59 +0000 (02:31 -0400)]
net/sched: Change act_api and act_xxx modules to use IDR
Typically, each TC filter has its own action. All the actions of the
same type are saved in its hash table. But the hash table has so few
buckets that it degrades to a list, and performance suffers
greatly. For example, it takes about 0m11.914s to insert 64K rules.
If we convert the hash table to IDR, it only takes about 0m1.500s.
The improvement is huge.
But please note that the test result is based on previous patch that
cls_flower uses IDR.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mi [Wed, 30 Aug 2017 06:31:58 +0000 (02:31 -0400)]
net/sched: Change cls_flower to use IDR
Currently, all filters with the same priority are linked in a doubly
linked list. Every filter should have a unique handle. To make the
handle unique, we need to iterate the list every time to see if the
handle exists or not when inserting a new filter. It is time-consuming.
For example, it takes about 5m3.169s to insert 64K rules.
This patch changes cls_flower to use IDR. With this patch, it
takes about 0m1.127s to insert 64K rules. The improvement is huge.
But please note that in this testing, all filters share the same action.
If every filter has a unique action, that is another bottleneck.
Follow-up patch in this patchset addresses that.
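The cost difference can be seen in a small model. It is a sketch with hypothetical names, and a Python dict is only a stand-in for the IDR (which is a radix tree in the kernel); the point is the per-insert work, not the data structure itself.

```python
def insert_with_list(handles, handle):
    """List style: scan every existing handle to check uniqueness."""
    steps = 0
    for existing in handles:
        steps += 1
        if existing == handle:
            return False, steps
    handles.append(handle)
    return True, steps


def insert_with_index(index, handle, obj):
    """Indexed style: look the handle up directly instead of scanning."""
    if handle in index:
        return False
    index[handle] = obj
    return True
```

Inserting n unique handles with the list costs n * (n - 1) / 2 comparisons in total, which is about 2.1 billion for 64K filters, while the indexed variant does one lookup per insert.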
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mi [Wed, 30 Aug 2017 06:31:57 +0000 (02:31 -0400)]
idr: Add new APIs to support unsigned long
The following new APIs are added:
int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
unsigned long start, unsigned long end, gfp_t gfp);
void *idr_remove_ext(struct idr *idr, unsigned long id);
void *idr_find_ext(const struct idr *idr, unsigned long id);
void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
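One way to read the idr_alloc_ext() prototype above: the ID comes back through *index, so the int return value only has to carry a status. The sketch below illustrates why a u32 handle cannot safely travel in an int return value; squeeze_into_int and alloc_ext_style are hypothetical helpers, not kernel functions.

```python
import ctypes

ENOMEM = 12


def squeeze_into_int(handle):
    """Reinterpret a u32 handle as a 32-bit signed int return value."""
    return ctypes.c_int32(handle).value


def alloc_ext_style(handle):
    """Out-parameter style: status and ID travel separately."""
    status = 0
    return status, handle
```

For example, the handle 0xFFFFFFF4 reads as -12 when squeezed into an int, which a caller could not tell apart from -ENOMEM.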
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Aug 2017 18:41:14 +0000 (11:41 -0700)]
Merge branch 'add-rmnet-driver'
Subash Abhinov Kasiviswanathan says:
====================
net: Add support for rmnet driver
This patch series adds support for the rmnet driver which is required to
support recent chipsets using Qualcomm Technologies, Inc. modems. The data
from hardware follows the multiplexing and aggregation protocol (MAP).
This driver can be used to register onto any physical network device in
IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
rmnet driver helps to decode these packets and queue them to network
stack (and encode and transmit it to the physical device).
v1: Same as the RFC patch with some minor fixes for issues reported by
kbuild test robot.
v1->v2: Change datatypes and remove config IOCTL as mentioned by David.
Also fix checkpatch issues and remove some unused code.
v2->v3: Move location to drivers/net and rename to rmnet. Change the
userspace - netlink communication from custom netlink to rtnl_link_ops.
Refactor some code. Use a fixed config for ingress and egress.
v3->v4: Move location to drivers/net/ethernet/qualcomm/.
Fix comments from Stephen and Jiri -
Split the ether and arp type changes into separate patches.
Remove debug and custom logging and switch to standard netdevice log.
Remove module parameters. Refactor and change some code style issues.
v4->v5: Rename some structs and variables. Move the initializer
before the for loop start. Put the arp type in correct sequence.
v5->v6: Fix comments from Dan -
Use the upper link API. As a result, remove all the refcounting logic.
Device refcount is explicitly held on real_dev on rx_handler
registration only. Modify the flow control struct. Remove the unused
ethernet mode handling.
v6->v7: Fix comments from David - Add newline to end of Makefile. Remove
inline from .c files. Move the module init/exit to rmnet config. Fix an
error reported by kbuild test robot for an unused file.
v7->v8: Use a smaller value for ETH_P_MAP as mentioned by David. Change
netdev_info to netdev_dbg as mentioned by Andrew. Fix comments from
Stephen regarding netdev_priv and sparse related errors of using 0 as NULL
v8->v9: Fix comments from David - Remove the CFLAG rule. Change the way
rmnet devices are freed. Instead of using a workqueue to unregister devices
individually, go through the list and free all devices within the rtnl_lock().
v9->v10: Actually fix the locking as mentioned by David. The locking scheme is
mentioned in a comment in rmnet_config.c. Change comment near MAP type
definition as mentioned by Dan. Refactor some code.
v10->v11: Allow RMNET to compile as a module as mentioned by David
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Wed, 30 Aug 2017 04:44:18 +0000 (22:44 -0600)]
drivers: net: ethernet: qualcomm: rmnet: Initial implementation
RmNet driver provides a transport agnostic MAP (multiplexing and
aggregation protocol) support in embedded module. Module provides
virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Wed, 30 Aug 2017 04:44:17 +0000 (22:44 -0600)]
net: arp: Add support for raw IP device
Define the raw IP type. This is needed for raw IP net devices
like rmnet.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Wed, 30 Aug 2017 04:44:16 +0000 (22:44 -0600)]
net: ether: Add support for multiplexing and aggregation type
Define the Qualcomm multiplexing and aggregation (MAP) ether type 0x00F9.
This is needed for receiving data in the MAP protocol like RMNET. This is
not an officially registered ID.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Aug 2017 18:20:09 +0000 (11:20 -0700)]
Merge branch 'tcp-readd-hp'
Florian Westphal says:
====================
tcp: re-add header prediction
Eric reported a performance regression caused by header prediction
removal.
We now call tcp_ack() much more frequently, for some workloads
this brings in enough cache line misses to become noticeable.
We could possibly still kill HP provided we find a different
way to suppress unneeded tcp_ack, but given we're late in
the cycle it seems preferable to revert.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 30 Aug 2017 17:24:58 +0000 (19:24 +0200)]
tcp: Revert "tcp: remove header prediction"
This reverts commit
45f119bf936b1f9f546a0b139c5b56f9bb2bdc78.
Eric Dumazet says:
We found at Google a significant regression caused by
45f119bf936b1f9f546a0b139c5b56f9bb2bdc78 tcp: remove header prediction
In typical RPC (TCP_RR), when a TCP socket receives data, we now call
tcp_ack() while we used to not call it.
This touches enough cache lines to cause a slowdown.
So the problem does not seem to be HP removal itself but the tcp_ack()
call. Therefore, it might be possible to remove HP after all, provided
one finds a way to elide tcp_ack for most cases.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 30 Aug 2017 17:24:57 +0000 (19:24 +0200)]
tcp: Revert "tcp: remove CA_ACK_SLOWPATH"
This change was a followup to the header prediction removal,
so first revert this as a prerequisite to back out hp removal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg KH [Wed, 30 Aug 2017 11:16:49 +0000 (13:16 +0200)]
staging: irda: fix init level for irda core
When moving the IRDA code out of net/ into drivers/staging/irda/net, the
link order changes when IRDA is built into the kernel. That causes a
kernel crash at boot time as netfilter isn't initialized yet.
To fix this, move the init call level of the irda core to
device_initcall(), as the link order then keeps it initialized at the
correct time.
Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 30 Aug 2017 04:48:51 +0000 (21:48 -0700)]
net: bcmgenet: Do not return from void function
A stray return was added in the macro bcmgenet_##name##_writel where it
should not be; drop it.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 69d2ea9c7989 ("net: bcmgenet: Use correct I/O accessors")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 29 Aug 2017 22:16:01 +0000 (15:16 -0700)]
neigh: increase queue_len_bytes to match wmem_default
Florian reported UDP xmit drops that could be root caused to the
too small neigh limit.
Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).
Once ARP/ND resolution is in progress, we should allow a few more
packets to be queued, at least for one producer.
Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.
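The interaction between the two limits can be modelled roughly as below. This is a simplified sketch: send_burst is a hypothetical function, the per-packet truesize of 2304 bytes used in the note is an assumed value, and the socket is charged only for packets still sitting in the neighbour queue.

```python
from collections import deque

OLD_LIMIT = 64 * 1024
WMEM_DEFAULT = 212992  # figure quoted in the commit message


def send_burst(n_packets, truesize, sndbuf, queue_limit):
    """Return (dropped, rejected) for one socket sending during resolution."""
    queue, queued = deque(), 0
    dropped = rejected = 0
    for _ in range(n_packets):
        if queued + truesize > sndbuf:
            # Send buffer full: sendmsg() would block or return -EAGAIN.
            rejected += 1
            continue
        # Queue over its byte limit: the oldest packets are dropped.
        while queue and queued + truesize > queue_limit:
            queued -= queue.popleft()
            dropped += 1
        queue.append(truesize)
        queued += truesize
    return dropped, rejected
```

In this model a burst of 100 packets of 2304 bytes loses 72 packets with the 64 KB limit, whereas with the limit raised to wmem_default nothing is dropped and the sender is pushed back 8 times instead.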
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Jiang [Tue, 29 Aug 2017 20:17:51 +0000 (13:17 -0700)]
net: remove dmaengine.h inclusion from netdevice.h
Since the removal of NET_DMA, dmaengine.h header file shouldn't be needed
by netdevice.h anymore.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 19:25:31 +0000 (12:25 -0700)]
net: bcmgenet: Use correct I/O accessors
The GENET driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.
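The endianness problem can be reproduced with plain byte strings. This is a user-space illustration only: raw_read and le_read are hypothetical helpers that mimic the byte-order behaviour of __raw_readl and readl_relaxed, nothing more.

```python
import struct


def raw_read(reg_bytes, cpu_big_endian):
    """Native-endian read: the bytes are interpreted in CPU byte order."""
    fmt = ">I" if cpu_big_endian else "<I"
    return struct.unpack(fmt, reg_bytes)[0]


def le_read(reg_bytes):
    """Little-endian read: the result is the same on LE and BE CPUs."""
    return struct.unpack("<I", reg_bytes)[0]
```

A register holding 0x12345678 in little-endian layout reads back as 0x78563412 through the native-endian path on a big-endian CPU, while the little-endian accessor returns 0x12345678 on both.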
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weilin Chang [Tue, 29 Aug 2017 19:19:57 +0000 (12:19 -0700)]
liquidio: show NIC's U-Boot version in a dev_info() message
Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhumika Goyal [Tue, 29 Aug 2017 16:47:52 +0000 (22:17 +0530)]
net: dsa: make some structures const
Make these const as they are not modified anywhere.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 28 Aug 2017 20:53:34 +0000 (13:53 -0700)]
ipv6: Use rt6i_idev index for echo replies to a local address
Tariq reported that local pings to a link-local address are failing:
$ ifconfig ens8
ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255
inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20<link>
ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet)
RX packets 12 bytes 1164 (1.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30 bytes 2484 (2.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Lendacky [Mon, 28 Aug 2017 20:29:34 +0000 (15:29 -0500)]
amd-xgbe: Interrupt summary bits are h/w version dependent
There is a difference in the bit position of the normal interrupt summary
enable (NIE) and abnormal interrupt summary enable (AIE) between revisions
of the hardware. For older revisions the NIE and AIE bits are at positions
16 and 15 respectively. For newer revisions the NIE and AIE bits are at
positions 15 and 14. The effect of this change in bit position is that
newer hardware won't receive AIE interrupts in the current version of the
driver. Specifically, the driver uses this interrupt to collect
statistics on when a receive buffer unavailable event occurs and to
restart the driver/device when a fatal bus error occurs.
Update the driver to set the interrupt enable bit based on the reported
version of the hardware.
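
The selection described above can be sketched as follows. This is an illustration in Python, not the driver code: the function name and the boolean flag are hypothetical, and only the bit positions are taken from the description.

```python
# Illustrative only: names are hypothetical, bit positions come from the text.
def interrupt_summary_bits(newer_hw):
    """Return (NIE mask, AIE mask) for the given hardware generation."""
    nie_pos, aie_pos = (15, 14) if newer_hw else (16, 15)
    return 1 << nie_pos, 1 << aie_pos


old_nie, old_aie = interrupt_summary_bits(newer_hw=False)
new_nie, new_aie = interrupt_summary_bits(newer_hw=True)

# The old AIE mask is bit 15, which is the NIE bit on newer hardware, so a
# driver using the old layout never sets the new AIE bit (bit 14).
print(hex(old_aie), hex(new_nie), hex(new_aie))
```

The overlap between the old AIE position and the new NIE position is why only the AIE interrupts go missing on newer hardware.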
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 22:16:53 +0000 (15:16 -0700)]
Merge branch 'nsh-headers-GSO'
Jiri Benc says:
====================
nsh: headers, GSO
This adds header structs and helpers for NSH together with GSO support.
Note there is no code in this patchset that actually manipulates the NSH
headers. That was sent to netdev by Yi Yang ("[PATCH net-next v6 0/3]
openvswitch: add NSH support"). The aim of this series is to lay the
groundwork and ease the implementation for him.
In addition to openvswitch, the NSH support should be added to tc (flower to
match, act_nsh to push/pop NSH headers). That will come later. There's
currently no plan to support NSH by other means than those two.
Patch 3 in this patchset was written by Yi Yang. I took it from the
aforementioned series and slightly modified it - see the note in the patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Mon, 28 Aug 2017 19:43:24 +0000 (21:43 +0200)]
nsh: add GSO support
Add a new nsh/ directory. It currently holds only GSO functions but more
will come: in particular, code shared by openvswitch and tc to manipulate
NSH headers.
For now, assume there's no hardware support for NSH segmentation. We can
always introduce netdev->nsh_features later.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yi Yang [Mon, 28 Aug 2017 19:43:23 +0000 (21:43 +0200)]
net: add NSH header structures and helpers
NSH (Network Service Header)[1] is a new protocol for service
function chaining. It can be handled as an L3 protocol like
IPv4 and IPv6. Eth + NSH + Inner packet and VxLAN-gpe + NSH +
Inner packet are two typical use cases.
This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.
[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
[Jiri: added nsh_hdr() helper and renamed the header struct to "struct
nshhdr" to match the usual pattern. Removed packet type defines, these are
now shared with VXLAN-GPE.]
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Mon, 28 Aug 2017 19:43:22 +0000 (21:43 +0200)]
vxlan: factor out VXLAN-GPE next protocol
The values are shared between VXLAN-GPE and NSH. Originally probably by
coincidence but I notified both working groups about this last year and they
seem to keep the values in sync since then.
Hopefully they'll get a single IANA registry for the values, too. (I asked
them for that.)
Factor out the code to be shared by the NSH implementation.
NSH and MPLS values are added in this patch, too. For MPLS, the drafts
incorrectly assign only a single value, while we have two MPLS ethertypes.
I raised the problem with both groups. For now, I assume the value is for
unicast.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Mon, 28 Aug 2017 19:43:21 +0000 (21:43 +0200)]
ether: add NSH ethertype
The NSH draft says:
An IEEE EtherType, 0x894F, has been allocated for NSH.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 22:14:19 +0000 (15:14 -0700)]
Merge branch 'ife-ethertype'
Alexander Aring says:
====================
tc: act_ife: handle IEEE IFE ethertype as default
This patch series introduces the IFE ethertype, which is registered by
IEEE. If the act_ife type netlink attribute is not given, this value
is now used by default.
Finally, it introduces some UAPI testcases to check that the default type
is used if none is specified, and vice versa.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 28 Aug 2017 19:03:15 +0000 (15:03 -0400)]
tc-testing: add test for testing ife type
This patch adds a new testcase for the IFE type setting in tc. If the
user specifies the type, it checks that the ife is correctly
configured to react to it. If it's not specified, the default IFE type
should be used.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 28 Aug 2017 19:03:14 +0000 (15:03 -0400)]
act_ife: use registered ife_type as fallback
This patch handles a default IFE type if it's not given by user space
netlink api. The default IFE type will be the registered ethertype by
IEEE for IFE ForCES.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 28 Aug 2017 19:03:13 +0000 (15:03 -0400)]
if_ether: add forces ife lfb type
This patch adds the ForCES IFE LFB type according to the IEEE registered
ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more
information. Since the IFE subsystem exists, it can be used there.
This patch also uses the correct word "ForCES" instead of "FoRCES", which
is a spelling error inside the IEEE ethertype specification.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 22:07:51 +0000 (15:07 -0700)]
Documentation: networking: Add blurb about patches in patchwork
Explain that the patch queue in patchwork should not be touched by patch
submitters.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 21:58:33 +0000 (14:58 -0700)]
Merge branch 'mlx4-misc-patches'
Tariq Toukan says:
====================
mlx4 misc patches
This patchset contains misc patches from the team
to the mlx4 Core and Eth drivers.
Patch 1 by Eran replaces large static allocations by dynamic ones.
Patch 2 by Leon makes an explicit conversion and solves a smatch warning.
In patch 3 I fix misplaced brackets of the sizeof operation.
Patch 4 by Moshe adds the ability to inform the FW regarding user mac updates.
Series generated against net-next commit:
901c5d2fbfcd ARM: dts: rk3228-evb: Fix the compiling error
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moshe Shemesh [Mon, 28 Aug 2017 13:38:23 +0000 (16:38 +0300)]
net/mlx4: Add user mac FW update support
Adding support for updating the FW on new port mac, when port mac change
is requested by the user. This info is required by the FW as OEM
management tools require this info directly from the NIC FW.
Check device capability bit to verify the FW supports user mac.
If the FW does support it, use set_port command to notify the FW on the
new mac.
The feature is relevant only to PF port mac.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Mon, 28 Aug 2017 13:38:22 +0000 (16:38 +0300)]
net/mlx4_core: Fix misplaced brackets of sizeof
When changing the sizeof style usage in the patch cited below,
one bracket misplacement was introduced. Here we fix it.
Fixes: 31975e27a4b5 ("mlx4: sizeof style usage")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Mon, 28 Aug 2017 13:38:21 +0000 (16:38 +0300)]
net/mlx4_core: Make explicit conversion to 64bit value
The "lg" variable is declared as int so in all places where this variable
is used as a shift operand, the output will be int too.
This produces the following smatch warning:
drivers/net/ethernet/mellanox/mlx4/fw.c:1532 mlx4_map_cmd() warn:
should '1 << lg' be a 64 bit type?
Simple declaration of "1" to be "1ULL" will fix the issue.
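
A rough illustration of why the width of the constant matters. Python integers are unbounded, so the 32-bit and 64-bit results are modelled here by masking. The helper names and the value of lg are hypothetical. In C, shifting a 32-bit int by its width or more is undefined rather than guaranteed to wrap, so this only shows that the high bits cannot be represented.

```python
# Emulation of C integer widths by masking; not the driver code.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shift_as_32bit(lg):
    """Model '1 << lg' evaluated in a 32-bit type."""
    return (1 << lg) & MASK32


def shift_as_64bit(lg):
    """Model '1ULL << lg' evaluated in a 64-bit type."""
    return (1 << lg) & MASK64


lg = 34  # hypothetical log2 size that does not fit in 32 bits
print(shift_as_32bit(lg))  # the set bit falls outside the 32-bit range
print(shift_as_64bit(lg))
```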
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Mon, 28 Aug 2017 13:38:20 +0000 (16:38 +0300)]
net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap
In order to avoid temporary large structs on the stack,
allocate them dynamically.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Tue, 29 Aug 2017 20:16:57 +0000 (13:16 -0700)]
bridge: fdb add and delete tracepoints
A few useful tracepoints to trace bridge forwarding
database updates.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 21:42:17 +0000 (14:42 -0700)]
Merge branch 'systemport-sf2-mdio-endian'
Florian Fainelli says:
====================
Endian fixes for SYSTEMPORT/SF2/MDIO
While trying an ARM BE kernel for kinks, the 3 drivers below started not
working and the reasons why became pretty obvious because the register space
remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's
native endian (let's call that a feature).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 20:35:18 +0000 (13:35 -0700)]
net: phy: mdio-bcm-unimac: Use correct I/O accessors
The driver currently uses __raw_{read,write}l which works for all
platforms supported: Broadcom MIPS LE/BE (native endian), ARM LE (native
endian) but not ARM BE (registers are still LE). Switch to using the
proper accessors for all platforms and explain why Broadcom MIPS BE is
special here, in doing so, we introduce a couple of helper functions to
abstract these differences.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 20:35:17 +0000 (13:35 -0700)]
net: systemport: Set correct RSB endian bits based on host
RSB_SWAP0 needs to match the host CPU endian, and it needs to be set
for LE and clear for BE. RSB_SWAP1 must always be cleared for SYSTEMPORT
Lite.
With these settings, we have the Receive Status Block always match the
host endian and we do not need to perform any conversion. Since there is
not necessarily a CONFIG_CPU_LITTLE_ENDIAN option defined, we test for
!CONFIG_CPU_BIG_ENDIAN which is guaranteed to be set.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 20:35:16 +0000 (13:35 -0700)]
net: dsa: bcm_sf2: Use correct I/O accessors
The Starfighter 2 driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.
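
The failure mode can be modelled outside the kernel. The sketch below is in Python and is only an analogy: read_native() stands in for a __raw_readl-style access that interprets bytes in CPU byte order, and read_le() stands in for a readl_relaxed-style access that always treats the register as little endian. Both helper names and the register value are made up for illustration.

```python
import struct

# Bytes as they sit in a little-endian (hardwired) register holding 0x12345678.
raw = struct.pack("<I", 0x12345678)


def read_native(raw_bytes, cpu_big_endian):
    """Interpret the register bytes in the CPU's own byte order."""
    fmt = ">I" if cpu_big_endian else "<I"
    return struct.unpack(fmt, raw_bytes)[0]


def read_le(raw_bytes):
    """Always interpret the register bytes as little endian."""
    return struct.unpack("<I", raw_bytes)[0]


print(hex(read_native(raw, cpu_big_endian=False)))  # LE CPU: value is intact
print(hex(read_native(raw, cpu_big_endian=True)))   # BE CPU: bytes are swapped
print(hex(read_le(raw)))                            # same on either CPU
```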
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 20:35:15 +0000 (13:35 -0700)]
net: systemport: Use correct I/O accessors
The SYSTEMPORT driver currently uses __raw_{read,write}l which means
native I/O endian. This works correctly for an ARM LE kernel (default)
but fails miserably on an ARM BE (BE8) kernel where registers are kept
little endian, so replace uses with {read,write}l_relaxed here which is
what we want because this is all performance sensitive code.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 18:04:43 +0000 (11:04 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2017-08-28' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.14
The rsi driver is getting a lot of new features lately, but as usual
there is active development on iwlwifi as well as other drivers.
I pulled wireless-drivers to fix multiple conflicts in iwlwifi and to
make further development easier.
Major changes:
ath10k
* initial USB bus support (no full support yet)
* add tdls support for 10.4 firmware
ath9k
* add Dell Wireless 1802
wil6210
* support FW RSSI reporting
rsi
* support legacy power save, U-APSD, rf-kill and AP mode
* RTS threshold configuration
brcmfmac
* support CYW4373 SDIO/USB chipset
iwlwifi
* some more code moved to a new directory
* add new PCI ID for 7265D
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Yadav [Mon, 28 Aug 2017 05:52:20 +0000 (11:22 +0530)]
net: stmmac: constify clk_div_table
clk_div_table are not supposed to change at runtime.
meson8b_dwmac structure is working with const clk_div_table.
So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 17:51:29 +0000 (10:51 -0700)]
Merge branch 'XDP-redirect-tracepoints'
Jesper Dangaard Brouer says:
====================
XDP redirect tracepoints
I feel this is as far as I can take the tracepoint infrastructure to
assist XDP monitoring.
Tracepoints come with a base overhead of 25 nanosec for an attached
bpf_prog, and 48 nanosec for using a full perf record. This is
problematic for the XDP use-case, but it is very convenient to use the
existing perf infrastructure.
From a performance perspective, the real solution would be to attach
another bpf_prog (that understands xdp_buff), but I'm not sure we want
to introduce yet another bpf attach API for this.
One thing left is to standardize the possible err return codes, to a
limited set, to allow easier (and faster) mapping into a bpf map.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:38:11 +0000 (16:38 +0200)]
samples/bpf: xdp_monitor tool based on tracepoints
This tool, xdp_monitor, demonstrates how to use the different xdp_redirect
tracepoints xdp_redirect{,_map}{,_err} from a BPF program.
The default mode is to only monitor the error counters, to avoid
affecting the per packet performance. Tracepoints come with a base
overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
using a full perf record (with non-matching filter). Thus, loading
the --stats mode by default could affect the maximum performance.
This version of the tool is very simple and counts all types of errors
as one. It will be natural to extend this later with the different
types of errors that can occur, which should help users quickly
identify common mistakes.
Because the TP_STRUCT was kept in sync, all the tracepoints load the
same BPF code. It would also be natural to extend the map version to
demonstrate how the map information could be used.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:38:06 +0000 (16:38 +0200)]
samples/bpf: xdp_redirect load XDP dummy prog on TX device
For supporting XDP_REDIRECT, a device driver must (obviously)
implement the "TX" function ndo_xdp_xmit(). An additional requirement
is that you cannot TX out of a device unless it also has an XDP bpf program
attached. This dependency arises because the driver code needs to set up
XDP resources before it can ndo_xdp_xmit.
Update bpf samples xdp_redirect and xdp_redirect_map to automatically
attach a dummy XDP program to the configured ifindex_out device. Use
the XDP flag XDP_FLAGS_UPDATE_IF_NOEXIST on the dummy load, to avoid
overriding an existing XDP prog on the device.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:38:01 +0000 (16:38 +0200)]
xdp: separate xdp_redirect tracepoint in map case
Creating a specific xdp_redirect_map variant of the xdp tracepoints
allows users to write simpler/faster BPF progs that get attached to
these tracepoints.
Goal is to still keep the tracepoints in xdp_redirect and xdp_redirect_map
similar enough, that a tool can read the top part of the TP_STRUCT and
produce similar monitor statistics.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:56 +0000 (16:37 +0200)]
xdp: separate xdp_redirect tracepoint in error case
There is a need to separate the xdp_redirect tracepoint into two
tracepoints, for separating the error case from the normal forward
case.
Due to the extreme speeds XDP is operating at, loading a tracepoint
has a measurable impact. Single core XDP REDIRECT (ethtool tuned
rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
bpf_prog at the tracepoint (with a return 0) reduces perf to 10.2 Mpps
(CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe)
The overhead of loading a bpf-based tracepoint can be calculated to
cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns).
Using perf record on the tracepoint event, with a non-matching --filter
expression, the overhead is much larger. Performance drops to 8.3 Mpps,
cost 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74 ns).
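
The per-packet cost figures above can be reproduced from the quoted packet rates. This is a small worked calculation in Python. The function name is made up, and the operands are ordered so that the overhead comes out positive, whereas the expressions above yield the same magnitude with a negative sign.

```python
def per_packet_overhead_ns(baseline_pps, loaded_pps):
    """Extra time per packet, in nanoseconds, when the rate drops."""
    return (1.0 / loaded_pps - 1.0 / baseline_pps) * 1e9


# Packet rates quoted in the text: baseline, with bpf_prog, with perf record.
bpf_cost = per_packet_overhead_ns(13782002, 10267937)
perf_cost = per_packet_overhead_ns(13782002, 8312497)
print(f"bpf_prog: {bpf_cost:.2f} ns, perf record: {perf_cost:.2f} ns")
```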
Having a separate tracepoint for err cases, which should be less
frequent, allows running a continuous monitor for errors while not
affecting the redirect forward performance (this has also been
verified by measurements).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:51 +0000 (16:37 +0200)]
xdp: make xdp tracepoints report bpf prog id instead of prog_tag
Given that the previous patch exposes the map_id, it seems natural to also
report the bpf prog id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:45 +0000 (16:37 +0200)]
xdp: tracepoint xdp_redirect also need a map argument
To make sense of the map index, the tracepoint user also needs to know
which map we are talking about. Supply the map pointer but only expose
the map->id.
The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case,
this is the result of the devmap lookup. The map lookup key is exposed
as map_index, which is needed to troubleshoot in case the lookup failed.
The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as
possible.
This also keeps the TP_STRUCT similar enough, that userspace can write
a monitor program, that doesn't need to care about whether
bpf_redirect or bpf_redirect_map were used.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Tue, 29 Aug 2017 14:37:40 +0000 (16:37 +0200)]
xdp: remove redundant argument to trace_xdp_redirect
Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect
is redundant as it is only called in case this action was specified.
Remove the argument, but keep "act" member of the tracepoint struct and
populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog
processing events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 16:42:48 +0000 (09:42 -0700)]
Merge tag 'rxrpc-next-20170829' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Miscellany
Here are a number of patches that make some changes/fixes and add a couple
of extensions to AF_RXRPC for kernel services to use. The changes and
fixes are:
(1) Use time64_t rather than u32 outside of protocol or
UAPI-representative structures.
(2) Use the correct time stamp when loading a key from an XDR-encoded
Kerberos 5 key.
(3) Fix IPv6 support.
(4) Fix some places where the error code is being incorrectly made
positive before returning.
(5) Remove some white space.
And the extensions:
(6) Add an end-of-Tx phase notification, thereby allowing kAFS to
transition the state on its own call record at the correct point,
rather than having to do it in advance and risk non-completion of the
call in the wrong state.
(7) Allow a kernel client call to be retried if it fails on a network
error, thereby making it possible for kAFS to iterate over a number of
IP addresses without having to reload the Tx queue and re-encrypt data
each time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Aug 2017 16:41:56 +0000 (09:41 -0700)]
Merge branch 'addrlabel-no-rtnl-locking'
Florian Westphal says:
====================
addrlabel: don't use rtnl locking
addrlabel doesn't appear to require rtnl lock as the addrlabel
table uses a spinlock to serialize add/delete operations.
Also, entries are reference counted so it should be safe
to call the rtnl ops without the rtnl mutex.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 29 Aug 2017 11:29:42 +0000 (13:29 +0200)]
addrlabel: add/delete/get can run without rtnl
There appears to be no need to use rtnl, addrlabel entries are refcounted
and add/delete is serialized by the addrlabel table spinlock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 29 Aug 2017 11:29:41 +0000 (13:29 +0200)]
selftests: add addrlabel add/delete to rtnetlink.sh
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Tue, 29 Aug 2017 07:09:29 +0000 (09:09 +0200)]
staging: irda: update MAINTAINERS
Now that the IRDA code has moved under drivers/staging/irda/, update the
MAINTAINERS file with the new location.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 29 Aug 2017 06:15:03 +0000 (11:45 +0530)]
bnxt_en: add a dummy definition for bnxt_vf_rep_get_fid()
When bnxt VF-reps are not compiled in (CONFIG_BNXT_SRIOV is off)
bnxt_tc.c needs a dummy definition of the routine bnxt_vf_rep_get_fid().
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Tue, 29 Aug 2017 09:19:01 +0000 (10:19 +0100)]
rxrpc: Allow failed client calls to be retried
Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1. This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.
Two new functions are provided:
(1) rxrpc_kernel_check_call() - This is used to find out the completion
state of a call to guess whether it can be retried and whether it
should be retried.
(2) rxrpc_kernel_retry_call() - Disconnect the call from its current
connection, reset the state and submit it as a new client call to a
new address. The new address need not match the previous address.
A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed call will be retained at the same point it was at when
an error condition was detected. msg_data_left() can be used to find out
how much data was packaged before the error occurred.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 29 Aug 2017 09:18:56 +0000 (10:18 +0100)]
rxrpc: Add notification of end-of-Tx phase
Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out of the Tx phase and
is now waiting for a reply or a final ACK.
This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.
Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 29 Aug 2017 09:18:50 +0000 (10:18 +0100)]
rxrpc: Remove some excess whitespace
Remove indentation from some blank lines.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 29 Aug 2017 09:18:43 +0000 (10:18 +0100)]
rxrpc: Don't negate call->error before returning it
call->error is stored as 0 or a negative error code. Don't negate this
value (ie. make it positive) before returning it from a kernel function
(though it should still be negated before passing to userspace through a
control message).
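
The sign convention can be illustrated with a short sketch. It is written in Python purely as a model. Both function names are hypothetical and do not correspond to functions in AF_RXRPC.

```python
import errno


def kernel_return(call_error):
    """Kernel-internal callers get the stored value unchanged (0 or negative)."""
    return call_error


def control_message_value(call_error):
    """Userspace sees a positive errno, so negate at that boundary only."""
    return -call_error


stored = -errno.ECONNRESET  # how the error is kept in the call record
print(kernel_return(stored), control_message_value(stored))
```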
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 29 Aug 2017 09:18:37 +0000 (10:18 +0100)]
rxrpc: Fix IPv6 support
Fix IPv6 support in AF_RXRPC in the following ways:
(1) When extracting the address from a received IPv4 packet, if the local
transport socket is open for IPv6 then fill out the sockaddr_rxrpc
struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
of an AF_INET one.
(2) When sending CHALLENGE or RESPONSE packets, the transport length needs
to be set from the sockaddr_rxrpc::transport_len field rather than
sizeof() on the IPv4 transport address.
(3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
the address correctly before searching for the affected peer.
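
Item (1) relies on the IPv4-mapped IPv6 address form (::ffff:a.b.c.d). The sketch below shows that mapping with Python's ipaddress module. It is an illustration of the address layout only, not of the kernel's sockaddr_rxrpc handling, and the function name and sample address are made up.

```python
import ipaddress


def map_ipv4_to_ipv6(v4_text):
    """Build the IPv4-mapped IPv6 address: 80 zero bits, 16 one bits, IPv4."""
    v4 = ipaddress.IPv4Address(v4_text)
    return ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + v4.packed)


mapped = map_ipv4_to_ipv6("192.0.2.1")
# ipv4_mapped recovers the embedded IPv4 address from the mapped form.
print(mapped, mapped.ipv4_mapped)
```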
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 29 Aug 2017 09:15:40 +0000 (10:15 +0100)]
rxrpc: Use correct timestamp from Kerberos 5 ticket
When an XDR-encoded Kerberos 5 ticket is added as an rxrpc-type key, the
expiry time should be drawn from the k5 part of the token union (which was
what was filled in), rather than the kad part of the union.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Baolin Wang [Tue, 29 Aug 2017 09:15:40 +0000 (10:15 +0100)]
net: rxrpc: Replace time_t type with time64_t type
The 'expiry' variable of 'struct key_preparsed_payload' has been
changed to the 'time64_t' type, which is year 2038 safe on 32-bit systems.
In the net/rxrpc subsystem, we need to convert the 'u32' type to 'time64_t'
when copying the ticket expiry time to 'prep->expiry', so this patch
introduces two helper functions to help convert 'u32' to 'time64_t'.
This patch also uses ktime_get_real_seconds() to get current time instead
of get_seconds() which is not year 2038 safe on 32-bit systems.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Vitaly Kuznetsov [Mon, 28 Aug 2017 13:16:05 +0000 (15:16 +0200)]
hinic: don't build the module by default
We probably don't want to enable code supporting particular hardware by
default e.g. when someone does 'make defconfig'. Other ethernet modules
don't do it.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Aug 2017 23:57:10 +0000 (16:57 -0700)]
Merge branch 'bnxt_en-next'
Michael Chan says:
====================
bnxt_en: Updates.
Various changes including updated firmware interface, improved TX ring
allocation scheme, improved out-of-memory logic in NAPI loop, reduced
default rings on multi-port devices, new PCI IDs. Of particular note,
CPU affinity hints from Vasundhara Volam.
TC Flower eswitch support from Sathya Perla.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 28 Aug 2017 17:40:35 +0000 (13:40 -0400)]
bnxt_en: add code to query TC flower offload stats
This patch adds code to implement TC_CLSFLOWER_STATS TC-cmd and the
required FW code to query the stats from the HW.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 28 Aug 2017 17:40:34 +0000 (13:40 -0400)]
bnxt_en: add TC flower offload flow_alloc/free FW cmds
This patch adds the hwrm_cfa_flow_alloc/free() routines
that are needed to issue the FW cmds needed for TC flower offload.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 28 Aug 2017 17:40:33 +0000 (13:40 -0400)]
bnxt_en: bnxt: add TC flower filter offload support
This patch adds support for offloading TC based flow
rules and actions for the 'flower' classifier in the bnxt_en driver.
It includes logic to parse flow rules and actions received from the
TC subsystem, store them and issue the corresponding
hwrm_cfa_flow_alloc/free FW cmds. L2/IPv4/IPv6 flows and drop,
redir, vlan push/pop actions are supported in this patch.
In this patch the hwrm_cfa_flow_xxx routines are just stubs.
The code for these routines is introduced in the next patch for easier
review. Also, the code to query the TC/flower action stats will
be introduced in a subsequent patch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 28 Aug 2017 17:40:32 +0000 (13:40 -0400)]
bnxt_en: fix clearing devlink ptr from bnxt struct
The routine bnxt_link_bp_to_dl() is used to set the devlink ptr
in bnxt struct (bp) and also to set the bnxt back ptr in
the devlink struct. If devlink_register() fails, bp->dl must
be cleared which is not happening currently. This patch fixes
bnxt_link_bp_to_dl() to clear bp->dl by passing a NULL dl ptr.
Fixes: 4ab0c6a8ffd7 ("bnxt_en: add support to enable VF-representors")
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 28 Aug 2017 17:40:31 +0000 (13:40 -0400)]
bnxt_en: Reduce default rings on multi-port cards.
Reduce default rings from 8 to 4 on multi-port cards to reduce memory
usage.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 28 Aug 2017 17:40:30 +0000 (13:40 -0400)]
bnxt_en: Improve -ENOMEM logic in NAPI poll loop.
If we cannot allocate RX buffers in the NAPI poll loop when processing
an RX event, the current code does not count that event towards the NAPI
budget. This can cause us to potentially loop forever in NAPI if we
consistently cannot allocate new buffers. Improve it by counting
-ENOMEM event as 1 towards the NAPI budget.
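
The budget accounting can be modelled with a simplified loop. This is a Python sketch under stated assumptions, not the bnxt_en poll routine: the function, its arguments and the event encoding (a positive packet count, or -ENOMEM on allocation failure) are all hypothetical.

```python
import errno


def poll(events, budget, count_enomem):
    """Return (work counted toward budget, number of events visited)."""
    work = 0
    visited = 0
    for rc in events:
        if work >= budget:
            break
        visited += 1
        if rc == -errno.ENOMEM:
            if count_enomem:
                work += 1  # count the failed event so the loop stays bounded
        else:
            work += rc
    return work, visited


failing = [-errno.ENOMEM] * 100
print(poll(failing, budget=64, count_enomem=False))  # budget never reached
print(poll(failing, budget=64, count_enomem=True))   # stops at the budget
```

In this model the uncounted case merely runs through the whole list. In a real poll loop with events that keep failing, nothing would bound the iteration.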
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Branden [Mon, 28 Aug 2017 17:40:29 +0000 (13:40 -0400)]
bnxt: initialize board_info values with proper enums
Initialize board_info values with proper enums for defensive programming
purposes. This avoids errors caused by the declared enums not
lining up with the board_info array.
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ray Jui [Mon, 28 Aug 2017 17:40:28 +0000 (13:40 -0400)]
bnxt: Add PCIe device IDs for bcm58802/bcm58808
Add PCIe device IDs for bcm58802 and bcm58808. Also add a chip number
update to declare bcm588xx as chip class phase 4 and later.
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>