David Ahern [Tue, 4 Jun 2019 03:19:51 +0000 (20:19 -0700)]
ipv4: Plumb support for nexthop object in a fib_info
Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
fib_info side of the nexthop <-> fib_info relationship.
Add fi_list list_head to 'struct nexthop' to track fib_info entries
using a nexthop instance. Add __remove_nexthop_fib and add it to
__remove_nexthop to walk the new list_head and mark those fib entries
as dead when the nexthop is deleted.
Add a few nexthop helpers for use when a nexthop is added to fib_info:
- nexthop_cmp to determine if 2 nexthops are the same
- nexthop_path_fib_result to select a path for a multipath
'struct nexthop'
- nexthop_fib_nhc to select a specific fib_nh_common within a
multipath 'struct nexthop'
Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
case.
Update the fib_info functions to check for fi->nh and take a different
path as needed:
- free_fib_info_rcu - put the nexthop object reference
- fib_release_info - remove the fib_info from the nexthop's fi_list
- nh_comp - use nexthop_cmp when either fib_info references a nexthop
object
- fib_info_hashfn - use the nexthop id for the hashing vs the oif of
each fib_nh in a fib_info
- fib_nlmsg_size - add space for the RTA_NH_ID attribute
- fib_create_info - verify nexthop reference can be taken, verify
nexthop spec is valid for fib entry, and add fib_info to fi_list for
a nexthop
- fib_select_multipath - use the new nexthop_path_fib_result to select a
path when nexthop objects are used
- fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
it the same as a fib entry using 'blackhole'
The bulk of the changes are in fib_semantics.c and most of that is
moving the existing change_nexthops into an else branch.
Update the nexthop code to walk fi_list on a nexthop deleted to remove
fib entries referencing it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 4 Jun 2019 03:19:50 +0000 (20:19 -0700)]
ipv4: Prepare for fib6_nh from a nexthop object
Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
to use a fib6_nh based nexthop. In the end, only code not using a
nexthop object in a fib_info should directly access fib_nh in a fib_info
without checking the famiy and going through fib_nh_common. Those
functions will be marked when it is not directly evident.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 4 Jun 2019 03:19:49 +0000 (20:19 -0700)]
ipv4: Use accessors for fib_info nexthop data
Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
fib_dev macro which is an alias for the first nexthop. Replacements:
fi->fib_dev --> fib_info_nh(fi, 0)->fib_nh_dev
fi->fib_nh --> fib_info_nh(fi, 0)
fi->fib_nh[i] --> fib_info_nh(fi, i)
fi->fib_nhs --> fib_info_num_path(fi)
where fib_info_nh(fi, i) returns fi->fib_nh[nhsel] and fib_info_num_path
returns fi->fib_nhs.
Move the existing fib_info_nhc to nexthop.h and define the new ones
there. A later patch adds a check if a fib_info uses a nexthop object,
and defining the helpers in nexthop.h avoid circular header
dependencies.
After this all remaining open coded references to fi->fib_nhs and
fi->fib_nh are in:
- fib_create_info and helpers used to lookup an existing fib_info
entry, and
- the netdev event functions fib_sync_down_dev and fib_sync_up.
The latter two will not be reused for nexthops, and the fib_create_info
will be updated to handle a nexthop in a fib_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 4 Jun 2019 01:37:03 +0000 (18:37 -0700)]
ipv6: Always allocate pcpu memory in a fib6_nh
A recent commit had an unintended side effect with reject routes:
rt6i_pcpu is expected to always be initialized for all fib6_info except
the null entry. The commit mentioned below skips it for reject routes
and ends up leaking references to the loopback device. For example,
ip netns add foo
ip -netns foo li set lo up
ip -netns foo -6 ro add blackhole 2001:db8:1::1
ip netns exec foo ping6 2001:db8:1::1
ip netns del foo
ends up spewing:
unregister_netdevice: waiting for lo to become free. Usage count = 3
The fib_nh_common_init is not needed for reject routes (no ipv4 caching
or encaps), so move the alloc_percpu_gfp after it and adjust the goto label.
Fixes: f40b6ae2b612 ("ipv6: Move pcpu cached routes to fib6_nh")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xue Chaojing [Tue, 4 Jun 2019 01:16:08 +0000 (01:16 +0000)]
hinic: add LRO support
This patch adds LRO support for the HiNIC driver.
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Jun 2019 21:49:38 +0000 (14:49 -0700)]
Merge branch 'bond-mpls'
Ariel Levkovich says:
====================
Support MPLS features in bonding and vlan net devices
Netdevice HW MPLS features are not passed from device driver's netdevice to
upper netdevice, specifically VLAN and bonding netdevice which are created
by the kernel when needed.
This prevents enablement and usage of HW offloads, such as TSO and checksumming
for MPLS tagged traffic when running via VLAN or bonding interface.
The patches introduce changes to the initialization steps of the VLAN and bonding
netdevices to inherit the MPLS features from lower netdevices to allow the HW
offloads.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Levkovich [Mon, 3 Jun 2019 22:36:47 +0000 (22:36 +0000)]
net: vlan: Inherit MPLS features from parent device
During the creation of the VLAN interface net device,
the various device features and offloads are being set based
on the parent device's features.
The code initiates the basic, vlan and encapsulation features
but doesn't address the MPLS features set and they remain blank.
As a result, all device offloads that have significant performance
effect are disabled for MPLS traffic going via this VLAN device such
as checksumming and TSO.
This patch makes sure that MPLS features are also set for the
VLAN device based on the parent which will allow HW offloads of
checksumming and TSO to be performed on MPLS tagged packets.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Levkovich [Mon, 3 Jun 2019 22:36:46 +0000 (22:36 +0000)]
net: bonding: Inherit MPLS features from slave devices
When setting the bonding interface net device features,
the kernel code doesn't address the slaves' MPLS features
and doesn't inherit them.
Therefore, HW offloads that enhance performance such as
checksumming and TSO are disabled for MPLS tagged traffic
flowing via the bonding interface.
The patch add the inheritance of the MPLS features from the
slave devices with a similar logic to setting the bonding device's
VLAN and encapsulation features.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Jun 2019 21:33:50 +0000 (14:33 -0700)]
Merge branch 'net-tls-small-general-improvements'
Jakub Kicinski says:
====================
net/tls: small general improvements
This series cleans up and improves the tls code, mostly the offload
parts.
First a slight performance optimization - avoiding unnecessary re-
-encryption of records in patch 1. Next patch 2 makes the code
more resilient by checking for errors in skb_copy_bits(). Next
commit removes a warning which can be triggered in normal operation,
(especially for devices explicitly making use of the fallback path).
Next two paths change the condition checking around the call to
tls_device_decrypted() to make it easier to extend. Remaining
commits are centered around reorganizing struct tls_context for
better cache utilization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:17:05 +0000 (15:17 -0700)]
net/tls: don't pass version to tls_advance_record_sn()
All callers pass prot->version as the last parameter
of tls_advance_record_sn(), yet tls_advance_record_sn()
itself needs a pointer to prot. Pass prot from callers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:17:04 +0000 (15:17 -0700)]
net/tls: reorganize struct tls_context
struct tls_context is slightly badly laid out. If we reorder things
right we can save 16 bytes (320 -> 304) but also make all fast path
data fit into two cache lines (one read only and one read/write,
down from four cache lines).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:17:03 +0000 (15:17 -0700)]
net/tls: use version from prot
ctx->prot holds the same information as per-direction contexts.
Almost all code gets TLS version from this structure, convert
the last two stragglers, this way we can improve the cache
utilization by moving the per-direction data into cold cache lines.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:17:02 +0000 (15:17 -0700)]
net/tls: don't re-check msg decrypted status in tls_device_decrypted()
tls_device_decrypted() is only called from decrypt_skb_update(),
when ctx->decrypted == false, there is no need to re-check the bit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:17:01 +0000 (15:17 -0700)]
net/tls: don't look for decrypted frames on non-offloaded sockets
If the RX config of a TLS socket is SW, there is no point iterating
over the fragments and checking if frame is decrypted. It will
always be fully encrypted. Note that in fully encrypted case
the function doesn't actually touch any offload-related state,
so it's safe to call for TLS_SW, today. Soon we will introduce
code which can only be called for offloaded contexts.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:17:00 +0000 (15:17 -0700)]
net/tls: remove false positive warning
It's possible that TCP stack will decide to retransmit a packet
right when that packet's data gets acked, especially in presence
of packet reordering. This means that packets may be in flight,
even though tls_device code has already freed their record state.
Make fill_sg_in() and in turn tls_sw_fallback() not generate a
warning in that case, and quietly proceed to drop such frames.
Make the exit path from tls_sw_fallback() drop monitor friendly,
for users to be able to troubleshoot dropped retransmissions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:16:59 +0000 (15:16 -0700)]
net/tls: check return values from skb_copy_bits() and skb_store_bits()
In light of recent bugs, we should make a better effort of
checking return values. In theory none of the functions should
fail today.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 3 Jun 2019 22:16:58 +0000 (15:16 -0700)]
net/tls: fully initialize the msg wrapper skb
If strparser gets cornered into starting a new message from
an sk_buff which already has frags, it will allocate a new
skb to become the "wrapper" around the fragments of the
message.
This new skb does not inherit any metadata fields. In case
of TLS offload this may lead to unnecessarily re-encrypting
the message, as skb->decrypted is not set for the wrapper skb.
Try to be conservative and copy all fields of old skb
strparser's user may reasonably need.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Mon, 3 Jun 2019 20:49:53 +0000 (13:49 -0700)]
net: mscc: ocelot: Fix some struct initializations
Clang warns:
drivers/net/ethernet/mscc/ocelot_ace.c:335:37: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
struct ocelot_vcap_u64 payload = { 0 };
^
{}
drivers/net/ethernet/mscc/ocelot_ace.c:336:28: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
struct vcap_data data = { 0 };
^
{}
drivers/net/ethernet/mscc/ocelot_ace.c:683:37: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
struct ocelot_ace_rule del_ace = { 0 };
^
{}
drivers/net/ethernet/mscc/ocelot_ace.c:743:28: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
struct vcap_data data = { 0 };
^
{}
4 warnings generated.
One way to fix these warnings is to add additional braces like Clang
suggests; however, there has been a bit of push back from some
maintainers[1][2], who just prefer memset as it is unambiguous, doesn't
depend on a particular compiler version[3], and properly initializes all
subobjects. Do that here so there are no more warnings.
[1]: https://lore.kernel.org/lkml/
022e41c0-8465-dc7a-a45c-
64187ecd9684@amd.com/
[2]: https://lore.kernel.org/lkml/
20181128.215241.
702406654469517539.davem@davemloft.net/
[3]: https://lore.kernel.org/lkml/
20181116150432.
2408a075@redhat.com/
Fixes: b596229448dd ("net: mscc: ocelot: Add support for tcam")
Link: https://github.com/ClangBuiltLinux/linux/issues/505
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 3 Jun 2019 20:41:44 +0000 (22:41 +0200)]
net: ipv4: fix rcu lockdep splat due to wrong annotation
syzbot triggered following splat when strict netlink
validation is enabled:
net/ipv4/devinet.c:1766 suspicious rcu_dereference_check() usage!
This occurs because we hold RTNL mutex, but no rcu read lock.
The second call site holds both, so just switch to the _rtnl variant.
Reported-by: syzbot+bad6e32808a3a97b1515@syzkaller.appspotmail.com
Fixes: 2638eb8b50cf ("net: ipv4: provide __rcu annotation for ifa_list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Jun 2019 21:21:41 +0000 (14:21 -0700)]
Merge branch 'net-expose-flash-update-status-to-user'
Jiri Pirko says:
====================
expose flash update status to user
When user is flashing device using devlink, he currenly does not see any
information about what is going on, percentages, etc.
Drivers, for example mlxsw and mlx5, have notion about the progress
and what is happening. This patchset exposes this progress
information to userspace.
Example output for existing flash command:
$ devlink dev flash pci/0000:01:00.0 file firmware.bin
Preparing to flash
Flashing 100%
Flashing done
See this console recording which shows flashing FW on a Mellanox
Spectrum device:
https://asciinema.org/a/247926
Please see individual patches for changelog.
v2->v3 only adds tags and the last selftest patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:44 +0000 (15:40 +0200)]
selftests: add basic netdevsim devlink flash testing
Utilizes the devlink flash code.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:43 +0000 (15:40 +0200)]
netdevsim: implement fake flash updating with notifications
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:42 +0000 (15:40 +0200)]
mlxsw: Implement flash update status notifications
Implement mlxfw status_notify op by passing notification down to
devlink. Also notify about flash update begin and end.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:41 +0000 (15:40 +0200)]
mlxfw: Introduce status_notify op and call it to notify about the status
Add new op status_notify which is called to update the user about
flashing status.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:40 +0000 (15:40 +0200)]
devlink: allow driver to update progress of flash update
Introduce a function to be called from drivers during flash. It sends
notification to userspace about flash update progress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:39 +0000 (15:40 +0200)]
mlxfw: Propagate error messages through extack
Currently the error messages are printed to dmesg. Propagate them also
to directly to user doing the flashing through extack.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:38 +0000 (15:40 +0200)]
mlx5: Move firmware flash implementation to devlink
Benefit from the devlink flash update implementation and ethtool
fallback to it and move firmware flash implementation there.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 4 Jun 2019 13:40:37 +0000 (15:40 +0200)]
mlxsw: Move firmware flash implementation to devlink
Benefit from the devlink flash update implementation and ethtool
fallback to it and move firmware flash implementation there.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dinh Nguyen [Mon, 3 Jun 2019 14:44:18 +0000 (09:44 -0500)]
net: stmmac: socfpga: add RMII phy mode
Add option for enabling RMII phy mode.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Wei Liang Lim <wei.liang.lim@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 4 Jun 2019 18:49:20 +0000 (11:49 -0700)]
Merge branch 'FDB-updates-for-SJA1105-DSA-driver'
Vladimir Oltean says:
====================
FDB updates for SJA1105 DSA driver
This patch series adds:
- FDB switchdev support for the second generation of switches (P/Q/R/S).
I could test/code these now that I got a board with a SJA1105Q.
- Management route support for SJA1105 P/Q/R/S. This is needed to send
PTP/STP/management frames over the CPU port.
- Logic to hide private DSA VLANs from the 'bridge fdb' commands.
The new FDB code was also tested and still works on SJA1105T.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:16:01 +0000 (00:16 +0300)]
net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command
TX VLANs and RX VLANs are an internal implementation detail of DSA for
frame tagging. They work by installing special VLANs on switch ports in
the operating modes where no behavior change w.r.t. VLANs can be
observed by the user.
Therefore it makes sense to hide these VLANs in the 'bridge fdb'
command, as well as translate the pvid into the RX VID and TX VID on
'bridge fdb add' and 'bridge fdb del' commands.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:15:54 +0000 (00:15 +0300)]
net: dsa: sja1105: Unset port from forwarding mask unconditionally on fdb_del
This is a cosmetic patch that simplifies the code by removing a
redundant check. A logical AND-with-zero performed on a zero is still
zero.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:15:45 +0000 (00:15 +0300)]
net: dsa: sja1105: Add FDB operations for P/Q/R/S series
This adds support for manipulating the L2 forwarding database (dump,
add, delete) for the second generation of NXP SJA1105 switches.
At the moment only FDB entries installed statically through 'bridge fdb'
are visible in the dump callback - the dynamically learned ones are
still under investigation.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:15:33 +0000 (00:15 +0300)]
net: dsa: sja1105: Add P/Q/R/S management route support via dynamic interface
Management routes are one-shot FDB rules installed on the CPU port for
sending link-local traffic. They are a prerequisite for STP, PTP etc to
work.
Also make a note that removing a management route was not supported on
the previous generation of switches.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:59 +0000 (00:11 +0300)]
net: dsa: sja1105: Make dynamic_config_read return -ENOENT if not found
Conceptually, if an entry is not found in the requested hardware table,
it is not an invalid request - so change the error returned
appropriately.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:58 +0000 (00:11 +0300)]
net: dsa: sja1105: Add P/Q/R/S support for dynamic L2 lookup operations
These are needed in order to implement the switchdev FDB callbacks.
Compared to the E/T generation, not only the ABI (bit offsets) is
different, but also the introduction of the HOSTCMD field which permits
O(1) TCAM search for an FDB entry. Make use of the newly introduce
OP_SEARCH to permit that. It will be used while adding and deleting an
FDB entry (to see whether it exists or not).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:57 +0000 (00:11 +0300)]
net: dsa: sja1105: Make room for P/Q/R/S FDB operations
The DSA callbacks were written with the E/T (first generation) in mind,
which is quite different.
For P/Q/R/S completely new implementations need to be provided, which
are held as function pointers in the priv->info structure. We are
taking a slightly roundabout way for this (a function from
sja1105_main.c reads a structure defined in sja1105_spi.c that
points to a function defined in sja1105_main.c), but it is what it is.
The FDB dump callback works for both families, hence no function pointer
for that.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:56 +0000 (00:11 +0300)]
net: dsa: sja1105: Plug in support for TCAM searches via the dynamic interface
Only a single dynamic configuration table of the SJA1105 P/Q/R/S
supports this operation: the FDB.
To keep the existing structure in place (sja1105_dynamic_config_read and
sja1105_dynamic_config_write) and not introduce any new function, a
convention is made for sja1105_dynamic_config_read that a negative index
argument denotes a search for the entry provided as argument.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:55 +0000 (00:11 +0300)]
net: dsa: sja1105: Add missing L2 Forwarding Table definitions for P/Q/R/S
This appends to the L2 Forwarding and L2 Forwarding Parameters tables
(originally added for first-generation switches) the bits that are new
in the second generation.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:54 +0000 (00:11 +0300)]
net: dsa: sja1105: Fix bit offsets of index field from L2 lookup entries
This was inadvertently copied from the SJA1105 E/T structure and not
tested. Cross-checking with the P/Q/R/S documentation (UM11040) makes
it immediately obvious what the correct bit offsets for this field are.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 2 Jun 2019 21:11:53 +0000 (00:11 +0300)]
net: dsa: sja1105: Shim declaration of struct sja1105_dyn_cmd
This structure is merely an implementation detail and should be hidden
from the sja1105_dynamic_config.h header, which provides to the rest of
the driver an abstract access to the dynamic configuration interface of
the switch.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 22:38:43 +0000 (15:38 -0700)]
Merge branch 'r8169-make-firmware-handling-code-ready-to-be-factored-out'
Heiner Kallweit says:
====================
r8169: make firmware handling code ready to be factored out
This series contains the final steps to make firmware handling code
ready to be factored out into a separate source code file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 3 Jun 2019 19:26:31 +0000 (21:26 +0200)]
r8169: add rtl_fw_request_firmware and rtl_fw_release_firmware
Add rtl_fw_request_firmware and rtl_fw_release_firmware which will be
part of the API when factoring out the firmware handling code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 3 Jun 2019 19:25:43 +0000 (21:25 +0200)]
r8169: make rtl_fw_format_ok and rtl_fw_data_ok more independent
In preparation of factoring out the firmware handling code avoid any
usage of struct rtl8169_private internals. As part of it we can inline
rtl_check_firmware.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 3 Jun 2019 19:24:38 +0000 (21:24 +0200)]
r8169: simplify rtl_fw_write_firmware
Similar to rtl_fw_data_ok() we can simplify the code by moving
incrementing the index to the for loop initialization.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 3 Jun 2019 19:23:43 +0000 (21:23 +0200)]
r8169: add enum rtl_fw_opcode
Replace the firmware opcode defines with a proper enum. The BUG()
in rtl_fw_write_firmware() can be removed because the call to
rtl_fw_data_ok() ensures all opcodes are valid.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 22:32:50 +0000 (15:32 -0700)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
code optimizations & bugfixes for HNS3 driver
This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.
[patch 1/10] removes the redundant core reset type
[patch 2/10 - 3/10] fixes two VLAN related issues
[patch 4/10] fixes a TM issue
[patch 5/10 - 10/10] includes some patches related to RAS & MSI-X error
Change log:
V1->V2: removes two patches which needs to change HNS's infiniband
driver as well, they will be upstreamed later with the
infiniband's one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Mon, 3 Jun 2019 02:09:22 +0000 (10:09 +0800)]
net: hns3: delay and separate enabling of NIC and ROCE HW errors
All RAS and MSI-X should be enabled just in the final stage of HNS3
initialization. It means that they should be enabled in
hclge_init_xxx_client_instance instead of hclge_ae_dev(). Especially
MSI-X, if it is enabled before opening vector0 IRQ, there are some
chances that a MSI-X error will cause failure on initialization of
NIC client instane. So this patch delays enabling of HW errors.
Otherwise, we also separate enabling of ROCE RAS from NIC, because
it's not reasonable to enable ROCE RAS if we even don't have a ROCE
driver.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Mon, 3 Jun 2019 02:09:21 +0000 (10:09 +0800)]
net: hns3: add opcode about query and clear RAS & MSI-X to special opcode
There are four commands being used to query and clear RAS and MSI-X
interrupts status. They should be contained in array of special opcodes
because these commands have several descriptors, and we need to judge
return value in the first descriptor rather than the last one as other
opcodes. In addition, we shouldn't set the NEXT_FLAG of first descriptor.
This patch fixes above issues.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Mon, 3 Jun 2019 02:09:20 +0000 (10:09 +0800)]
net: hns3: remove setting bit of reset_requests when handling mac tunnel interrupts
We shouldn't set HNAE3_NONE_RESET bit of the variable that represents a
reset request during handling of MSI-X errors, or may cause issue when
trigger reset.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Mon, 3 Jun 2019 02:09:19 +0000 (10:09 +0800)]
net: hns3: add handling of two bits in MAC tunnel interrupts
LINK_UP and LINK_DOWN are two bits of MAC tunnel interrupts, but previous
HNS3 driver didn't handle them. If they were enabled, value of these two
bits will change during link down and link up, which will cause HNS3
driver keep receiving IRQ but can't handle them.
This patch adds handling of these two bits of interrupts, we will record
and clear them as what we do to other MAC tunnel interrupts.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Mon, 3 Jun 2019 02:09:18 +0000 (10:09 +0800)]
net: hns3: set ops to null when unregister ad_dev
The hclge/hclgevf and hns3 module can be unloaded independently,
when hclge/hclgevf unloaded firstly, the ops of ae_dev should
be set to NULL, otherwise it will cause an use-after-free problem.
Fixes: 38caee9d3ee8 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Mon, 3 Jun 2019 02:09:17 +0000 (10:09 +0800)]
net: hns3: add a check to pointer in error_detected and slot_reset
If we add a VF without loading hclgevf.ko and then there is a RAS error
occurs, PCIe AER will call error_detected and slot_reset of all functions,
and will get a NULL pointer when we check ad_dev->ops->handle_hw_ras_error.
This will cause a call trace and failures on handling of follow-up RAS
errors.
This patch check ae_dev and ad_dev->ops at first to solve above issues.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Mon, 3 Jun 2019 02:09:16 +0000 (10:09 +0800)]
net: hns3: set the port shaper according to MAC speed
This patch sets the port shaper according to the MAC speed as
suggested by hardware user manual.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Mon, 3 Jun 2019 02:09:15 +0000 (10:09 +0800)]
net: hns3: fix VLAN filter restore issue after reset
In orginal codes, the driver only restore VLAN filter entries
for PF after reset, the VLAN entries of VF will lose in this
case.
This patch fixes it by recording VLAN IDs for each function
when add VLAN, and restore the VLAN IDs after reset.
Fixes: 681ec3999b3d ("net: hns3: fix for vlan table lost problem when resetting")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Mon, 3 Jun 2019 02:09:14 +0000 (10:09 +0800)]
net: hns3: don't configure new VLAN ID into VF VLAN table when it's full
VF VLAN table can only support no more than 256 VLANs. When user
adds too many VLANs, the VF VLAN table will be full, and firmware
will close the VF VLAN table for the function. When VF VLAN table
is full, and user keeps adding new VLANs, it's unnecessary to
configure the VF VLAN table, because it will always fail, and print
warning message. The worst case is adding 4K VLANs, and doing reset,
it will take much time to restore these VLANs, which may cause VF
reset fail by timeout.
Fixes: 6c251711b37f ("net: hns3: Disable vf vlan filter when vf vlan table is full")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Mon, 3 Jun 2019 02:09:13 +0000 (10:09 +0800)]
net: hns3: remove redundant core reset
Since core reset is similar to the global reset, so this
patch removes it and uses global reset to replace it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 2 Jun 2019 18:24:18 +0000 (11:24 -0700)]
net: fix use-after-free in kfree_skb_list
syzbot reported nasty use-after-free [1]
Lets remove frag_list field from structs ip_fraglist_iter
and ip6_fraglist_iter. This seens not needed anyway.
[1] :
BUG: KASAN: use-after-free in kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
Read of size 8 at addr
ffff888085a3cbc0 by task syz-executor303/8947
CPU: 0 PID: 8947 Comm: syz-executor303 Not tainted 5.2.0-rc2+ #12
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
ip6_fragment+0x1ef4/0x2680 net/ipv6/ip6_output.c:882
__ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
dst_output include/net/dst.h:433 [inline]
ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:671
___sys_sendmsg+0x803/0x920 net/socket.c:2292
__sys_sendmsg+0x105/0x1d0 net/socket.c:2330
__do_sys_sendmsg net/socket.c:2339 [inline]
__se_sys_sendmsg net/socket.c:2337 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44add9
Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007f826f33bce8 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
00000000006e7a18 RCX:
000000000044add9
RDX:
0000000000000000 RSI:
0000000020000240 RDI:
0000000000000005
RBP:
00000000006e7a10 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000006e7a1c
R13:
00007ffcec4f7ebf R14:
00007f826f33c9c0 R15:
20c49ba5e353f7cf
Allocated by task 8947:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc_node mm/slab.c:3269 [inline]
kmem_cache_alloc_node+0x131/0x710 mm/slab.c:3579
__alloc_skb+0xd5/0x5e0 net/core/skbuff.c:199
alloc_skb include/linux/skbuff.h:1058 [inline]
__ip6_append_data.isra.0+0x2a24/0x3640 net/ipv6/ip6_output.c:1519
ip6_append_data+0x1e5/0x320 net/ipv6/ip6_output.c:1688
rawv6_sendmsg+0x1467/0x35e0 net/ipv6/raw.c:940
inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:671
___sys_sendmsg+0x803/0x920 net/socket.c:2292
__sys_sendmsg+0x105/0x1d0 net/socket.c:2330
__do_sys_sendmsg net/socket.c:2339 [inline]
__se_sys_sendmsg net/socket.c:2337 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 8947:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
kfree_skbmem net/core/skbuff.c:625 [inline]
kfree_skbmem+0xc5/0x150 net/core/skbuff.c:619
__kfree_skb net/core/skbuff.c:682 [inline]
kfree_skb net/core/skbuff.c:699 [inline]
kfree_skb+0xf0/0x390 net/core/skbuff.c:693
kfree_skb_list+0x44/0x60 net/core/skbuff.c:708
__dev_xmit_skb net/core/dev.c:3551 [inline]
__dev_queue_xmit+0x3034/0x36b0 net/core/dev.c:3850
dev_queue_xmit+0x18/0x20 net/core/dev.c:3914
neigh_direct_output+0x16/0x20 net/core/neighbour.c:1532
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x1034/0x2550 net/ipv6/ip6_output.c:120
ip6_fragment+0x1ebb/0x2680 net/ipv6/ip6_output.c:863
__ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
dst_output include/net/dst.h:433 [inline]
ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:671
___sys_sendmsg+0x803/0x920 net/socket.c:2292
__sys_sendmsg+0x105/0x1d0 net/socket.c:2330
__do_sys_sendmsg net/socket.c:2339 [inline]
__se_sys_sendmsg net/socket.c:2337 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at
ffff888085a3cbc0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 0 bytes inside of
224-byte region [
ffff888085a3cbc0,
ffff888085a3cca0)
The buggy address belongs to the page:
page:
ffffea0002168f00 refcount:1 mapcount:0 mapping:
ffff88821b6f63c0 index:0x0
flags: 0x1fffc0000000200(slab)
raw:
01fffc0000000200 ffffea00027bbf88 ffffea0002105b88 ffff88821b6f63c0
raw:
0000000000000000 ffff888085a3c080 000000010000000c 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888085a3ca80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888085a3cb00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
>
ffff888085a3cb80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
ffff888085a3cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888085a3cc80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: 0feca6190f88 ("net: ipv6: add skbuff fraglist splitter")
Fixes: c8b17be0b7a4 ("net: ipv4: add skbuff fraglist splitter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 2 Jun 2019 08:53:49 +0000 (10:53 +0200)]
r8169: use paged versions of phylib MDIO access functions
Use paged versions of phylib MDIO access functions to simplify
the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sat, 1 Jun 2019 08:06:05 +0000 (16:06 +0800)]
qed: Fix build error without CONFIG_DEVLINK
Fix gcc build error while CONFIG_DEVLINK is not set
drivers/net/ethernet/qlogic/qed/qed_main.o: In function `qed_remove':
qed_main.c:(.text+0x1eb4): undefined reference to `devlink_unregister'
Select DEVLINK to fix this.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 24e04879abdd ("qed: Add qed devlink parameters table")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 1 Jun 2019 02:17:33 +0000 (19:17 -0700)]
tcp: use this_cpu_read(*X) instead of *this_cpu_ptr(X)
this_cpu_read(*X) is slightly faster than *this_cpu_ptr(X)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 1 Jun 2019 02:09:02 +0000 (19:09 -0700)]
ipv4: icmp: use this_cpu_read() in icmp_sk()
this_cpu_read(*X) is faster than *this_cpu_ptr(X)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 1 Jun 2019 01:11:25 +0000 (18:11 -0700)]
ipv6: use this_cpu_read() in rt6_get_pcpu_route()
this_cpu_read(*X) is faster than *this_cpu_ptr(X)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 22:00:00 +0000 (15:00 -0700)]
Merge branch 'Add-MT7629-ethernet-support'
Sean Wang says:
====================
Add MT7629 ethernet support
MT7629 inlcudes two sets of SGMIIs used for external switch or PHY, and embedded
switch (ESW) via GDM1, GePHY via GMAC2, so add several patches in the series to
make the code base common with the old SoCs.
The patch 1, 3 and 6, adds extension for SGMII to have the hardware configured
for 1G, 2.5G and AN to fit the capability of the target PHY. In patch 6 could be
an example showing how to use these configurations for underlying PHY speed to
match up the link speed of the target PHY.
The patch 4 is used for automatically configured the hardware path from GMACx to
the target PHY by the description in deviceetree topology to determine the
proper value for the corresponding MUX.
The patch 2 and 5 is for the update for MT7629 including dt-binding document and
its driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Sat, 1 Jun 2019 00:03:15 +0000 (08:03 +0800)]
arm64: dts: mt7622: Enlarge the SGMII register range
Enlarge the SGMII register range and using 2.5G force mode on default.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Sat, 1 Jun 2019 00:03:14 +0000 (08:03 +0800)]
net: ethernet: mediatek: Add MT7629 ethernet support
Add ethernet support to MT7629 SoC
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Sat, 1 Jun 2019 00:03:13 +0000 (08:03 +0800)]
net: ethernet: mediatek: Integrate hardware path from GMAC to PHY variants
All path route on various SoCs all would be managed in common function
mtk_setup_hw_path that is determined by the both applied devicetree
regarding the path between GMAC and the target PHY or switch by the
capability of target SoC in the runtime.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Sat, 1 Jun 2019 00:03:12 +0000 (08:03 +0800)]
net: ethernet: mediatek: Extend SGMII related functions
Add SGMII related logic into a separate file, and also provides options for
forcing 1G, 2.5, AN mode for the target PHY, that can be determined from
SGMII node in DTS.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Sat, 1 Jun 2019 00:03:11 +0000 (08:03 +0800)]
dt-bindings: net: mediatek: Add support for MediaTek MT7629 SoC
Add binding document for the ethernet on MT7629 SoC.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Sat, 1 Jun 2019 00:03:10 +0000 (08:03 +0800)]
dt-bindings: clock: mediatek: Add an extra required property to sgmiisys
add an extra required property "mediatek,physpeed" to sgmiisys to determine
link speed to match up the capability of the target PHY.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 31 May 2019 22:27:00 +0000 (15:27 -0700)]
ipv6: icmp: use this_cpu_read() in icmpv6_sk()
In general, this_cpu_read(*X) is faster than *this_cpu_ptr(X)
Also remove the inline attibute, totally useless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 31 May 2019 21:47:21 +0000 (22:47 +0100)]
flow_offload: include linux/kernel.h from flow_offload.h
flow_stats_update() uses max_t, so ensure we have that defined.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanislav Fomichev [Fri, 31 May 2019 21:05:06 +0000 (14:05 -0700)]
flow_dissector: remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
This flag is not used by any caller, remove it.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 20:42:56 +0000 (13:42 -0700)]
Merge tag 'mlx5-updates-2019-05-31' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-05-31
This series provides some updates to mlx5 core and netdevice driver.
1) use __netdev_tx_sent_queue() to improve performance under GSO workload
2) Allow matching only enc_key_id/enc_dst_port for decapsulation action
3) Geneve support:
This patchset adds support for GENEVE tunnel encap/decap flows offload:
encapsulating layer 2 Ethernet frames within layer 4 UDP datagrams.
The driver supports 6081 destination UDP port number, which is the
default IANA-assigned port.
Encap:
ConnectX-5 inserts the header (w/ or w/o Geneve TLV options) that is
provided by the mlx5 driver to the outgoing packet.
Decap:
Geneve header is matched and the packet is decapsulated.
Notes about decap flows with Geneve TLV Options:
- Support offloading of 32-bit options data only
- At any given time, only one combination of class/type parameters
can be offloaded, but the same class/type combination can have
many different flows offloaded with different 32-bit option data
- Options with value of 0 can't be offloaded
Managing Geneve TLV options:
Matching (on receive) is done by ConnectX-5 flex parser.
Geneve TLV options are managed using General Object of type
“Geneve TLV Options”.
When the first flow with a certain class/type values is requested
to be offloaded, the driver creates a FW object with FW command
(Geneve TLV Options general object) and starts counting the number
of flows using this object.
During this time, any request with a different class/type values
will fail to be offloaded.
Once the refcount reaches 0, the driver destroys the TLV options
general object, and can now offload a flow with any class/type parameters.
Geneve TLV Options object is added to core device.
It is currently used to manage Geneve TLV options general
object allocation in FW and its reference counting only.
In the future it will also be used for managing geneve ports
by registering callbacks for ndo_udp_tunnel_add/del.
TC tunnel code refactoring:
As a preparation for Geneve code, the TC tunnel code in mlx5
was rearranged in a modular way, so that it would be easier
to add future tunnels:
- Defined tc tunnel object with the fields and callbacks that
any tunnel must implement.
- Define tc UDP tunnel object for UDP tunnels, such as VXLAN
- Move each tunnel code (GRE, VXLAN) to its own separate file
- Rewrite tc tunnel implementation in a general way – using
only the objects and their callbacks.
4) Termination tables:
Actions in tables set with the termination flag are guaranteed to terminate
the action list. Thus, potential looping functionality (e.g. haripin) can safely be
executed without potential loops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 20:30:38 +0000 (13:30 -0700)]
Merge branch 'ena-next'
Sameeh Jubran says:
====================
Extending the ena driver to support new features and enhance performance
This patchset introduces the following:
* add support for changing the inline header size (max_header_size) for applications
with overlay and nested headers
* enable automatic fallback to polling mode for admin queue when interrupt is not
available or missed
* add good checksum counter for Rx ethtool statistics
* update ena.txt
* some minor code clean-up
* some performance enhancements with doorbell calculations
Differences from V1:
* net: ena: add handling of llq max tx burst size (1/11):
* fixed christmas tree issue
* net: ena: ethtool: add extra properties retrieval via get_priv_flags (2/11):
* replaced snprintf with strlcpy
* dropped confusing error message
* added more details to the commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:29 +0000 (17:43 +0300)]
net: ena: use dev_info_once instead of static variable
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:28 +0000 (17:43 +0300)]
net: ena: add good checksum counter
Add a new statistics to ETHTOOL to specify if the device calculated
and validated the Rx csum.
Signed-off-by: Evgeny Shmeilin <evgeny@annapurnaLabs.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:27 +0000 (17:43 +0300)]
net: ena: optimise calculations for CQ doorbell
This patch initially checks if CQ doorbell
is needed before proceeding with the calculations.
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:26 +0000 (17:43 +0300)]
net: ena: add support for changing max_header_size in LLQ mode
Up until now the driver always used a single setting for the sizes
of the different parts of the llq entry - 128 for entry size, 2 for
descriptors before header and 96 for maximum header size.
The current code makes sure that the parts of the llq entry are
compatible with each other and with the initial llq entry size given
by the device.
This commit changes this code to support any llq entry size
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:25 +0000 (17:43 +0300)]
net: ena: allow automatic fallback to polling mode
Enable fallback to polling mode for Admin queue
when identified a command response arrival
without an accompanying MSI-X interrupt
Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:24 +0000 (17:43 +0300)]
net: ena: documentation: update ena.txt
Small cosmetic changes to ena.txt
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:23 +0000 (17:43 +0300)]
net: ena: add newline at the end of pr_err prints
Some pr_err prints lacked '\n' in the end. Added where missing.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:22 +0000 (17:43 +0300)]
net: ena: arrange ena_probe() function variables in reverse christmas tree
Reverse christmas tree arrangement is when strings are written from longer
to shorter with each line. Most of our functions are abiding this
arrangement but this function does not.
In this commit we arrange the variables of ena_probe() in reverse christmas
tree.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:21 +0000 (17:43 +0300)]
net: ena: replace free_tx/rx_ids union with single free_ids field in ena_ring
struct ena_ring holds a union of free_rx_ids and free_tx_ids.
Both of the above fields mean the exact same thing and are used
exactly the same way.
Furthermore, these fields are always used with a prefix of the
type of ring. So for tx it will be tx_ring->free_tx_ids, and for
rx it will be rx_ring->free_rx_ids, which shows how redundant the
"_tx" and "_rx" parts are.
Furthermore still, this may lead to confusing code like where
tx_ring->free_rx_ids which works correctly but looks like a mess.
This commit removes the aforementioned redundancy by replacing the
free_rx/tx_ids union with a single free_ids field.
It also changes a single goto label name from err_free_tx_ids: to
err_tx_free_ids: for consistency with the above new notation.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Mon, 3 Jun 2019 14:43:20 +0000 (17:43 +0300)]
net: ena: ethtool: add extra properties retrieval via get_priv_flags
This commit adds a mechanism for exposing different device
properties via ethtool's priv_flags. The strings are provided
by the device and copied to user space through the driver.
In this commit we:
Add commands, structs and defines necessary for handling
extra properties
Add functions for:
Allocation/destruction of a buffer for extra properties strings.
Retreival of extra properties strings and flags from the network device.
Handle the allocation of a buffer for extra properties strings.
* Initialize buffer with extra properties strings from the
network device at driver startup.
Use ethtool's get_priv_flags to expose extra properties of
the ENA device
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Mon, 3 Jun 2019 14:43:19 +0000 (17:43 +0300)]
net: ena: add handling of llq max tx burst size
There is a maximum TX burst size that the ENA device can handle.
It is exposed by the device to the driver and the driver
needs to comply with it to avoid bugs.
In this commit we:
1. Add ena_com_is_doorbell_needed(), which calculates the number of
llq entries that will be used to hold a packet, and will return
true if they exceed the number of allowed entries in a burst.
If the function returns true, a doorbell needs to be invoked
to send this packet in the next burst.
2. Follow the available entries in the current burst:
- Every doorbell a new burst begins
- With each write of an llq entry, the available entries in the
current burst are decreased by 1.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Mon, 3 Jun 2019 08:04:09 +0000 (08:04 +0000)]
net: dsa: mv88e6xxx: make mv88e6xxx_g1_stats_wait static
mv88e6xxx_g1_stats_wait has no users outside global1.c, so make it
static.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Mon, 3 Jun 2019 07:52:46 +0000 (07:52 +0000)]
net: dsa: mv88e6xxx: fix comments and macro names in mv88e6390_g1_mgmt_rsvd2cpu
The macros have an extraneous '800' (after 0180C2 there should be just
six nibbles, with X representing one), while the comments have
interchanged c2 and 80 and an extra :00.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 31 May 2019 18:44:09 +0000 (12:44 -0600)]
nexthop: Add entry to MAINTAINERS
Add entry to MAINTAINERS file for new nexthop code.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 01:15:38 +0000 (18:15 -0700)]
Merge branch 'r8169-replace-several-function-pointers-with-direct-calls'
Heiner Kallweit says:
====================
r8169: replace several function pointers with direct calls
This series removes most function pointers from struct rtl8169_private
and uses direct calls instead. This simplifies the code and avoids
the penalty of indirect calls in times of retpoline.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 31 May 2019 17:55:11 +0000 (19:55 +0200)]
r8169: avoid tso csum function indirection
Replace indirect call to tso_csum with direct calls. To do this we have
to move rtl_chip_supports_csum_v2().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 31 May 2019 17:54:03 +0000 (19:54 +0200)]
r8169: remove struct jumbo_ops
The jumbo_ops are used in just one place, so we can simplify the code
and avoid the penalty of indirect calls in times of retpoline.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 31 May 2019 17:53:28 +0000 (19:53 +0200)]
r8169: remove struct mdio_ops
The mdio_ops are used in just one place, so we can simplify the code
and avoid the penalty of indirect calls in times of retpoline.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 31 May 2019 17:17:15 +0000 (19:17 +0200)]
r8169: improve r8169_csum_workaround
Use helper skb_is_gso() and simplify access to tx_dropped.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 31 May 2019 17:14:44 +0000 (19:14 +0200)]
net: ethernet: improve eth_platform_get_mac_address
pci_device_to_OF_node(to_pci_dev(dev)) is the same as dev->of_node,
so we can simplify the code. In addition add an empty line before
the return statement.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Jun 2019 01:08:47 +0000 (18:08 -0700)]
Merge branch 'ifa_list-RCU'
Florian Westphal says:
====================
net: add rcu annotations for ifa_list
v3: fix typo in patch1 commit message
All other patches are unchanged.
v2: remove ifa_list iteration in afs instead of conversion
Eric Dumazet reported following problem:
It looks that unless RTNL is held, accessing ifa_list needs proper RCU
protection. indev->ifa_list can be changed under us by another cpu
(which owns RTNL) [..]
A proper rcu_dereference() with an happy sparse support would require
adding __rcu attribute.
This patch series does that: add __rcu to the ifa_list pointers.
That makes sparse complain, so the series also adds the required
rcu_assign_pointer/dereference helpers where needed.
All patches except the last one are preparation work.
Two new macros are introduced for in_ifaddr walks.
Last patch adds the __rcu annotations and the assign_pointer/dereference
helper calls.
This patch is a bit large, but I found no better way -- other
approaches (annotate-first or add helpers-first) all result in
mid-series sparse warnings.
This series is submitted vs. net-next rather than net for several
reasons:
1. Its (mostly) compile-tested only
2. 3rd patch changes behaviour wrt. secondary addresses
(see changelog)
3. The problem exists for a very long time (2004), so it doesn't
seem to be urgent to fix this -- rcu use to free ifa_list
predates the git era.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Fri, 31 May 2019 16:27:09 +0000 (18:27 +0200)]
net: ipv4: provide __rcu annotation for ifa_list
ifa_list is protected by rcu, yet code doesn't reflect this.
Add the __rcu annotations and fix up all places that are now reported by
sparse.
I've done this in the same commit to not add intermediate patches that
result in new warnings.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Fri, 31 May 2019 16:27:08 +0000 (18:27 +0200)]
drivers: use in_dev_for_each_ifa_rtnl/rcu
Like previous patches, use the new iterator macros to avoid sparse
warnings once proper __rcu annotations are added.
Compile tested only.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Fri, 31 May 2019 16:27:07 +0000 (18:27 +0200)]
net: use new in_dev_ifa iterators
Use in_dev_for_each_ifa_rcu/rtnl instead.
This prevents sparse warnings once proper __rcu annotations are added.
Signed-off-by: Florian Westphal <fw@strlen.de>
t di# Last commands done (6 commands done):
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Fri, 31 May 2019 16:27:06 +0000 (18:27 +0200)]
netfilter: use in_dev_for_each_ifa_rcu
Netfilter hooks are always running under rcu read lock, use
the new iterator macro so sparse won't complain once we add
proper __rcu annotations.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>