openwrt/staging/blogic.git
Eli Britstein [Thu, 21 Mar 2019 22:51:42 +0000 (15:51 -0700)]
net/mlx5e: Replace TC VLAN pop and push actions with VLAN modify

Changing the VLAN header may be implemented by popping the existing header
and pushing a new one. Translate those operations as VLAN modify.
Applicable for use cases such as OVS where the controller translates a
vlan modify meta (OF) rule to DP pop+push actions rule.
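The translation described above can be sketched in userspace C. This is only an illustration under stated assumptions: the action enum, struct act and fold_pop_push() are hypothetical names, not the mlx5e or TC data structures, and the sketch only folds a pop that is immediately followed by a push.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action kinds; these are not the kernel's TC action types. */
enum act_kind { ACT_VLAN_POP, ACT_VLAN_PUSH, ACT_VLAN_MODIFY, ACT_OUTPUT };

struct act {
    enum act_kind kind;
    uint16_t vid;   /* VLAN ID, meaningful for PUSH and MODIFY */
    uint8_t prio;   /* carried over unchanged from the PUSH */
};

/*
 * Rewrite every adjacent POP,PUSH pair in acts[0..n) as a single MODIFY.
 * Works in place and returns the new number of actions.
 */
size_t fold_pop_push(struct act *acts, size_t n)
{
    size_t out = 0;

    assert(acts != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        bool pair = acts[i].kind == ACT_VLAN_POP && i + 1 < n &&
                    acts[i + 1].kind == ACT_VLAN_PUSH;

        if (pair) {
            struct act m = { ACT_VLAN_MODIFY, acts[i + 1].vid,
                             acts[i + 1].prio };

            acts[out++] = m;
            i++;    /* skip the PUSH that was folded into the MODIFY */
        } else {
            acts[out++] = acts[i];
        }
    }
    return out;
}
```

A rule of the form pop, push vid 100, output becomes modify vid 100, output; any other action sequence is copied through unchanged.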

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Thu, 21 Mar 2019 22:51:41 +0000 (15:51 -0700)]
net/mlx5e: Support VLAN modify action

Support VLAN modify action by emulating a rewrite action for the VLAN
fields. Currently, the only supported field is the vid. The prio in the
action must be set to 0 to indicate no change.
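As a rough illustration of the constraint above (only the vid may change, and prio must be 0), the rewrite can be modelled on a 16-bit 802.1Q TCI. The helper name vlan_modify_vid() and the choice of error codes are assumptions made for this sketch, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VID_MASK 0x0fffu    /* low 12 bits of the 802.1Q TCI hold the VLAN ID */

/*
 * Rewrite only the VLAN ID bits of *tci.
 * prio must be 0, meaning "do not change the priority bits".
 * Returns 0 on success or a negative errno value (assumed codes).
 */
int vlan_modify_vid(uint16_t *tci, uint16_t new_vid, uint8_t prio)
{
    assert(tci != NULL);

    if (prio != 0)
        return -EOPNOTSUPP;     /* changing prio is not supported */
    if (new_vid > VID_MASK)
        return -EINVAL;         /* VLAN ID does not fit in 12 bits */

    *tci = (uint16_t)((*tci & ~VID_MASK) | new_vid);
    return 0;
}
```

The priority and DEI bits in the upper four bits of the TCI are preserved; only the lower 12 bits are replaced.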

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Thu, 21 Mar 2019 22:51:40 +0000 (15:51 -0700)]
net/mlx5e: Add VLAN ID rewrite fields

Add VLAN ID rewrite fields as a pre-step to support this rewrite.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Thu, 21 Mar 2019 22:51:39 +0000 (15:51 -0700)]
net: Add IANA_VXLAN_UDP_PORT definition to vxlan header file

Added IANA_VXLAN_UDP_PORT (4789) definition to vxlan header file so it
can be used by drivers instead of local definition.
Updated drivers which locally defined it as 4789 to use it.
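A minimal sketch of what sharing the definition buys a driver. The macro name and value come from the message above; the helper is_default_vxlan_port() is hypothetical, and the port is taken in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared definition (name and value as given in the commit message). */
#define IANA_VXLAN_UDP_PORT 4789

static_assert(IANA_VXLAN_UDP_PORT <= UINT16_MAX,
              "a UDP port must fit in 16 bits");

/* Hypothetical driver helper; dport is in host byte order in this sketch. */
bool is_default_vxlan_port(uint16_t dport)
{
    return dport == IANA_VXLAN_UDP_PORT;
}
```

With one shared macro, each driver compares against the same constant instead of carrying its own copy of 4789.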

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Peng Li <lipeng321@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Thu, 21 Mar 2019 22:51:38 +0000 (15:51 -0700)]
net/mlx5e: TX, Add geneve tunnel stateless offload support

Currently, only the default geneve udp port (6081) is supported.
For the tx side, the HW is assisted by SW parsing, which sets the
header offsets to offload tunneled LSO and csum. Note that for udp
tunnels, we don't use special rx offloads: rss on the outer headers
is enough, we support checksum complete, and GRO takes care of
aggregation.

Geneve TSO BW and CPU load results (tested using iperf single tcp
stream).
In this patch we add TSO support over Geneve, so the "before" result
doesn't actually get to using the TSO HW offload even when turned on.
Tested on ConnectX-5, Intel(R) Xeon(R) CPU E5-2660 v2 @2.20GHz.

 __________________________________
| Before         | After           |
|________________|_________________|
| 12.6 Gbits/sec | 21.7 Gbits/sec  |
| 100% CPU load  | 61.5% CPU load  |
|________________|_________________|

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Thu, 21 Mar 2019 22:51:37 +0000 (15:51 -0700)]
net/mlx5e: Take SW parser code to a separate function

Refactor mlx5e_ipsec_set_swp() code, split the part which sets the eseg
software parser (SWP) offsets and flags, so it can be used in a
downstream patch by other mlx5e functionality which needs to set eseg
SWP.
The new function mlx5e_set_eseg_swp() is useful for setting swp for both
outer and inner headers. It also handles the special ipsec case of xfrm
mode transfer.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Moshe Shemesh [Thu, 21 Mar 2019 22:51:36 +0000 (15:51 -0700)]
net: Move the definition of the default Geneve udp port to public header file

Move the definition of the default Geneve udp port from the geneve
source to the header file, so we can re-use it from drivers.
Modify existing drivers to use it.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gustavo A. R. Silva [Thu, 21 Mar 2019 22:51:34 +0000 (15:51 -0700)]
net/mlx5e: Remove redundant assignment

Remove redundant assignment to tun_entropy->enabled.

Addresses-Coverity-ID: 1477328 ("Unused value")
Fixes: 97417f6182f8 ("net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 21 Mar 2019 22:51:33 +0000 (15:51 -0700)]
net/mlx5e: Fix compilation warning in en_tc.c

Amazingly, a mlx5e_tc function is being called from the eswitch layer,
which is by itself very terrible! The function was declared locally in
eswitch_offloads.c so that it could be used there, which caused the
compilation warning below. Fix that.

drivers/.../mlx5/core/en_tc.c:3242:6: [-Werror=missing-prototypes]
error: no previous prototype for ‘mlx5e_tc_clean_fdb_peer_flows’

Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 21 Mar 2019 22:51:32 +0000 (15:51 -0700)]
net/mlx5e: Fix port buffer function documentation format

This patch fixes compiler warnings:
In drivers/.../mlx5/core/en/port_buffer.c:190:
warning: Function parameter or member 'pfc_en' not described...
...
warning: Function parameter or member 'change' not described...

Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 21 Mar 2019 22:51:31 +0000 (15:51 -0700)]
net/mlx5: Fix compilation warning in eq.c

mlx5_eq_table_get_rmap is being used only when CONFIG_RFS_ACCEL is
enabled, this patch fixes the below warning when CONFIG_RFS_ACCEL is
disabled.

drivers/.../mlx5/core/eq.c:903:18: [-Werror=missing-prototypes]
error: no previous prototype for ‘mlx5_eq_table_get_rmap’

Fixes: f2f3df550139 ("net/mlx5: EQ, Privatize eq_table and friends")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Thu, 21 Mar 2019 22:51:30 +0000 (15:51 -0700)]
net/mlx5: Simplify mlx5_sriov_is_enabled() by using pci core API

It is desired to get rid of num_vfs stored inside mlx5_core_sriov to
safely support more vports than vfs.
To reduce dependency on mlx5_core_sriov num_vfs, start using
pci_num_vf() from pci core.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Thu, 21 Mar 2019 22:51:29 +0000 (15:51 -0700)]
net/mlx5: Rename total_vfs to total_vports

Macro MLX5_TOTAL_VPORTS() returns total number of vports. Therefore,
rename variable total_vfs to total_vports to improve code readability.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Thu, 21 Mar 2019 22:51:28 +0000 (15:51 -0700)]
net/mlx5: Simplify sriov enable/disable flow

Simplify the sriov enable/disable flow by dropping the two checks below,
which the PCI core already performs.

1. PCI core driver allows sriov configuration only on a PF.
This is done in drivers/pci/pci-sysfs.c sriov_attrs_are_visible().

2. PCI core driver allows sriov enablement only if sriov is currently
disabled for a PF. This is done in drivers/pci/pci-sysfs.c
sriov_numvfs_store().

Hence there is no need for mlx5 driver to duplicate such checks.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Johannes Berg [Thu, 21 Mar 2019 21:51:02 +0000 (22:51 +0100)]
genetlink: make policy common to family

Since maxattr is common, the policy can't really differ sanely,
so make it common as well.

The only user that did in fact manage to make a non-common policy
is taskstats, which has to be really careful about it (since it's
still using a common maxattr!). This is no longer supported, but
we can fake it using pre_doit.

This reduces the size of e.g. nl80211.o (which has lots of commands):

   text    data     bss     dec     hex filename
 398745   14323    2240  415308   6564c net/wireless/nl80211.o (before)
 397913   14331    2240  414484   65314 net/wireless/nl80211.o (after)
--------------------------------
   -832      +8       0    -824

Which is obviously just 8 bytes for each command, and an added 8
bytes for the new policy pointer. I'm not sure why the ops list is
counted as .text though.
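The size arithmetic above can be reproduced with a small layout sketch. All struct and function names below are hypothetical stand-ins (they are not struct genl_ops or struct genl_family), and the figure of 104 operations used in the note after the code is an inference from 832 / 8, not a number stated in this message.

```c
#include <assert.h>
#include <stddef.h>

struct attr_policy { int type; int len; };  /* stand-in for a policy entry */

/* Before: every operation carries its own policy pointer. */
struct op_before {
    int cmd;
    int (*doit)(void);
    const struct attr_policy *policy;
};

/* After: operations no longer have a policy field... */
struct op_after {
    int cmd;
    int (*doit)(void);
};

/* ...and the family holds the single policy next to the shared maxattr. */
struct family_after {
    unsigned int maxattr;
    const struct attr_policy *policy;
    const struct op_after *ops;
    size_t n_ops;
};

static_assert(sizeof(struct op_before) > sizeof(struct op_after),
              "dropping the per-op pointer must shrink the op entry");

/* Bytes saved in the ops array, minus the one pointer added to the family. */
size_t net_bytes_saved(size_t n_ops)
{
    size_t per_op = sizeof(struct op_before) - sizeof(struct op_after);

    return n_ops * per_op - sizeof(const struct attr_policy *);
}
```

On a typical LP64 build each op entry shrinks by 8 bytes, so 104 entries would save 832 bytes in the ops array and 824 bytes net once the new family pointer is counted, in line with the size output above.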

Most of the code transformations were done using the following spatch:
    @ops@
    identifier OPS;
    expression POLICY;
    @@
    struct genl_ops OPS[] = {
    ...,
     {
    - .policy = POLICY,
     },
    ...
    };

    @@
    identifier ops.OPS;
    expression ops.POLICY;
    identifier fam;
    expression M;
    @@
    struct genl_family fam = {
            .ops = OPS,
            .maxattr = M,
    +       .policy = POLICY,
            ...
    };

This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
the cb->data as ops, which we want to change in a later genl patch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 21 Mar 2019 20:41:48 +0000 (21:41 +0100)]
r8169: use netif_start_queue instead of netif_wake_queue in rtl8169_start_xmit

Replace the call to netif_wake_queue in rtl8169_start_xmit with
netif_start_queue. We don't need to actually wake up the queue since
we are still in mid transmit; we just need to reset the bit so it
doesn't prevent the next transmit.
(Description shamelessly copied from a mail sent by Alex.)

Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 21 Mar 2019 20:08:35 +0000 (21:08 +0100)]
net: phy: aquantia: add downshift support

Aquantia PHYs of the AQR107 family support the downshift feature.
Add support for it as standard PHY tunable so that it can be controlled
via ethtool.
The AQCS109 supports a proprietary 2-pair 1Gbps mode. If two such PHYs
are connected to each other with a 2-pair cable, they may not be able
to establish a link if both advertise modes > 1Gbps.

v2:
- add downshift event detection
- warn if downshift occurred
- read downshifted rate from vendor register
- enable downshift per default on all AQR107 family members

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Mar 2019 21:32:17 +0000 (14:32 -0700)]
Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock'

Vlad Buslov says:

====================
Refactor flower classifier to remove dependency on rtnl lock

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take rtnl lock when qdisc or
classifier requires it. Classifiers can indicate that their ops
callbacks don't require caller to hold rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor
flower classifier to support unlocked execution and register it with
unlocked flag.

This patch set implements following changes to make flower classifier
concurrency-safe:

- Implement reference counting for individual filters. Change fl_get to
  take reference to filter. Implement tp->ops->put callback that was
  introduced in cls API patch set to release reference to flower filter.

- Use tp->lock spinlock to protect internal classifier data structures
  from concurrent modification.

- Handle concurrent tcf proto deletion by returning EAGAIN, which will
  cause cls API to retry and create new proto instance or return error
  to the user (depending on message type).

- Handle concurrent insertion of filter with same priority and handle by
  returning EAGAIN, which will cause cls API to lookup filter again and
  process it accordingly to netlink message flags.

- Extend flower mask with reference counting and protect masks list with
  masks_lock spinlock.

- Prevent concurrent mask insertion by inserting temporary value to
  masks hash table. This is necessary because mask initialization is a
  sleeping operation and cannot be done while holding tp->lock.

Both chain level and classifier level conflicts are resolved by
returning -EAGAIN to cls API, which results in a restart of the whole operation.
This retry mechanism is a result of fine-grained locking approach used
in this and previous changes in series and is necessary to allow
concurrent updates on same chain instance. Alternative approach would be
to lock the whole chain while updating filters on any of child tp's,
adding and removing classifier instances from the chain. However, since
most CPU-intensive parts of filter update code are specifically in
classifier code and its dependencies (extensions and hw offloads), such
approach would negate most of the gains introduced by this change and
previous changes in the series when updating same chain instance.

Tcf hw offloads API is not changed by this patch set and still requires
caller to hold rtnl lock. Refactored flower classifier tracks rtnl lock
state by means of 'rtnl_held' flag provided by cls API and obtains the
lock before calling hw offloads. Following patch set will lift this
restriction and refactor cls hw offloads API to support unlocked
execution.

With these changes flower classifier is safely registered with
TCF_PROTO_OPS_DOIT_UNLOCKED flag in last patch.

Changes from V2 to V3:
- Rebase on latest net-next

Changes from V1 to V2:
- Extend cover letter with explanation about retry mechanism.
- Rebase on current net-next.
- Patch 1:
  - Use rcu_dereference_raw() for tp->root dereference.
  - Update comment in fl_head_dereference().
- Patch 2:
  - Remove redundant check in fl_change error handling code.
  - Add empty line between error check and new handle assignment.
- Patch 3:
  - Refactor loop in fl_get_next_filter() to improve readability.
- Patch 4:
  - Refactor __fl_delete() to improve readability.
- Patch 6:
  - Fix comment in fl_check_assign_mask().
- Patch 9:
  - Extend commit message.
  - Fix error code in comment.
- Patch 11:
  - Fix fl_hw_replace_filter() to always release rtnl lock in error
    handlers.
- Patch 12:
  - Don't take rtnl lock before calling __fl_destroy_filter() in
    workqueue context.
  - Extend commit message with explanation why flower still takes rtnl
    lock before calling hardware offloads API.

Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:44 +0000 (15:17 +0200)]
net: sched: flower: set unlocked flag for flower proto ops

Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its
ops callbacks don't require caller to hold rtnl lock. Don't take rtnl lock
in fl_destroy_filter_work() that is executed on workqueue instead of being
called by cls API and is not affected by setting
TCF_PROTO_OPS_DOIT_UNLOCKED. Rtnl mutex is still manually taken by flower
classifier before calling hardware offloads API that has not been updated
for unlocked execution.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:43 +0000 (15:17 +0200)]
net: sched: flower: track rtnl lock state

Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag
to internal functions that need to know rtnl lock state. Take rtnl lock
before calling tcf APIs that require it (hw offload, bind filter, etc.).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:42 +0000 (15:17 +0200)]
net: sched: flower: protect flower classifier state with spinlock

struct tcf_proto was extended with spinlock to be used by classifiers
instead of global rtnl lock. Use it to protect shared flower classifier
data structures (handle_idr, mask hashtable and list) and fields of
individual filters that can be accessed concurrently. This patch set uses
tcf_proto->lock as per instance lock that protects all filters on
tcf_proto.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:41 +0000 (15:17 +0200)]
net: sched: flower: handle concurrent tcf proto deletion

Without rtnl lock protection tcf proto can be deleted concurrently. Check
tcf proto 'deleting' flag after taking tcf spinlock to verify that no
concurrent deletion is in progress. Return EAGAIN error if concurrent
deletion is detected, which will cause caller to retry and possibly create new
instance of tcf proto.

Retry mechanism is a result of fine-grained locking approach used in this
and previous changes in series and is necessary to allow concurrent updates
on same chain instance. Alternative approach would be to lock the whole
chain while updating filters on any of child tp's, adding and removing
classifier instances from the chain. However, since most CPU-intensive
parts of filter update code are specifically in classifier code and its
dependencies (extensions and hw offloads), such approach would negate most
of the gains introduced by this change and previous changes in the series
when updating same chain instance.
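The retry idea can be illustrated with a single-threaded userspace sketch. The types and functions below (struct proto, struct chain, proto_insert(), chain_insert()) are hypothetical and much simpler than the cls API; in particular, the real check of the 'deleting' flag happens under the tcf spinlock, which this sketch omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct proto {
    bool deleting;      /* set once the instance is being torn down */
    int nfilters;
};

struct chain {
    struct proto *tp;   /* instance currently attached to the chain */
    struct proto spare; /* storage for a replacement instance */
};

/* Classifier side: refuse to touch an instance that is being deleted. */
static int proto_insert(struct proto *tp)
{
    if (tp->deleting)
        return -EAGAIN;
    tp->nfilters++;
    return 0;
}

/* Caller side: on -EAGAIN, switch to a fresh instance and try again. */
int chain_insert(struct chain *c)
{
    assert(c != NULL && c->tp != NULL);

    for (int attempt = 0; attempt < 2; attempt++) {
        int err = proto_insert(c->tp);

        if (err != -EAGAIN)
            return err;

        /* The old instance is going away: create a new one and retry. */
        c->spare = (struct proto){ .deleting = false, .nfilters = 0 };
        c->tp = &c->spare;
    }
    return -EAGAIN;
}
```

Only the insert that raced with the deletion pays for the retry; other updates on the same chain are not serialized behind a chain-wide lock.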

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:40 +0000 (15:17 +0200)]
net: sched: flower: handle concurrent filter insertion in fl_change

Check if user specified a handle and another filter with the same handle
was inserted concurrently. Return EAGAIN to retry filter processing (in
case it is an overwrite request).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:39 +0000 (15:17 +0200)]
net: sched: flower: protect masks list with spinlock

Protect modifications of flower masks list with spinlock to remove
dependency on rtnl lock and allow concurrent access.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:38 +0000 (15:17 +0200)]
net: sched: flower: handle concurrent mask insertion

Without rtnl lock protection masks with same key can be inserted
concurrently. Insert temporary mask with reference count zero to masks
hashtable. This will cause any concurrent modifications to retry.

Wait for rcu grace period to complete after removing temporary mask from
masks hashtable to accommodate concurrent readers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:37 +0000 (15:17 +0200)]
net: sched: flower: add reference counter to flower mask

Extend fl_flow_mask structure with reference counter to allow parallel
modification without relying on rtnl lock. Use rcu read lock to safely
lookup mask and increment reference counter in order to accommodate
concurrent deletes.
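The "look up, then take a reference only if the object is still live" pattern can be sketched with C11 atomics. This is an assumption-laden illustration: struct mask, mask_get_unless_zero() and mask_put() are hypothetical names, and the sketch leaves out the rcu read-side protection that keeps the memory valid during the lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mask {
    atomic_uint refcnt; /* 0 means the mask is dead or not yet published */
};

/* Take a reference only if the count is still non-zero. */
bool mask_get_unless_zero(struct mask *m)
{
    unsigned int old = atomic_load(&m->refcnt);

    while (old != 0) {
        /* On failure the CAS reloads 'old', so the loop re-checks for 0. */
        if (atomic_compare_exchange_weak(&m->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if the caller released the last one. */
bool mask_put(struct mask *m)
{
    unsigned int old = atomic_fetch_sub(&m->refcnt, 1);

    assert(old != 0);   /* dropping a reference that was never taken */
    return old == 1;
}
```

A reader that finds a mask whose count has already dropped to zero simply fails to take the reference and treats the mask as absent, which is what makes a concurrent delete safe.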

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:36 +0000 (15:17 +0200)]
net: sched: flower: track filter deletion with flag

In order to prevent double deletion of filter by concurrent tasks when rtnl
lock is not used for synchronization, add 'deleted' filter field. Check
value of this field when modifying filters and return error if concurrent
deletion is detected.

Refactor __fl_delete() to accept pointer to 'last' boolean as argument,
and return error code as function return value instead. This is necessary
to signal concurrent filter delete to caller.
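The reshaped delete interface can be sketched as follows. The names (struct filter, struct head, filter_delete()) are hypothetical, the -ENOENT error code is an assumption made for this illustration, and the check-and-set of the flag would need to run under the classifier lock in concurrent code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct filter {
    bool deleted;       /* set by whichever task deletes the filter first */
};

struct head {
    size_t nfilters;    /* number of live filters on this classifier */
};

/*
 * Returns 0 on success, or a negative errno value if another task has
 * already deleted f. *last reports whether no filters remain.
 */
int filter_delete(struct head *h, struct filter *f, bool *last)
{
    *last = false;

    if (f->deleted)
        return -ENOENT; /* lost the race: someone else deleted it */

    assert(h->nfilters > 0);
    f->deleted = true;
    h->nfilters--;
    *last = (h->nfilters == 0);
    return 0;
}
```

Moving 'last' to an out-parameter frees the return value for the error code, so a caller can tell "deleted by me" apart from "already deleted by someone else".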

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:35 +0000 (15:17 +0200)]
net: sched: flower: introduce reference counting for filters

Extend flower filters with reference counting in order to remove dependency
on rtnl lock in flower ops and allow filters to be modified concurrently.
Reference to flower filter can be taken/released concurrently as soon as it
is marked as 'unlocked' by last patch in this series. Use atomic reference
counter type to make concurrent modifications safe.

Always take reference to flower filter while working with it:
- Modify fl_get() to take reference to filter.
- Implement tp->put() callback as fl_put() function to allow cls API to
release reference taken by fl_get().
- Modify fl_change() to assume that caller holds reference to fold and take
reference to fnew.
- Take reference to filter while using it in fl_walk().

Implement helper functions to get/put filter reference counter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:34 +0000 (15:17 +0200)]
net: sched: flower: refactor fl_change

As a preparation for using classifier spinlock instead of relying on
external rtnl lock, rearrange code in fl_change. The goal is to group the
code which changes classifier state in single block in order to allow
following commits in this set to protect it from parallel modification with
tp->lock. Data structures that require tp->lock protection are mask
hashtable and filters list, and classifier handle_idr.

fl_hw_replace_filter() is a sleeping function and cannot be called while
holding a spinlock. In order to execute the whole sequence of changes to shared
classifier data structures atomically, call fl_hw_replace_filter() before
modifying them.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 21 Mar 2019 13:17:33 +0000 (15:17 +0200)]
net: sched: flower: don't check for rtnl on head dereference

Flower classifier only changes root pointer during init and destroy. Cls
API implements reference counting for tcf_proto, so there is no danger of
concurrent access to tp when it is being destroyed, even without protection
provided by rtnl lock.

Implement new function fl_head_dereference() to dereference tp->root
without checking for rtnl lock. Use it in all flower functions that obtain
head pointer instead of rtnl_dereference().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 21 Mar 2019 04:01:53 +0000 (21:01 -0700)]
nfp: remove defines for unused control bits

NFP driver ABI contains bits for L2 switching which were never
implemented in initially envisioned form.

Remove the defines, and open up the possibility of
reclaiming the bits for other uses.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Mar 2019 21:01:10 +0000 (14:01 -0700)]
Merge branch 'rhashtable-cleanups'

NeilBrown says:

====================
Two clean-ups for rhashtable.

These two patches make small improvements to
rhashtable, but are otherwise unrelated.

Thanks to Herbert, Miguel, and Paul for the review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Thu, 21 Mar 2019 03:42:40 +0000 (14:42 +1100)]
rhashtable: rename rht_for_each*continue as *from.

The pattern set by list.h is that for_each..continue()
iterators start at the next entry after the given one,
while for_each..from() iterators start at the given
entry.

The rht_for_each*continue() iterators are documented as though they
start at the 'next' entry, but actually start at the given entry,
and they are used expecting that behaviour.
So fix the documentation and change the names to *from for consistency
with list.h
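The naming distinction can be shown on a plain singly linked list. The macros and helpers below are hypothetical and only mirror the convention described above; they are not the list.h or rhashtable macros.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int val;
    struct node *next;
};

/* "from": the first node visited is pos itself. */
#define for_each_from(pos) \
    for (; (pos) != NULL; (pos) = (pos)->next)

/* "continue": the first node visited is the one after pos. */
#define for_each_continue(pos) \
    for ((pos) = (pos)->next; (pos) != NULL; (pos) = (pos)->next)

/* Sum of the values starting at pos (pos included). */
int sum_from(struct node *pos)
{
    int sum = 0;

    for_each_from(pos)
        sum += pos->val;
    return sum;
}

/* Sum of the values after pos (pos excluded); pos must be a valid node. */
int sum_continue(struct node *pos)
{
    int sum = 0;

    assert(pos != NULL);
    for_each_continue(pos)
        sum += pos->val;
    return sum;
}
```

For a list holding 1, 2, 3, starting at the first node, the "from" variant sums all three values while the "continue" variant skips the first one.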

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Thu, 21 Mar 2019 03:42:40 +0000 (14:42 +1100)]
rhashtable: don't hold lock on first table throughout insertion.

rhashtable_try_insert() currently holds a lock on the bucket in
the first table, while also locking buckets in subsequent tables.
This is unnecessary and looks like a hold-over from some earlier
version of the implementation.

As insert and remove always lock a bucket in each table in turn, and
as insert only inserts in the final table, there cannot be any races
that are not covered by simply locking a bucket in each table in turn.

When an insert call reaches that last table it can be sure that there
is no matching entry in any other table as it has searched them all, and
insertion never happens anywhere but in the last table.  The fact that
code tests for the existence of future_tbl while holding a lock on
the relevant bucket ensures that two threads inserting the same key
will make compatible decisions about which is the "last" table.

This simplifies the code and allows the ->rehash field to be
discarded.

We still need a way to ensure that a dead bucket_table is never
re-linked by rhashtable_walk_stop().  This can be achieved by calling
call_rcu() inside the locked region, and checking with
rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
bucket table is empty and dead.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Mar 2019 20:41:26 +0000 (13:41 -0700)]
Merge branch 'net-phy-Move-Omega-PHY-entry-to-Cygnus-PHY-driver'

Florian Fainelli says:

====================
net: phy: Move Omega PHY entry to Cygnus PHY driver

In order to pave the way for adding some specific Omega PHY features
that may not be desirable on other products covered by the bcm7xxx PHY
driver, split the Omega PHY entry into the Cygnus PHY driver such that
the PHY drivers are reflective of product lines/business units
maintaining them within Broadcom.

No functional changes intended.
====================

Acked-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 20 Mar 2019 19:53:13 +0000 (12:53 -0700)]
net: phy: Move Omega PHY entry to Cygnus PHY driver

Cygnus and Omega are part of the same business unit and product line, so it
makes sense to group PHY entries by products such that a platform can
select only the drivers that it needs. Bring all the functionality that
the BCM7XXX_28NM_GPHY() macro hides for us and remove the Omega PHY
entry from bcm7xxx.c.

As an added bonus, we now have a proper mdio_device_id entry to permit
auto-loading.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 20 Mar 2019 19:53:12 +0000 (12:53 -0700)]
net: phy: Prepare for moving Omega out of bcm7xxx

The Omega PHY entry was added to bcm7xxx.c out of convenience and this
breaks the one driver per product line paradigm that was applied up
until now. Since the AFE initialization is shared between Omega and
BCM7xxx, move the relevant functions to bcm-phy-lib.[ch]. No functional
changes introduced.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dst: remove gc leftovers
Julian Wiedmann [Wed, 20 Mar 2019 19:02:56 +0000 (20:02 +0100)]
net: dst: remove gc leftovers

Get rid of some obsolete gc-related documentation and macros that were
missed in commit 5b7c9a8ff828 ("net: remove dst gc related code").

CC: Wei Wang <weiwan@google.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-broadcom-Remove-print-of-base-address'
David S. Miller [Thu, 21 Mar 2019 20:32:35 +0000 (13:32 -0700)]
Merge branch 'net-broadcom-Remove-print-of-base-address'

Florian Fainelli says:

====================
net: broadcom: Remove print of base address

Some broadcom MDIO/switch/Ethernet MAC drivers insist on printing the
base register virtual address which has little value.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: systemport: Remove print of base address
Florian Fainelli [Wed, 20 Mar 2019 16:45:17 +0000 (09:45 -0700)]
net: systemport: Remove print of base address

Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful, especially given we use a dev_info() which
already displays the platform device's address.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: bcm_sf2: Remove print of base address
Florian Fainelli [Wed, 20 Mar 2019 16:45:16 +0000 (09:45 -0700)]
net: dsa: bcm_sf2: Remove print of base address

Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful, we use a dev_info() print which already
displays the platform device's address.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: mdio-bcm-unimac: Remove print of base address
Florian Fainelli [Wed, 20 Mar 2019 16:45:15 +0000 (09:45 -0700)]
net: phy: mdio-bcm-unimac: Remove print of base address

Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful, especially given we use a dev_info() which
already displays the platform device's address.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Remove fallback argument from ip6_hold_safe
David Ahern [Wed, 20 Mar 2019 16:24:50 +0000 (09:24 -0700)]
ipv6: Remove fallback argument from ip6_hold_safe

net and null_fallback are redundant. Remove null_fallback in favor of
!net check.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: Allow amount of dirty memory from fib resizing to be controllable
David Ahern [Wed, 20 Mar 2019 16:18:59 +0000 (09:18 -0700)]
ipv4: Allow amount of dirty memory from fib resizing to be controllable

fib_trie implementation calls synchronize_rcu when a certain amount of
pages are dirty from freed entries. The number of pages was determined
experimentally in 2009 (commit c3059477fce2d).

At the current setting, synchronize_rcu is called often -- 51 times in a
second in one test with an average of an 8 msec delay adding a fib entry.
The total impact is a significant slowdown when modifying the fib. This is seen
in the output of 'time' - the difference between real time and sys+user.
For example, using 720,022 single path routes and 'ip -batch'[1]:

    $ time ./ip -batch ipv4/routes-1-hops
    real    0m14.214s
    user    0m2.513s
    sys     0m6.783s

So roughly 35% of the actual time to install the routes is from the ip
command getting scheduled out, most notably due to synchronize_rcu (this
is observed using 'perf sched timehist').

This patch makes the amount of dirty memory configurable between 64k where
the synchronize_rcu is called often (small, low end systems that are memory
sensitive) to 64M where synchronize_rcu is called rarely during a large
FIB change (for high end systems with lots of memory). The default is 512kB
which corresponds to the current setting of 128 pages with a 4kB page size.

As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
in a second blocking for up to 30 msec in a single instance, and a total
of almost 100 msec across the 4 calls in the second. The trade off is
allowing FIB entries to consume more memory in a given time window
but with much better fib insertion rates (~30% increase in prefixes/sec).
With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
file runs in:

    $ time ./ip -batch ipv4/routes-1-hops
    real    0m9.692s
    user    0m2.491s
    sys     0m6.769s

So the dead time is reduced to about 1/2 second or <5% of the real time.

[1] 'ip' modified to not request ACK messages which improves route
    insertion times by about 20%
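
The percentages quoted above can be re-derived from the two 'time'
outputs. The following is only a sketch in Python (not part of the
patch); the helper name dead_time is made up for illustration, and the
numbers are copied from this message:

```python
def dead_time(real, user, sys):
    """Seconds of wall-clock time spent neither in user nor in kernel mode."""
    return real - (user + sys)


# Default setting: timings from the first 'time' output.
before = dead_time(real=14.214, user=2.513, sys=6.783)
before_share = before / 14.214

# net.ipv4.fib_sync_mem set to 16MB: timings from the second output.
after = dead_time(real=9.692, user=2.491, sys=6.769)
after_share = after / 9.692

# The default threshold is 128 pages of 4kB each, expressed in kB.
default_sync_mem_kb = 128 * 4

print(f"default: {before:.3f}s dead time ({before_share:.1%} of real)")
print(f"16MB:    {after:.3f}s dead time ({after_share:.1%} of real)")
print(f"default threshold: {default_sync_mem_kb}kB")
```

The dead time is simply real minus user+sys, which is how the "roughly
35%" and "<5%" figures are obtained.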

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: Remove unused first parameter of tun_get_iff()
Kirill Tkhai [Wed, 20 Mar 2019 09:16:53 +0000 (12:16 +0300)]
tun: Remove unused first parameter of tun_get_iff()

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device
Kirill Tkhai [Wed, 20 Mar 2019 09:16:42 +0000 (12:16 +0300)]
tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device

In commit f2780d6d7475 "tun: Add ioctl() SIOCGSKNS cmd to allow
obtaining net ns of tun device" it was missed that tun may change
its net ns, while net ns of socket remains the same as it was
created initially. SIOCGSKNS returns net ns of socket, so it is
not suitable for obtaining net ns of device.

We may have two tun devices with the same names in two net ns,
and in this case it's not possible to determine which of them
fd refers to (TUNGETIFF will return the same name).

This patch adds a new ioctl() cmd for obtaining net ns of a device.
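
A possible userspace use of the new cmd, sketched in Python. This is
an illustration only: the request number is an assumption based on
TUNGETDEVNETNS being defined as _IO('T', 227) in <linux/if_tun.h> and
should be checked against the installed headers, and the helper names
tun_device_netns_fd and same_netns are hypothetical:

```python
import fcntl
import os

# Assumption: TUNGETDEVNETNS is _IO('T', 227). This encoding is valid on
# architectures where _IOC_NONE is 0 (e.g. x86, arm); verify elsewhere.
TUNGETDEVNETNS = (ord("T") << 8) | 227


def tun_device_netns_fd(tun_fd: int) -> int:
    """Return a new fd for the net ns the tun device currently lives in."""
    return fcntl.ioctl(tun_fd, TUNGETDEVNETNS)


def same_netns(ns_fd: int, path: str = "/proc/self/ns/net") -> bool:
    """Compare a namespace fd with a namespace file by device and inode."""
    a, b = os.fstat(ns_fd), os.stat(path)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

With an fd obtained from /dev/net/tun and attached via TUNSETIFF, the
returned ns fd can be compared against /proc/<pid>/ns/net entries to
tell apart two devices that share a name.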

Reported-by: Harald Albrecht <harald.albrecht@gmx.net>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ipv6-Change-addrconf_f6i_alloc-to-use-ip6_route_info_create'
David S. Miller [Thu, 21 Mar 2019 17:16:54 +0000 (10:16 -0700)]
Merge branch 'ipv6-Change-addrconf_f6i_alloc-to-use-ip6_route_info_create'

David Ahern says:

====================
ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create

addrconf_f6i_alloc is the last caller of fib6_info_alloc besides
ip6_route_info_create. There really is no good reason for it to do
its own fib6_info initialization, so convert it to call
ip6_route_info_create.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Change addrconf_f6i_alloc to use ip6_route_info_create
David Ahern [Thu, 21 Mar 2019 12:21:35 +0000 (05:21 -0700)]
ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create

Change addrconf_f6i_alloc to generate a fib6_config and call
ip6_route_info_create. addrconf_f6i_alloc is the last caller to
fib6_info_alloc besides ip6_route_info_create, and there is no
reason for it to do its own initialization on a fib6_info.

Host routes need to be created even if the device is down, so add a
new flag, fc_ignore_dev_down, to fib6_config and update fib6_nh_init
to not error out if device is not up.

Notes on the conversion:
- ip_fib_metrics_init is the same, as fib6_config has fc_mx set to NULL
  and fc_mx_len set to 0
- dst_nocount is handled by the RTF_ADDRCONF flag
- dst_host is handled by fc_dst_len = 128

nh_gw does not get set after the conversion to ip6_route_info_create
but it should not be set in addrconf_f6i_alloc since this is a host
route not a gateway route.

Everything else is a straightforward mapping between fib6_info and
fib6_config.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Move setting default metric for routes
David Ahern [Thu, 21 Mar 2019 12:21:34 +0000 (05:21 -0700)]
ipv6: Move setting default metric for routes

ip6_route_info_create is a low level function for ensuring fc_metric is
set. Move the check and default setting to the 2 locations that do not
already set fc_metric before calling ip6_route_info_create. This is
required for the next patch which moves addrconf allocations to
ip6_route_info_create and wants the metric for host routes to be 0.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: Replace kfree_skb() with consume_skb()
Vakul Garg [Thu, 21 Mar 2019 11:59:57 +0000 (11:59 +0000)]
net/tls: Replace kfree_skb() with consume_skb()

To free the skb in the normal course of processing, consume_skb() should be
used. kfree_skb() is intended to be used only for failure paths.

https://www.kernel.org/doc/htmldocs/networking/API-consume-skb.html

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: fix a null pointer deref
Hoang Le [Thu, 21 Mar 2019 10:25:18 +0000 (17:25 +0700)]
tipc: fix a null pointer deref

In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast") we introduced a new method to eliminate the risk of message
reordering between different nodes.
Unfortunately, we forgot to check on the receiving side that intra-node
messages are ignored.

We fix this by checking for, and returning on, a message arriving from the
same node.

syzbot report:

==================================================================
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7820 Comm: syz-executor418 Not tainted 5.0.0+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
RIP: 0010:tipc_mcast_filter_msg+0x21b/0x13d0 net/tipc/bcast.c:782
Code: 45 c0 0f 84 39 06 00 00 48 89 5d 98 e8 ce ab a5 fa 49 8d bc
 24 c8 00 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03
 <80> 3c 08 00 0f 85 9a 0e 00 00 49 8b 9c 24 c8 00 00 00 48 be 00 00
RSP: 0018:ffff8880959defc8 EFLAGS: 00010202
RAX: 0000000000000019 RBX: ffff888081258a48 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff86cab862 RDI: 00000000000000c8
RBP: ffff8880959df030 R08: ffff8880813d0200 R09: ffffed1015d05bc8
R10: ffffed1015d05bc7 R11: ffff8880ae82de3b R12: 0000000000000000
R13: 000000000000002c R14: 0000000000000000 R15: ffff888081258a48
FS:  000000000106a880(0000) GS:ffff8880ae800000(0000)
 knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001cc0 CR3: 0000000094a20000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tipc_sk_filter_rcv+0x182d/0x34f0 net/tipc/socket.c:2168
 tipc_sk_enqueue net/tipc/socket.c:2254 [inline]
 tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305
 tipc_sk_mcast_rcv+0x724/0x1020 net/tipc/socket.c:1209
 tipc_mcast_xmit+0x7fe/0x1200 net/tipc/bcast.c:410
 tipc_sendmcast+0xb36/0xfc0 net/tipc/socket.c:820
 __tipc_sendmsg+0x10df/0x18d0 net/tipc/socket.c:1358
 tipc_sendmsg+0x53/0x80 net/tipc/socket.c:1291
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:661
 ___sys_sendmsg+0x806/0x930 net/socket.c:2260
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2298
 __do_sys_sendmsg net/socket.c:2307 [inline]
 __se_sys_sendmsg net/socket.c:2305 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2305
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4401c9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8
 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05
 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd887fa9d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401c9
RDX: 0000000000000000 RSI: 0000000020002140 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a50
R13: 0000000000401ae0 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace ba79875754e1708f ]---

Reported-by: syzbot+be4bdf2cc3e85e952c50@syzkaller.appspotmail.com
Fixes: c55c8eda ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: fix use-after-free in tipc_sk_filter_rcv
Hoang Le [Thu, 21 Mar 2019 10:25:17 +0000 (17:25 +0700)]
tipc: fix use-after-free in tipc_sk_filter_rcv

The skb is freed in:
  1/ condition 1: tipc_sk_filter_rcv -> tipc_sk_proto_rcv
  2/ condition 2: tipc_sk_filter_rcv -> tipc_group_filter_msg
This leads to a "use-after-free" access in the next condition.

We fix this by initializing the variable at declaration, then it is safe
to check this variable to continue processing if condition matches.

syzbot report:

==================================================================
BUG: KASAN: use-after-free in tipc_sk_filter_rcv+0x2166/0x34f0
 net/tipc/socket.c:2167
Read of size 4 at addr ffff88808ea58534 by task kworker/u4:0/7

CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.0.0+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine,
 BIOS Google 01/01/2011
Workqueue: tipc_send tipc_conn_send_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
 tipc_sk_filter_rcv+0x2166/0x34f0 net/tipc/socket.c:2167
 tipc_sk_enqueue net/tipc/socket.c:2254 [inline]
 tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305
 tipc_topsrv_kern_evt+0x3b7/0x580 net/tipc/topsrv.c:610
 tipc_conn_send_to_sock+0x43e/0x5f0 net/tipc/topsrv.c:283
 tipc_conn_send_work+0x65/0x80 net/tipc/topsrv.c:303
 process_one_work+0x98e/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x357/0x430 kernel/kthread.c:253
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Reported-by: syzbot+e863893591cc7a622e40@syzkaller.appspotmail.com
Fixes: c55c8eda ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Add icmp_echo_ignore_anycast for ICMPv6
Stephen Suryaputra [Wed, 20 Mar 2019 14:29:27 +0000 (10:29 -0400)]
ipv6: Add icmp_echo_ignore_anycast for ICMPv6

In addition to icmp_echo_ignore_multicast, there is a need to also
prevent responding to pings to anycast addresses for security.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: isdn: Make isdn_ppp_mp_discard and isdn_ppp_mp_reassembly static
YueHaibing [Wed, 20 Mar 2019 13:48:06 +0000 (21:48 +0800)]
net: isdn: Make isdn_ppp_mp_discard and isdn_ppp_mp_reassembly static

Fix sparse warnings:

drivers/isdn/i4l/isdn_ppp.c:1891:16: warning:
 symbol 'isdn_ppp_mp_discard' was not declared. Should it be static?
drivers/isdn/i4l/isdn_ppp.c:1903:6: warning:
 symbol 'isdn_ppp_mp_reassembly' was not declared. Should it be static?

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Make hclge_destroy_cmd_queue static
YueHaibing [Wed, 20 Mar 2019 13:37:13 +0000 (21:37 +0800)]
net: hns3: Make hclge_destroy_cmd_queue static

Fix sparse warning:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:414:6:
 warning: symbol 'hclge_destroy_cmd_queue' was not declared. Should it be static?

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-refactor-ndo_select_queue'
David S. Miller [Wed, 20 Mar 2019 18:18:55 +0000 (11:18 -0700)]
Merge branch 'net-refactor-ndo_select_queue'

Paolo Abeni says:

====================
net: refactor ndo_select_queue()

Currently, on most devices implementing ndo_select_queue(), we get 2
indirect calls per xmit packet, at least in some scenarios.

We can avoid one of these indirect calls by refactoring the ndo_select_queue()
usage so that we no longer need the 'fallback' argument.

The first patch renames a helper used later as a public API, the second one
changes the af packet implementation so that it uses the common infrastructure
to select the xmit queue, and the third patch drops the now unneeded argument
from ndo_select_queue().

Alternatively we could use the INDIRECT_CALL_WRAPPER infrastructure to avoid
the fallback indirect call in the common case, but this solution also allows
for some code cleanup.

 v1 -> v2:
  - renamed select queue helpers, as per Eric's and David's suggestions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: remove 'fallback' argument from dev->ndo_select_queue()
Paolo Abeni [Wed, 20 Mar 2019 10:02:06 +0000 (11:02 +0100)]
net: remove 'fallback' argument from dev->ndo_select_queue()

After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.

We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such an ndo. It also cleans up the code
a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads   vanilla   patched
          (kpps)    (kpps)
1         2334      2428
2         4166      4278
4         7895      8100
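
For reference, the relative gain per thread count can be computed from
the table above. A short Python sketch (not part of the patch; the
helper name gain_percent is made up):

```python
# (threads, vanilla kpps, patched kpps) as reported in the table above.
results = [(1, 2334, 2428), (2, 4166, 4278), (4, 7895, 8100)]


def gain_percent(vanilla, patched):
    """Relative throughput improvement of patched over vanilla, in percent."""
    return (patched - vanilla) / vanilla * 100.0


gains = {threads: gain_percent(v, p) for threads, v, p in results}

for threads, gain in gains.items():
    print(f"{threads} thread(s): {gain:+.2f}%")
```

The improvement is in the range of a few percent for all three thread
counts.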

 v1 -> v2:
 - rebased after helper's name change

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agopacket: rework packet_pick_tx_queue() to use common code selection
Paolo Abeni [Wed, 20 Mar 2019 10:02:05 +0000 (11:02 +0100)]
packet: rework packet_pick_tx_queue() to use common code selection

Currently packet_pick_tx_queue() is the only caller of
ndo_select_queue() using a fallback argument other than
netdev_pick_tx.

Leveraging rx queue, we can obtain a similar queue selection
behavior using core helpers. After this change, ndo_select_queue()
is always invoked with netdev_pick_tx() as fallback.
We can change ndo_select_queue() signature in a followup patch,
dropping an indirect call per transmitted packet in some scenarios
(e.g. TCP syn and XDP generic xmit)

This changes slightly how af packet queue selection happens when
PACKET_QDISC_BYPASS is set. It's now more similar to plain dev_queue_xmit(),
taking into account both XPS and TC mapping.

 v1  -> v2:
  - rebased after helper name change
 RFC -> v1:
  - initialize sender_cpu to the expected value

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dev: rename queue selection helpers.
Paolo Abeni [Wed, 20 Mar 2019 10:02:04 +0000 (11:02 +0100)]
net: dev: rename queue selection helpers.

With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear that it is
a public API.

Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.

Suggested-by: Eric Dumazet <edumazet@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-next'
David S. Miller [Wed, 20 Mar 2019 18:12:50 +0000 (11:12 -0700)]
Merge branch 'qed-next'

Sudarsana Reddy Kalluru says:

====================
qed* enhancements.

The patch series adds a couple of enhancements for qed/qede drivers.
Please consider applying it to 'net-next' tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Define new MF bit for no_vlan config
Sudarsana Reddy Kalluru [Wed, 20 Mar 2019 07:26:26 +0000 (00:26 -0700)]
qed: Define new MF bit for no_vlan config

The patch introduces a new Multi-Function bit for cases where firmware
shouldn't perform the insertion of vlan-0 tag. The new bit is defined to
abstract the implementation from the actual MF mode.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqede: Populate mbi version in ethtool driver query data.
Sudarsana Reddy Kalluru [Wed, 20 Mar 2019 07:26:25 +0000 (00:26 -0700)]
qede: Populate mbi version in ethtool driver query data.

The patch adds support to display MBI image version in 'ethtool -i' output.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomacvlan: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to real device
Hangbin Liu [Wed, 20 Mar 2019 02:23:33 +0000 (10:23 +0800)]
macvlan: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to real device

Similar to commit a6111d3c93d0 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to
real device") and commit 37dd9255b2f6 ("vlan: Pass ethtool get_ts_info
queries to real device."), add MACVlan HW ptp support.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: use eth_broadcast_addr() to assign broadcast address
Mao Wenan [Wed, 20 Mar 2019 02:06:57 +0000 (10:06 +0800)]
net: bridge: use eth_broadcast_addr() to assign broadcast address

This patch uses eth_broadcast_addr() to assign the broadcast address
instead of memset().

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: Add support of AES128-CCM based ciphers
Vakul Garg [Wed, 20 Mar 2019 02:03:36 +0000 (02:03 +0000)]
net/tls: Add support of AES128-CCM based ciphers

Added support for AES128-CCM based record encryption. AES128-CCM is
similar to AES128-GCM. Both have the same salt/iv/mac size. The
notable difference between the two is that when invoking an AES128-CCM
operation, the salt||nonce (which is passed as IV) has to be prefixed
with a hardcoded value '2'. Further, the CCM implementation in the kernel
requires the IV passed in crypto_aead_request() to be a full '16' bytes.
Therefore, the record structure 'struct tls_rec' has been modified to
reserve '16' bytes for IV. This works for both GCM and CCM based ciphers.
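
The IV layout described above can be sketched as follows. This is a
Python illustration, not the kernel code: the 4-byte salt and 8-byte
nonce sizes are assumed from the AES128-GCM sizes, the zero padding of
the trailing bytes is an assumption, and build_ccm_iv is a hypothetical
name:

```python
TLS_SALT_SIZE = 4       # assumed: same as the AES128-GCM salt size
TLS_NONCE_SIZE = 8      # assumed: same as the AES128-GCM explicit IV size
CCM_IV_PREFIX = 2       # hardcoded first byte named in this message
AEAD_IV_BUF_SIZE = 16   # CCM wants a full 16-byte IV buffer


def build_ccm_iv(salt: bytes, nonce: bytes) -> bytes:
    """Lay out prefix || salt || nonce and pad the buffer to 16 bytes."""
    if len(salt) != TLS_SALT_SIZE or len(nonce) != TLS_NONCE_SIZE:
        raise ValueError("unexpected salt or nonce length")
    iv = bytes([CCM_IV_PREFIX]) + salt + nonce
    # Assumption: the remaining bytes are zero-filled here.
    return iv.ljust(AEAD_IV_BUF_SIZE, b"\x00")


iv = build_ccm_iv(b"\x01\x02\x03\x04", bytes(range(8)))
print(iv.hex())
```

For GCM the same 16-byte reservation is simply larger than needed,
which is why one buffer size can serve both ciphers.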

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-aquantia-add-interface-mode-handling'
David S. Miller [Wed, 20 Mar 2019 17:58:17 +0000 (10:58 -0700)]
Merge branch 'net-phy-aquantia-add-interface-mode-handling'

Heiner Kallweit says:

====================
net: phy: aquantia: add interface mode handling

These two patches add interface mode handling for the AQR107/AQCS109.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: check for changed interface mode in read_status
Nikita Yushchenko [Tue, 19 Mar 2019 22:05:50 +0000 (23:05 +0100)]
net: phy: aquantia: check for changed interface mode in read_status

Depending on the auto-negotiated speed the PHY may change the interface
mode. Check for new mode and set phydev->interface accordingly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: picked from bigger patch and reworked]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: check for supported interface modes in config_init
Andrew Lunn [Tue, 19 Mar 2019 22:04:38 +0000 (23:04 +0100)]
net: phy: aquantia: check for supported interface modes in config_init

Let config_init check for unsupported interface modes on AQR107/AQCS109.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: adjusted for AQR107/AQCS109 specifics]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: improve handling link_change_notify callback
Heiner Kallweit [Tue, 19 Mar 2019 18:56:51 +0000 (19:56 +0100)]
net: phy: improve handling link_change_notify callback

Currently the PHY driver's link_change_notify callback is called
whenever the state machine is run (every second if polling), no matter
whether the state changed or not. This isn't needed and may confuse
users considering the name of the callback. Actually it contradicts
its kernel-doc description. Therefore let's change the behavior and
call this callback only in case of an actual state change.
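
The intended behavior can be modeled roughly as below. This is a
simplified Python sketch, not the phylib code; run_state_machine and
the two-state enum are made up for illustration:

```python
from enum import Enum, auto


class PhyState(Enum):
    NOLINK = auto()
    RUNNING = auto()


def run_state_machine(initial, polled_states, link_change_notify):
    """Feed one state per poll; notify only on an actual transition."""
    state = initial
    for new_state in polled_states:
        old_state, state = state, new_state
        if old_state != state:
            link_change_notify(old_state, state)
    return state


events = []
final = run_state_machine(
    PhyState.NOLINK,
    [PhyState.NOLINK, PhyState.RUNNING, PhyState.RUNNING, PhyState.NOLINK],
    lambda old, new: events.append((old.name, new.name)),
)
print(events)
```

Four polls produce only two notifications here, one per transition,
instead of one per poll.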

This requires changes to the at803x and rockchip drivers.
at803x can be simplified so that it reacts on a state change to
PHY_NOLINK only.
The rockchip driver can also be much simplified. We simply re-init
the AFE/DSP registers whenever we change to PHY_RUNNING and speed
is 100Mbps. This causes very small overhead because we do this even
if the speed was 100Mbps already. But this is negligible and
I think justified by the much simpler code.

Changes are compile-tested only.

Finding somebody with the hardware to test the changes to the two PHY
drivers seems to be a little bit problematic. See also [0].
David may be able to test the Rockchip driver.

[0] https://marc.info/?t=153782508800006&r=1&w=2

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 20 Mar 2019 17:14:10 +0000 (10:14 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-03-19

This series contains updates to ice driver only.

Michal adds support for the pruning enable flag to avoid seeing
broadcast packets on different VLANs.

Akeem fixes an issue with VF queues being disabled and the VF netdev
network carrier being lost after reset. Fixed an issue when doing
PFR and CORER resets, where all VF VSIs need to be reset and rebuilt
with the main VSIs before replaying all VSIs.  Resolved an issue to
properly initialize VFs in the guest OS via PCI passthrough.

Bruce adds a local variable to avoid unnecessary de-references
throughout ice_probe().

Brett cleans up the code a bit by removing the need for a local variable
and re-designs the loop to simply return when we get a successful result.
Cleans up the code to replace loop calls with a predefined macro to make
the code more consistent.  Updated the driver to ensure ITR granularity
is always 2 usecs. Refactors the calculation of VSIs per PF into a
general function that can calculate per PF allocations for not just VSIs
but across multiple resource types.  Improves the performance of
the driver when using the default settings by determining the ring size
and the number of descriptors for transmit and receive based on a
calculation with the PAGE_SIZE, ICE_MAX_NUM_DESC, and
ICE_REQ_DESC_MULTIPLE.

Chinh fixes an issue, where a reserved bit was possibly being set when
it should never be set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 20 Mar 2019 17:12:03 +0000 (10:12 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2019-03-19

This series contains updates to e100, e1000, e1000e, igb, igc and
ixgbe.

Serhey Popovych fixes the return value for several of our older
drivers for netdev_update_features() to notify of changes applied.

Kai-Heng Feng fixes the WoL setting for system suspend, which should
not be set to runtime suspend settings for igb.  Then fixes a power
management issue with e1000e for CNP+ devices.

Colin Ian King fixes whitespace issue (indentation), which helps with
readability.

Sasha provides the remaining changes for igc, including the enabling of
multi-queues to receive.  Added support for displaying and configuring
network flow classification (NFC) via ethtool.  Added additional
statistics and basic counters for igc.  Fixed a typo, so it aligns with
our other drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: Determine descriptor count and ring size based on PAGE_SIZE
Brett Creeley [Fri, 8 Feb 2019 20:50:59 +0000 (12:50 -0800)]
ice: Determine descriptor count and ring size based on PAGE_SIZE

Currently we set the number of Tx and Rx descriptors to 128 by
default. For Rx this amounts to a full page (assuming 4K pages) because
each Rx descriptor is 32 Bytes, but for Tx it only amounts to a half
page because each Tx descriptor is 16 Bytes (assuming 4K pages).
Instead of assuming 4K pages, determine the ring size and the number of
descriptors for Tx and Rx based on a calculation using the PAGE_SIZE,
ICE_MAX_NUM_DESC, and ICE_REQ_DESC_MULTIPLE. This change is being made
to improve the performance of the driver when using the default
settings.
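
The page arithmetic above, plus one plausible way to derive a count
from the page size, as a Python sketch. This is not the driver code:
the formula (fill one page, round up to a multiple, clamp to a
maximum) is an assumption, and the values 32 and 8160 used for the
multiple and the maximum are hypothetical placeholders:

```python
RX_DESC_SIZE = 32   # bytes per Rx descriptor, from this message
TX_DESC_SIZE = 16   # bytes per Tx descriptor, from this message


def ring_bytes(num_desc, desc_size):
    """Memory consumed by a ring of num_desc descriptors."""
    return num_desc * desc_size


def round_up(value, multiple):
    """Round value up to the next multiple."""
    return -(-value // multiple) * multiple


def default_desc_count(page_size, desc_size, multiple=32, max_desc=8160):
    """Hypothetical: descriptors filling one page, rounded and clamped."""
    return min(round_up(page_size // desc_size, multiple), max_desc)


print(ring_bytes(128, RX_DESC_SIZE), ring_bytes(128, TX_DESC_SIZE))
print(default_desc_count(4096, RX_DESC_SIZE),
      default_desc_count(4096, TX_DESC_SIZE))
```

With 128 descriptors the Rx ring fills a 4K page while the Tx ring
fills only half of it, which is the imbalance this patch addresses.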

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Reset all VFs with VFLR during SR-IOV init flow
Akeem G Abodunrin [Fri, 8 Feb 2019 20:50:58 +0000 (12:50 -0800)]
ice: Reset all VFs with VFLR during SR-IOV init flow

During SR-IOV initialization, we allocate and setup VFs with reset, and
since we were going to inform Firmware about our intention to do VFLR by
disabling LAN TX Queue, then we really have to complete VF reset flow with
VFLR using appropriate registers - Otherwise, reset status bit for VF in
the Guest OS might return DEADBEEF.
This resolves an issue with properly initializing VFs in the Guest OS via PCI
passthrough.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Get resources per function
Brett Creeley [Fri, 8 Feb 2019 20:50:57 +0000 (12:50 -0800)]
ice: Get resources per function

ice_get_guar_num_vsi currently calculates the number of VSIs per PF.
Rework this into a general function ice_get_num_per_func, that can
calculate per PF allocations for not just VSIs but across multiple
resource types.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Implement flow to reset VFs with PFR and other resets
Akeem G Abodunrin [Fri, 8 Feb 2019 20:50:56 +0000 (12:50 -0800)]
ice: Implement flow to reset VFs with PFR and other resets

All VF VSIs need to be reset and rebuilt with the main VSIs before
replaying all VSIs, so that all existing switch filters, the scheduler
tree and other configuration can be replayed at once. This fixes issues
when doing PFR and CORER resets.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: configure GLINT_ITR to always have an ITR gran of 2
Brett Creeley [Fri, 8 Feb 2019 20:50:55 +0000 (12:50 -0800)]
ice: configure GLINT_ITR to always have an ITR gran of 2

Instead of hoping that our ITR granularity will be 2 usecs, program the
GLINT_CTL register to make sure the ITR granularity is always 2 usecs.

Now that we know what the ITR granularity will be, get rid of the check
in ice_probe() that verified our previous assumption.
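
As a rough illustration of why a fixed granularity helps, the sketch
below converts between microseconds and register units when the
granularity is known to be 2 usecs. The names ITR_GRAN_US_SKETCH,
itr_usecs_to_reg() and itr_reg_to_usecs() are hypothetical and are not
taken from the driver; no real register layout is modelled.

```c
/* Assumed fixed granularity: one register unit equals 2 usecs. */
#define ITR_GRAN_US_SKETCH 2u

/* Convert an interval in usecs to register units (rounds down). */
static unsigned int itr_usecs_to_reg(unsigned int usecs)
{
	return usecs / ITR_GRAN_US_SKETCH;
}

/* Convert register units back to usecs. */
static unsigned int itr_reg_to_usecs(unsigned int reg)
{
	return reg * ITR_GRAN_US_SKETCH;
}
```

With the granularity fixed, both conversions are compile-time constants
and no run-time check of the hardware setting is needed.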

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: use ice_for_each_vsi macro when possible
Brett Creeley [Fri, 8 Feb 2019 20:50:54 +0000 (12:50 -0800)]
ice: use ice_for_each_vsi macro when possible

Replace all instances of:
for (i = 0; i < pf->num_alloc_vsi; i++)

with the following macro:
ice_for_each_vsi(pf, i)

This will allow the code to be consistent since there are currently
cases of using both.
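
A minimal userspace sketch of such a macro is shown below. The struct
ice_pf_sketch type and the count_vsi_slots() helper are hypothetical
stand-ins used only for illustration; the macro body simply wraps the
open-coded loop quoted above.

```c
/* Hypothetical stand-in for the driver's PF structure; only the field
 * used by the loop is modelled here.
 */
struct ice_pf_sketch {
	int num_alloc_vsi;
};

/* Iterate i over every allocated VSI slot of pf. */
#define ice_for_each_vsi(pf, i) \
	for ((i) = 0; (i) < (pf)->num_alloc_vsi; (i)++)

/* Count the loop iterations performed through the macro. */
static int count_vsi_slots(const struct ice_pf_sketch *pf)
{
	int i, n = 0;

	ice_for_each_vsi(pf, i)
		n++;

	return n;
}
```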

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice : Ensure only valid bits are set in ice_aq_set_phy_cfg
Chinh T Cao [Fri, 8 Feb 2019 20:50:52 +0000 (12:50 -0800)]
ice : Ensure only valid bits are set in ice_aq_set_phy_cfg

In the ice_aq_set_phy_cfg AQ command, the 16.4 bit is reserved. This
patch will make sure that this bit will never be set to 1.
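
One way to guarantee that a reserved bit is never set is to mask the
field before sending it. The sketch below assumes that "16.4" refers to
bit 4 of a one-byte capabilities field; the macro and function names are
hypothetical and do not come from the driver.

```c
#include <stdint.h>

/* Assumption: bit 4 of the capabilities byte is the reserved bit. */
#define PHY_CFG_RESERVED_BIT_SKETCH	(1u << 4)
#define PHY_CFG_VALID_MASK_SKETCH	((uint8_t)~PHY_CFG_RESERVED_BIT_SKETCH)

/* Clear the reserved bit and leave every other bit untouched. */
static uint8_t phy_cfg_sanitize_caps(uint8_t caps)
{
	return caps & PHY_CFG_VALID_MASK_SKETCH;
}
```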

Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: remove redundant variable and if condition
Brett Creeley [Fri, 8 Feb 2019 20:50:51 +0000 (12:50 -0800)]
ice: remove redundant variable and if condition

In ice_pf_rxq_wait we are using an unnecessary local variable and also
we are checking if the timeout time was reached after the loop. Get rid
of the local variable and return 0 right when we get a successful
result. This makes it so we can return -ETIMEDOUT if we ever exit the
loop because we know the timeout time has been hit.
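
The control flow described above can be sketched in userspace C as
follows. The retry budget, the rxq_ready_fn callback and the
ready_after_n() demo helper are hypothetical; a real driver would read a
hardware register and delay between polls.

```c
#include <errno.h>
#include <stdbool.h>

#define RXQ_WAIT_MAX_RETRY_SKETCH 5	/* hypothetical retry budget */

/* Hypothetical callback standing in for the hardware status check. */
typedef bool (*rxq_ready_fn)(void *ctx);

static int rxq_wait_sketch(rxq_ready_fn ready, void *ctx)
{
	int i;

	for (i = 0; i < RXQ_WAIT_MAX_RETRY_SKETCH; i++) {
		if (ready(ctx))
			return 0;	/* success: return immediately */
		/* a real driver would delay here before polling again */
	}

	/* Leaving the loop can only mean the retry budget ran out. */
	return -ETIMEDOUT;
}

/* Demo helper: reports ready once the counter has reached zero. */
static bool ready_after_n(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Because success returns from inside the loop, no status variable or
post-loop timeout check is required.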

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: avoid multiple unnecessary de-references in probe
Bruce Allan [Fri, 8 Feb 2019 20:50:50 +0000 (12:50 -0800)]
ice: avoid multiple unnecessary de-references in probe

Add a local variable struct device *dev to avoid unnecessary de-references
throughout ice_probe().

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix issue with VF reset and multiple VFs support on PFs
Akeem G Abodunrin [Fri, 8 Feb 2019 20:50:49 +0000 (12:50 -0800)]
ice: Fix issue with VF reset and multiple VFs support on PFs

This patch fixes issues with VF queues being disabled, and the VF netdev
network carrier being lost, after reset. Basically, we need to check
whether the VF is enabled and its queues are configured in the
reset_all_vfs flow, and disable/enable those queues appropriately whenever
the function is called after Global/CORER/PFR reset/rebuild/replay.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix broadcast traffic in port VLAN mode
Michal Swiatkowski [Fri, 8 Feb 2019 20:50:48 +0000 (12:50 -0800)]
ice: Fix broadcast traffic in port VLAN mode

Set egress (Rx) pruning enable flag for VF VSI in VSI ctxt to
enable prune action.

To avoid seeing broadcast packets in a different VLAN, the pruning enable
flag in the VSI ctxt should be set.

Write new functions (fill VSI ctx) to not repeat send ctxt code.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Remove unneeded hw_dbg prints
Sasha Neftin [Fri, 15 Mar 2019 15:12:07 +0000 (17:12 +0200)]
igc: Remove unneeded hw_dbg prints

Remove unneeded hw_dbg prints from igc_ethtool.c file.
Clean up code.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Fix the typo in igc_base.h header definition
Sasha Neftin [Mon, 11 Mar 2019 22:34:35 +0000 (00:34 +0200)]
igc: Fix the typo in igc_base.h header definition

Add the underscore for the _IGC_BASE_H_.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Add support for the ntuple feature
Sasha Neftin [Wed, 20 Feb 2019 12:39:31 +0000 (14:39 +0200)]
igc: Add support for the ntuple feature

Copy the ntuple feature into the list of user-selectable features.
Enable the ntuple feature.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Add support for statistics
Sasha Neftin [Mon, 18 Feb 2019 08:37:31 +0000 (10:37 +0200)]
igc: Add support for statistics

Add support for statistics and show basic counters.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoMerge branch 'enc28j60-messaging-clean-up-and-ACPI-improvements'
David S. Miller [Tue, 19 Mar 2019 21:59:32 +0000 (14:59 -0700)]
Merge branch 'enc28j60-messaging-clean-up-and-ACPI-improvements'

Andy Shevchenko says:

====================
enc28j60: messaging clean up and ACPI improvements

Most of the patches in the series are dedicated to updating messaging to use
modern APIs, such as netdev, with the benefit of distinguishing devices if
more than one is installed on the system.

Besides that, patch 1 targets ACPI-enabled systems where the MAC address is
provided via properties.

A few cleanups are also included, such as:
- switching to module_spi_driver()
- converting to use ether_addr_copy() API
- converting to use SPDX

Since v2:
- cover letter is added
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Convert to use SPDX identifier
Andy Shevchenko [Tue, 19 Mar 2019 18:49:30 +0000 (20:49 +0200)]
enc28j60: Convert to use SPDX identifier

Reduce size of duplicated comments by switching to use SPDX identifier.

No functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Fix indentation splats
Andy Shevchenko [Tue, 19 Mar 2019 18:49:29 +0000 (20:49 +0200)]
enc28j60: Fix indentation splats

Fix a few indentation splats. No functional change intended.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Amend comments by fixing typos, adding periods, etc
Andy Shevchenko [Tue, 19 Mar 2019 18:49:28 +0000 (20:49 +0200)]
enc28j60: Amend comments by fixing typos, adding periods, etc

Amend comments in the code:
 - adding periods to the multi-line comments
 - fixing typos
 - capitalizing the first word in sentences
 - etc

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Remove linux/init.h
Andy Shevchenko [Tue, 19 Mar 2019 18:49:27 +0000 (20:49 +0200)]
enc28j60: Remove linux/init.h

There is no need to include linux/init.h when at the same time
we include linux/module.h.

Remove redundant inclusion.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Convert printk() to netdev_printk()
Andy Shevchenko [Tue, 19 Mar 2019 18:49:26 +0000 (20:49 +0200)]
enc28j60: Convert printk() to netdev_printk()

The debug prints of network operations will look better if network
device name is printed. The benefit of that is a possibility to distinguish
the actual hardware when more than one is installed on the system.

Convert appropriate printk(KERN_DEBUG) to netdev_printk(KERN_DEBUG, ndev).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Convert HW related printk() to dev_printk()
Andy Shevchenko [Tue, 19 Mar 2019 18:49:25 +0000 (20:49 +0200)]
enc28j60: Convert HW related printk() to dev_printk()

The debug prints of hardware status and operations will look better
if SPI device name is printed. The benefit of that is a possibility
to distinguish the actual hardware when more than one is installed
on the system.

Convert appropriate printk(KERN_DEBUG) to dev_printk(KERN_DEBUG, &spi->dev).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Switch to dev_<level> from pr_<level>
Andy Shevchenko [Tue, 19 Mar 2019 18:49:24 +0000 (20:49 +0200)]
enc28j60: Switch to dev_<level> from pr_<level>

Instead of using open coded printk(KERN_<LEVEL>) switch the driver to use
dev_<level> macros.

Note, the device name will be printed in full, which is beneficial when
more than one card is installed on the system.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Use ether_addr_copy() in enc28j60_set_mac_address()
Andy Shevchenko [Tue, 19 Mar 2019 18:49:23 +0000 (20:49 +0200)]
enc28j60: Use ether_addr_copy() in enc28j60_set_mac_address()

Use ether_addr_copy() instead of memcpy() to copy the mac address.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Switch to use module_spi_driver() macro
Andy Shevchenko [Tue, 19 Mar 2019 18:49:22 +0000 (20:49 +0200)]
enc28j60: Switch to use module_spi_driver() macro

Eliminate some boilerplate code by using module_spi_driver() instead of
->init() / ->exit(), moving the salient bits from ->init() into ->probe().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Drop driver name duplication from messages
Andy Shevchenko [Tue, 19 Mar 2019 18:49:21 +0000 (20:49 +0200)]
enc28j60: Drop driver name duplication from messages

When dev_<level>() macros are used against SPI device, the driver's name
is printed as well. No need to duplicate this explicitly.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Replace dev_*(&netdev->dev, ...) with netdev_*()
Andy Shevchenko [Tue, 19 Mar 2019 18:49:20 +0000 (20:49 +0200)]
enc28j60: Replace dev_*(&netdev->dev, ...) with netdev_*()

Replace open coded netdev_<level>() macros.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Remove duplicate messaging
Andy Shevchenko [Tue, 19 Mar 2019 18:49:19 +0000 (20:49 +0200)]
enc28j60: Remove duplicate messaging

The ->probe() and ->remove() stages can be easily debugged with
initcall_debug or function tracer. There is no need to repeat the same
explicitly in the driver.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenc28j60: Use device_get_mac_address()
Andy Shevchenko [Tue, 19 Mar 2019 18:49:18 +0000 (20:49 +0200)]
enc28j60: Use device_get_mac_address()

Replace the DT-specific of_get_mac_address() function with
device_get_mac_address(), which works on both DT and ACPI platforms.
This change makes it easier to add ACPI support.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>