openwrt/staging/blogic.git
David S. Miller [Fri, 16 Oct 2015 07:52:27 +0000 (00:52 -0700)]
Merge branch 'robust_listener'

Eric Dumazet says:

====================
tcp/dccp: make our listener code more robust

This patch series addresses request socket leaks and the listener
dismantle phase. The result survives a stress test in which listeners
are added and removed at random.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Oct 2015 18:16:28 +0000 (11:16 -0700)]
tcp/dccp: fix race at listener dismantle phase

Under stress, a close() on a listener can trigger the
WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop().

We need to test whether the listener is still active before queueing
a child in inet_csk_reqsk_queue_add().

Create a common inet_child_forget() helper, and use it
from both inet_csk_reqsk_queue_add() and inet_csk_listen_stop().
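A minimal userspace sketch of the ordering fix, assuming a drastically
simplified listener (a flag plus a fixed-size array) in place of the real
socket and request queue structures. The names listener, child,
reqsk_queue_add(), child_forget() and listen_stop() are hypothetical
stand-ins, and the locking the real code relies on is omitted. The point
is that the add path checks whether the listener is still active and, if
not, disposes of the child through the same helper the stop path uses, so
the backlog stays empty after dismantle.

```c
#include <stdbool.h>

#define QMAX 8 /* sketch-only bound on queued children */

struct child {
	bool forgotten; /* set when the child is disposed of */
};

struct listener {
	bool listening;            /* cleared when dismantle starts */
	int ack_backlog;           /* number of queued children */
	struct child *queue[QMAX];
};

/* Common helper shared by the add path and the stop path. */
static void child_forget(struct child *c)
{
	c->forgotten = true;
}

/* Queue a child only while the listener is active; otherwise forget it.
 * (The real code performs this test under the request queue lock.) */
static bool reqsk_queue_add(struct listener *l, struct child *c)
{
	if (!l->listening || l->ack_backlog == QMAX) {
		child_forget(c);
		return false;
	}
	l->queue[l->ack_backlog++] = c;
	return true;
}

/* Stop listening, then forget every child that is still queued. */
static void listen_stop(struct listener *l)
{
	l->listening = false;
	while (l->ack_backlog > 0)
		child_forget(l->queue[--l->ack_backlog]);
}
```

Once listen_stop() has cleared the flag, any late reqsk_queue_add() call
forgets its child instead of raising the backlog again.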

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Oct 2015 18:16:27 +0000 (11:16 -0700)]
tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper

Let's reduce the confusion about inet_csk_reqsk_queue_drop():
in many cases we also need to release the reference on the request
socket, so add a helper that does both, reducing code size and
complexity.
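A small refcounting sketch of the distinction, assuming the queue (or hash
table) holds one reference on the request socket and the caller holds
another. All names are hypothetical; here reqsk_queue_drop() releases only
the queue's reference, and the combined helper also releases the caller's.

```c
#include <stdbool.h>

struct request_sock {
	int refcnt;
	bool in_queue;  /* still linked into the listener's queue/hash */
	bool freed;     /* stand-in for the memory actually being freed */
};

static void reqsk_put(struct request_sock *req)
{
	if (--req->refcnt == 0)
		req->freed = true;
}

/* Unlink the request and release only the reference held by the queue. */
static void reqsk_queue_drop(struct request_sock *req)
{
	if (req->in_queue) {
		req->in_queue = false;
		reqsk_put(req);
	}
}

/* Unlink, then also release the reference owned by the caller. */
static void reqsk_queue_drop_and_put(struct request_sock *req)
{
	reqsk_queue_drop(req);
	reqsk_put(req);
}
```

A caller that forgets the second put after reqsk_queue_drop() leaks one
reference; the combined helper makes that step impossible to miss.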

Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Oct 2015 18:16:26 +0000 (11:16 -0700)]
Revert "inet: fix double request socket freeing"

This reverts commit c69736696cf3742b37d850289dc0d7ead177bb14.

At the time of the above commit, tcp_req_err() and dccp_req_err()
were dead code, as SYN_RECV request sockets were not yet in the ehash
table.

The real bug was fixed later, in a different commit.

We need to revert it so that we do not leak a refcount on the request
socket.

inet_csk_reqsk_queue_drop_and_put() will be added in the following
commit to make clear that inet_csk_reqsk_queue_drop() does not release
the reference owned by the caller.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 15 Oct 2015 19:28:52 +0000 (21:28 +0200)]
drivers/net: get rid of unnecessary initializations in .get_drvinfo()

Many drivers needlessly initialize the n_priv_flags, n_stats,
testinfo_len, eedump_len and regdump_len fields in their .get_drvinfo()
ethtool op. This is unnecessary, as these fields are filled in by
ethtool_get_drvinfo().

v2: removed unused variable
v3: removed another unused variable

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Oct 2015 06:55:33 +0000 (23:55 -0700)]
Merge branch 'tipc-link-improvements'

Jon Maloy says:

====================
tipc: some link level code improvements

Extensive testing has revealed some weaknesses and non-optimal solutions
in the link level code.

This commit series addresses those issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:46 +0000 (14:52 -0400)]
tipc: update node FSM when peer RESET message is received

The change made in the previous commit revealed a small flaw in the way
the node FSM is updated. When the function tipc_node_link_down() is
called for the last link to a node, we should check whether this was
caused by a local reset or by a received RESET message from the peer.
In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
the node FSM, so that it is ready to re-establish contact. If this is
not done, the peer node will sometimes have to go through a second
establish cycle before the link becomes stable.

We fix this in this commit by conditionally issuing the mentioned
event in the function tipc_node_link_down(). We also move the LINK_RESET
FSM event out of the link_reset() function and into the caller
function, partly because the code is easier to follow when state
changes are gathered in a limited number of locations, and partly
because there will be cases in future commits where we don't want the
link to go into RESET mode when link_reset() is called.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:45 +0000 (14:52 -0400)]
tipc: send out RESET immediately when link goes down

When a link is taken down because of a node local event, such as
disabling of a bearer or an interface, we currently leave it to the
peer node to discover the broken communication. The default time for
such failure discovery is 1.5-2 seconds.

If we instead allow the terminating link endpoint to send out a RESET
message at the moment it is reset, both endpoints will appear to go
down instantly. Since this is a very common scenario, we find it
worthwhile to make this small modification.

Apart from letting the link produce this message, we also have to
ensure that the interface is able to transmit it before TIPC is
detached. We do this by disabling a bearer in three steps:

1) Disable reception of TIPC packets from the interface in question.
2) Take down the links, while allowing them to send out a RESET message.
3) Disable transmission of TIPC packets on the interface.

Apart from this, we now have to react to the NETDEV_GOING_DOWN event
instead of, as currently, the NETDEV_DOWN event, to ensure that such
transmission is still possible during the teardown phase.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:44 +0000 (14:52 -0400)]
tipc: delay ESTABLISH state event when link is established

Link establishment, just like link teardown, is a non-atomic action,
in the sense that discovering that conditions are right to establish a
link and actually adding the link to one of the node's send slots are
done in two different lock contexts. The link FSM is designed to help
bridge the gap between the two contexts in a safe manner.

We have now discovered a weakness in the implementation of this FSM.
Because we let the link go directly from state LINK_ESTABLISHING to
state LINK_ESTABLISHED already in the first lock context, we are unable
to distinguish between a fully established link, i.e., a link that has
been added to its slot, and a link that has not yet reached the second
lock context. It may hence happen that a manual intervention, e.g.,
disabling an interface, causes the function tipc_node_link_down() to try
to remove the link from the node slots, decrement its active link
counter, etc., although the link was never added there in the first
place.

We solve this by delaying the actual state change until we reach the
second lock context, inside the function tipc_node_link_up(). This
makes it possible for potential callers of __tipc_node_link_down() to
know whether they should proceed or not, and the problem is solved.

Unfortunately, the situation described above also has a second
problem. Since there is by necessity a tipc_node_link_up() call pending
once the node lock has been released, we must defuse that call by
setting the link back from LINK_ESTABLISHING to LINK_RESET state. This
forces us to make a slight modification to the link FSM, which will now
look as follows.

 +------------------------------------+
 |RESET_EVT                           |
 |                                    |
 |                             +--------------+
 |           +-----------------|   SYNCHING   |-----------------+
 |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
 |           |                  A            |                  |
 |           |                  |            |                  |
 |           |                  |            |                  |
 |           |                  |SYNCH_      |SYNCH_            |
 |           |                  |BEGIN_EVT   |END_EVT           |
 |           |                  |            |                  |
 |           V                  |            V                  V
 |    +-------------+          +--------------+          +------------+
 |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
 |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
 |           |        EVT        |    A         RESET_EVT       |
 |           |                   |    |                         |
 |           |  +----------------+    |                         |
 |  RESET_EVT|  |RESET_EVT            |                         |
 |           |  |                     |                         |
 |           |  |                     |ESTABLISH_EVT            |
 |           |  |  +-------------+    |                         |
 |           |  |  | RESET_EVT   |    |                         |
 |           |  |  |             |    |                         |
 |           V  V  V             |    |                         |
 |    +-------------+          +--------------+        RESET_EVT|
 +--->|    RESET    |--------->| ESTABLISHING |<----------------+
      +-------------+ PEER_    +--------------+
       |           A  RESET_EVT       |
       |           |                  |
       |           |                  |
       |FAILOVER_  |FAILOVER_         |FAILOVER_
       |BEGIN_EVT  |END_EVT           |BEGIN_EVT
       |           |                  |
       V           |                  |
      +-------------+                 |
      | FAILINGOVER |<----------------+
      +-------------+
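The diagram can be transcribed as a transition function. This is a sketch
only: the state and event names are copied from the diagram rather than
from the kernel source, and (state, event) pairs that the diagram does not
show simply leave the state unchanged here, which may not match how the
real FSM treats them. The RESET_EVT transition out of ESTABLISHING is the
one that defuses a pending tipc_node_link_up() call.

```c
enum link_state {
	LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED, LINK_SYNCHING,
	LINK_RESETTING, LINK_PEER_RESET, LINK_FAILINGOVER
};

enum link_event {
	RESET_EVT, PEER_RESET_EVT, FAILURE_EVT, ESTABLISH_EVT,
	SYNCH_BEGIN_EVT, SYNCH_END_EVT, FAILOVER_BEGIN_EVT, FAILOVER_END_EVT
};

/* Return the next state for event e in state s, per the diagram. */
static enum link_state link_fsm(enum link_state s, enum link_event e)
{
	switch (s) {
	case LINK_RESET:
		if (e == PEER_RESET_EVT) return LINK_ESTABLISHING;
		if (e == FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
		break;
	case LINK_ESTABLISHING:
		if (e == ESTABLISH_EVT) return LINK_ESTABLISHED; /* in link_up() */
		if (e == RESET_EVT) return LINK_RESET;           /* defuse */
		if (e == FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
		break;
	case LINK_ESTABLISHED:
		if (e == FAILURE_EVT) return LINK_RESETTING;
		if (e == PEER_RESET_EVT) return LINK_PEER_RESET;
		if (e == RESET_EVT) return LINK_RESET;
		if (e == SYNCH_BEGIN_EVT) return LINK_SYNCHING;
		break;
	case LINK_SYNCHING:
		if (e == SYNCH_END_EVT) return LINK_ESTABLISHED;
		if (e == FAILURE_EVT) return LINK_RESETTING;
		if (e == PEER_RESET_EVT) return LINK_PEER_RESET;
		if (e == RESET_EVT) return LINK_RESET;
		break;
	case LINK_RESETTING:
		if (e == RESET_EVT) return LINK_RESET;
		break;
	case LINK_PEER_RESET:
		if (e == RESET_EVT) return LINK_ESTABLISHING;
		break;
	case LINK_FAILINGOVER:
		if (e == FAILOVER_END_EVT) return LINK_RESET;
		break;
	}
	return s; /* sketch: unlisted events leave the state unchanged */
}
```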

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:43 +0000 (14:52 -0400)]
tipc: disallow packet duplicates in link deferred queue

After the previous commits, we are guaranteed that no packets of type
LINK_PROTOCOL or with illegal sequence numbers will ever be added to
the link deferred queue. This makes it possible to simplify the sorting
algorithm in the function tipc_skb_queue_sorted().

We also alter the function so that it drops a packet if one with the
same sequence number is already present in the queue. This is
necessary because we have identified weird packet sequences, involving
duplicate packets, where a legitimate in-sequence packet may advance to
the head of the queue without being detected and dequeued.
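A sketch of sorted insertion that drops duplicates, using a plain singly
linked list of sequence numbers as a stand-in for the real sk_buff queue.
seq_less() assumes 16-bit sequence numbers compared modulo 2^16; the
struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
	uint16_t seqno;
	struct pkt *next;
};

/* True if a precedes b, modulo 2^16. */
static bool seq_less(uint16_t a, uint16_t b)
{
	uint16_t d = (uint16_t)(b - a);

	return d != 0 && d < 0x8000;
}

/* Insert p into the list at *head, keeping it sorted by seqno.
 * Returns false, leaving the list untouched, if a packet with the same
 * sequence number is already queued; the caller then drops p. */
static bool queue_sorted_insert(struct pkt **head, struct pkt *p)
{
	struct pkt **pos = head;

	while (*pos && seq_less((*pos)->seqno, p->seqno))
		pos = &(*pos)->next;
	if (*pos && (*pos)->seqno == p->seqno)
		return false; /* duplicate */
	p->next = *pos;
	*pos = p;
	return true;
}
```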

Finally, we make this function out-of-line, since it will now be called
only in exceptional cases.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:42 +0000 (14:52 -0400)]
tipc: improve sequence number checking

The sequence number of an incoming packet is currently only checked
for being less than, equal to, or greater than the next expected
number, meaning that the receive window in practice becomes one half
of the sequence number cycle, or U16_MAX/2. This does not make sense,
and may not even be safe if there are extreme delays in the network.
Any packet sent by the peer during the ongoing cycle must belong inside
the peer's current send window, or should otherwise be dropped if
possible.

Since a link endpoint cannot know its peer's current send window, it
has to base this sanity check on a worst-case assumption, i.e., that
the peer is using a maximum-sized window of 8191 packets. Using this
assumption, we now add a check that the sequence number is not greater
than next_expected + TIPC_MAX_LINK_WIN. We also reorder the checks so
that the receive window test is performed before the gap test. This
way, we are guaranteed that no packets with illegal sequence numbers
are ever added to the deferred queue.
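The reordered checks can be sketched as a classification function. It
assumes 16-bit sequence numbers compared modulo 2^16 and takes the
8191-packet worst-case window from the text; the verdict names and
MAX_LINK_WIN are illustrative, not the kernel's identifiers. The window
test runs before the gap test, so nothing outside the window can reach the
deferred queue.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LINK_WIN 8191 /* worst-case peer send window, per the text */

enum rcv_verdict {
	RCV_ACCEPT,             /* in sequence: deliver */
	RCV_DEFER,              /* inside the window, after a gap */
	RCV_DROP_OLD,           /* already received */
	RCV_DROP_OUT_OF_WINDOW  /* beyond any legal peer send window */
};

/* True if a precedes b, modulo 2^16. */
static bool seq_less(uint16_t a, uint16_t b)
{
	uint16_t d = (uint16_t)(b - a);

	return d != 0 && d < 0x8000;
}

static enum rcv_verdict check_seqno(uint16_t seqno, uint16_t rcv_nxt)
{
	uint16_t win_end = (uint16_t)(rcv_nxt + MAX_LINK_WIN);

	if (seq_less(seqno, rcv_nxt))
		return RCV_DROP_OLD;
	if (seq_less(win_end, seqno))          /* window test first */
		return RCV_DROP_OUT_OF_WINDOW;
	if (seqno != rcv_nxt)                  /* then the gap test */
		return RCV_DEFER;
	return RCV_ACCEPT;
}
```

With the old half-cycle check, a packet 20000 ahead of rcv_nxt would have
been deferred; here it is dropped as out of window.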

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:41 +0000 (14:52 -0400)]
tipc: simplify tipc_link_rcv() reception loop

Currently, all packets received in tipc_link_rcv() are unconditionally
added to the packet deferred queue, after which that queue is walked
and all its buffers are evaluated for delivery. This is non-optimal,
and it makes the queue sorting function unnecessarily complex.

This commit changes the loop so that an arrived packet is evaluated
first, and added to the deferred queue only when a sequence number gap
is discovered. A non-empty deferred queue is walked until it is empty
or until its head's sequence number doesn't fit.
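A simplified model of the new loop: the arriving packet is evaluated
first, deferred only on a gap, and the deferred queue is then drained
while its head matches the next expected sequence number. For brevity this
sketch keeps the deferred queue as a small sorted array and ignores
sequence number wraparound; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QLEN 64 /* sketch-only capacity */

struct rcv_link {
	uint16_t rcv_nxt;           /* next expected sequence number */
	uint16_t deferdq[QLEN];     /* sorted out-of-sequence packets */
	size_t deferdq_len;
	uint16_t delivered[QLEN];   /* what was passed up, in order */
	size_t delivered_len;
};

static void deliver(struct rcv_link *l, uint16_t seqno)
{
	if (l->delivered_len < QLEN)
		l->delivered[l->delivered_len++] = seqno;
	l->rcv_nxt++;
}

/* Insert into the sorted deferred queue; drop duplicates or overflow. */
static void defer(struct rcv_link *l, uint16_t seqno)
{
	size_t i = 0;

	while (i < l->deferdq_len && l->deferdq[i] < seqno)
		i++;
	if ((i < l->deferdq_len && l->deferdq[i] == seqno) ||
	    l->deferdq_len == QLEN)
		return;
	memmove(&l->deferdq[i + 1], &l->deferdq[i],
		(l->deferdq_len - i) * sizeof(l->deferdq[0]));
	l->deferdq[i] = seqno;
	l->deferdq_len++;
}

static void link_rcv(struct rcv_link *l, uint16_t seqno)
{
	size_t n = 0;

	if (seqno < l->rcv_nxt)
		return;                  /* old duplicate: drop */
	if (seqno != l->rcv_nxt) {
		defer(l, seqno);         /* gap: park it */
		return;
	}
	deliver(l, seqno);
	/* Walk the deferred queue until it is empty or its head doesn't fit. */
	while (n < l->deferdq_len && l->deferdq[n] == l->rcv_nxt)
		deliver(l, l->deferdq[n++]);
	memmove(l->deferdq, &l->deferdq[n],
		(l->deferdq_len - n) * sizeof(l->deferdq[0]));
	l->deferdq_len -= n;
}
```

In the common in-sequence case the deferred queue stays empty and is never
touched, which is the point of evaluating the arriving packet first.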

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:40 +0000 (14:52 -0400)]
tipc: limit usage of temporary skb list during packet reception

During packet reception, the function tipc_link_rcv() adds its accepted
packets to a temporary buffer queue before finally splicing this queue
into the lock-protected input queue that will be delivered up to the
socket layer. The purpose is to reduce potential contention on the
input queue lock. However, since the vast majority of packets arrive in
sequence, they will be added one by one to the input queue anyway, and
the temporary queue becomes a sub-optimization.

The only case where this queue makes sense is when unpacking buffers
from a bundle packet; here we want to avoid adding dozens of small
buffers individually to the lock-protected input queue in a tight
loop.

In this commit, we remove the general usage of the temporary queue,
and keep it only for the packet unbundling case.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Insu Yun [Thu, 15 Oct 2015 16:24:09 +0000 (12:24 -0400)]
mlx4: correctly check failed allocation

When allocation fails, mlx4_alloc_cmd_mailbox() returns
ERR_PTR(-ENOMEM). Since mlx4_alloc_cmd_mailbox() never returns NULL,
its result should be checked with IS_ERR, not IS_ERR_OR_NULL.
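For readers outside the kernel, a userspace re-creation of the
error-pointer convention shows why IS_ERR() is sufficient here: error
codes are encoded in the top MAX_ERRNO addresses, and a function that
never returns NULL gains nothing from IS_ERR_OR_NULL(). The helpers below
mirror the kernel's semantics but are a sketch, not <linux/err.h> itself;
alloc_mailbox() is a hypothetical stand-in for mlx4_alloc_cmd_mailbox().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top 4095 addresses. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mailbox {
	int id;
};

/* Hypothetical allocator: reports failure as ERR_PTR(-ENOMEM), never NULL. */
static struct mailbox *alloc_mailbox(bool fail)
{
	struct mailbox *mb = fail ? NULL : malloc(sizeof(*mb));

	if (!mb)
		return ERR_PTR(-ENOMEM);
	return mb;
}
```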

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 15 Oct 2015 16:22:11 +0000 (09:22 -0700)]
bonding: support encapsulated ipv6 TSO

If using a sixtofour device on top of a bonding device,
skb segmentation of TCP traffic is done right before calling
bonding xmit, because bonding only enables TSO for IPv4.

This patch improves single-flow performance by about 120% on my hosts,
because segmentation is deferred until right before calling slave xmit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Oct 2015 06:28:03 +0000 (23:28 -0700)]
Merge branch 'mlxsw-cleanups'

Jiri Pirko says:

====================
mlxsw: Driver update, cleanups

This patchset contains various cleanups and improvements in mlxsw driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:29 +0000 (17:43 +0200)]
mlxsw: cmd: Update CONFIG_PROFILE command documentation

The meaning of certain parameters in the profile passed to the device
during initialization has changed, so update their documentation
accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:28 +0000 (17:43 +0200)]
mlxsw: Add trap group for control packets

Previously, we trapped flooded and control packets using the same trap
group. This can cause flooded packets to overflow the PCI bus and
prevent control packets (e.g. STP, LACP) from getting to the CPU.

Solve this by splitting the RX trap group into RX and control groups,
which allows us to configure a policer on the former, thereby
preventing it from overflowing the PCI bus.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:27 +0000 (17:43 +0200)]
mlxsw: Simplify traps creation

The Host Trap Group Table (HTGT) register configures trap groups, which
are populated with trap IDs using the Host PacKet Trap (HPKT) register.
However, a trap ID can only be present inside one trap group (the last
configured).

Instead of passing both the trap group and the trap ID to the function
that packs HPKT, pass only the trap ID and derive the trap group from it.
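A sketch of deriving the group from the trap ID, with hypothetical IDs and
groups chosen to reflect the RX/control split described above; the real
mlxsw enumerations and register layout differ.

```c
/* Hypothetical trap IDs and groups, for illustration only. */
enum trap_id { TRAP_ID_STP, TRAP_ID_LACP, TRAP_ID_FLOOD };
enum trap_group { TRAP_GROUP_RX, TRAP_GROUP_CTRL };

/* Each trap ID lives in exactly one group, so the group is derivable. */
static enum trap_group trap_group_of(enum trap_id id)
{
	switch (id) {
	case TRAP_ID_STP:
	case TRAP_ID_LACP:
		return TRAP_GROUP_CTRL; /* control packets: not policed */
	default:
		return TRAP_GROUP_RX;   /* flooded traffic: can be policed */
	}
}

struct hpkt {
	enum trap_id id;
	enum trap_group group;
};

/* The pack helper now takes only the trap ID. */
static void hpkt_pack(struct hpkt *reg, enum trap_id id)
{
	reg->id = id;
	reg->group = trap_group_of(id);
}
```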

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:26 +0000 (17:43 +0200)]
mlxsw: Introduce mlxsw_reg_spms_vid_pack helper and use it

Introduce a separate helper for packing SPMS VIDs, as it can be used
for multiple VIDs, rather than only the single VID supported by the
previous SPMS pack function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:25 +0000 (17:43 +0200)]
mlxsw: reg: Adjust definition of enum mlxsw_reg_sfgc_type

Define a max value, which will be needed later on.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:24 +0000 (17:43 +0200)]
mlxsw: reg: Remove extra space in SFGC ID define

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:23 +0000 (17:43 +0200)]
mlxsw: reg: Uppercase letters in register IDs

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:22 +0000 (17:43 +0200)]
mlxsw: Use dev_level_ratelimited instead of net_ratelimit & dev_level

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:21 +0000 (17:43 +0200)]
mlxsw: core: Do not use EMADs in mlxsw_emad_fini

Be symmetric with mlxsw_emad_init and don't use EMADs in the
mlxsw_emad_fini cleanup function. Use the command interface instead.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:20 +0000 (17:43 +0200)]
mlxsw: pci: Limit number of entries being sent in single MAP_FA cmd

Firmware accepts only a limited number of mapping entries for the
MAP_FA command. In order to prevent overflow, introduce a limit and, if
the number of entries exceeds it, call MAP_FA multiple times.
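A sketch of the batching, assuming a hypothetical per-command limit
MAP_FA_MAX_ENTRIES and a hypothetical send_map_fa() that only records the
size of each batch; the real firmware limit and command encoding are not
stated here.

```c
#include <stddef.h>

#define MAP_FA_MAX_ENTRIES 32 /* hypothetical per-command firmware limit */
#define LOG_MAX 16

struct fa_entry {
	unsigned long long addr;
};

/* Hypothetical command sink that records the batch sizes it receives. */
struct cmd_log {
	size_t calls;
	size_t sizes[LOG_MAX];
};

static int send_map_fa(struct cmd_log *cl, const struct fa_entry *e, size_t n)
{
	(void)e;
	if (n == 0 || n > MAP_FA_MAX_ENTRIES || cl->calls == LOG_MAX)
		return -1; /* the firmware would reject an oversized batch */
	cl->sizes[cl->calls++] = n;
	return 0;
}

/* Map all entries, issuing as many MAP_FA commands as needed. */
static int map_fa_all(struct cmd_log *cl, const struct fa_entry *entries,
		      size_t count)
{
	size_t done = 0;

	while (done < count) {
		size_t n = count - done;
		int err;

		if (n > MAP_FA_MAX_ENTRIES)
			n = MAP_FA_MAX_ENTRIES;
		err = send_map_fa(cl, entries + done, n);
		if (err)
			return err;
		done += n;
	}
	return 0;
}
```

With the assumed limit of 32, mapping 70 entries takes three commands of
32, 32 and 6 entries.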

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:19 +0000 (17:43 +0200)]
mlxsw: pci: Remove MLXSW_PCI_RDQS/SDQS defines and checks

Remove the strict check of the queue count, as different ASICs have
different counts.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:18 +0000 (17:43 +0200)]
mlxsw: pci: Do not use MLXSW_PCI_SDQS_COUNT define

Use mlxsw_pci_sdq_count helper instead.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:17 +0000 (17:43 +0200)]
mlxsw: pci: Use MLXSW_PCI_CQS_MAX instead of MLXSW_PCI_CQS_COUNT

The count of CQs can differ between ASICs, so just define a maximum
value and check against that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 15 Oct 2015 15:43:16 +0000 (17:43 +0200)]
mlxsw: switchx2: Use ETH_ALEN for mac address length

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:15 +0000 (17:43 +0200)]
mlxsw: Remove multicast ID configuration

Due to a firmware change, the Switch Multicast ID (SMID) register is
no longer needed, so the related configuration code can be removed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 14 Oct 2015 17:37:32 +0000 (12:37 -0500)]
amd-xgbe: Use system workqueue for device restart

A previous patch switched from using the system workqueue to the device
workqueue for various operations. During a device restart the device
workqueue is flushed, so the restart cannot run on this workqueue
without causing a deadlock. Move the device restart back to the system
workqueue.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Oct 2015 13:09:58 +0000 (06:09 -0700)]
Merge branch 'switchdev-locking'

Jiri Pirko says:

====================
switchdev: change locking

This is something I'm currently struggling with.
Callers of attr_set and obj_add/del often hold not only RTNL but also
a spinlock (bridge). In that case, the driver implementing the op cannot sleep.

Currently, rocker deals with this by simply invoking the driver operation
and returning, without any checking or reporting of the operation status.

Since it would be nice to at least emit a warning when the operation fails,
it makes sense to do this in delayed work directly in the switchdev core
instead of implementing it in individual drivers. That is what this
patchset introduces.

So from now on, the locking of switchdev mod ops is consistent. The caller
either holds the rtnl mutex or, if it does not, sets the defer flag, telling
the switchdev core to process the op later, in a deferred queue.

A function that forces processing of switchdev deferred ops can be called
by the op caller at an appropriate location, for example after it releases
the spinlock, to make the switchdev core process pending ops.

v1->v2:
- rebased on current net-next head (including Scott's ageing patchset)
v2->v3:
- fixed comment s/of/or/ typo suggested by Nik
v3->v4:
- the actual patchset is sent instead of the different branch I sent in v3 :/
v4->v5:
- added patch to "const" attr param
- reworked deferred ops infrastructure (mainly patch number 1 and
  internal users (patch 3 and 5)) - resolves the issue pointed out
  by John
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:55 +0000 (19:40 +0200)]
switchdev: assert rtnl mutex when going over lower netdevs

netdev_for_each_lower_dev has to be called with the rtnl mutex held, so
it is better to enforce this in switchdev functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:54 +0000 (19:40 +0200)]
rocker: remove nowait from switchdev callbacks.

No need to avoid sleeping in switchdev callbacks now, as the switchdev
core allows it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:53 +0000 (19:40 +0200)]
bridge: defer switchdev fdb del call in fdb_del_external_learn

Since a spinlock is held here, defer the switchdev operation. Also,
ensure that deferred switchdev ops are processed before the port master
device is unlinked.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:52 +0000 (19:40 +0200)]
switchdev: introduce possibility to defer obj_add/del

As in the attr use case, the caller knows whether it is holding RTNL
and whether it is in an atomic section, so let the caller decide the
correct call variant.

This allows drivers to sleep inside their ops and wait for the hardware
to report the operation status, which is then propagated to the
switchdev core. This avoids silent errors in drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:51 +0000 (19:40 +0200)]
switchdev: remove pointers from switchdev objects

When an object is used in deferred work, we cannot use pointers in
switchdev object structures, because the memory they point to may
already be in use by someone else. Instead, make a local copy of the
value.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:50 +0000 (19:40 +0200)]
switchdev: allow caller to explicitly request attr_set as deferred

The caller should know whether it can call attr_set directly (when
holding RTNL) or whether it has to defer the attr_set processing for
later.

This also allows drivers to sleep inside attr_set and report the
operation status back to the switchdev core. The switchdev core then
warns if the status is not OK, instead of errors silently happening in
drivers.

This makes use of the newly introduced switchdev deferred ops
infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:49 +0000 (19:40 +0200)]
switchdev: make struct switchdev_attr parameter const for attr_set calls

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 14 Oct 2015 17:40:48 +0000 (19:40 +0200)]
switchdev: introduce switchdev deferred ops infrastructure

Introduce infrastructure that will be used internally to defer ops.
Note that the deferred ops are queued up and are processed either by
scheduled work or explicitly by a user calling the deferred_process
function.
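A minimal single-threaded sketch of such a deferred queue, with
hypothetical names: each queued op carries its own copy of its argument
data (so the caller's memory may change or go away, as the "remove
pointers" change above requires), and dq_process() runs the pending ops in
FIFO order. The real switchdev infrastructure also needs a lock around the
queue and runs the ops under RTNL; both are omitted here, and
add_vlan_op() is only an example op.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct deferred_item {
	struct deferred_item *next;
	void (*func)(const void *data);
	size_t len;
	unsigned char data[];        /* private copy of the op's argument */
};

struct deferred_queue {
	struct deferred_item *head;
	struct deferred_item **tail;
};

static void dq_init(struct deferred_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Queue an op, copying its data so the caller may reuse its buffer. */
static int dq_enqueue(struct deferred_queue *q, void (*func)(const void *),
		      const void *data, size_t len)
{
	struct deferred_item *it = malloc(sizeof(*it) + len);

	if (!it)
		return -1;
	it->next = NULL;
	it->func = func;
	it->len = len;
	memcpy(it->data, data, len);
	*q->tail = it;
	q->tail = &it->next;
	return 0;
}

/* Run and free all pending ops in FIFO order. */
static void dq_process(struct deferred_queue *q)
{
	struct deferred_item *it;

	while ((it = q->head) != NULL) {
		q->head = it->next;
		it->func(it->data);
		free(it);
	}
	q->tail = &q->head;
}

/* Example op: records the VID it was queued with. */
static int ops_run;
static int last_vid = -1;

static void add_vlan_op(const void *data)
{
	int vid;

	memcpy(&vid, data, sizeof(vid));
	last_vid = vid;
	ops_run++;
}
```

A caller holding a spinlock would enqueue its ops, release the lock, and
only then call dq_process().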

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lipeng [Thu, 15 Oct 2015 04:40:34 +0000 (12:40 +0800)]
net: hisilicon: fixes a bug when using ethtool -S

This patch fixes a bug in the hns driver. When getting statistics with
ethtool -S, three counters are reported incorrectly because the strings
associated with their registers are wrong. This patch corrects those
strings.

Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Oct 2015 12:56:32 +0000 (05:56 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-15

This series contains updates to i40e, i40evf and ixgbe.

Emil changes the ixgbe driver to disable LRO by default in favor of GRO.

Mark provides two changes for ixgbe. The first fixes a semaphore issue:
when a reset never completes, the semaphore must be retaken before
returning.

Jesse fixes a missing variable reference in a function header comment,
then enables ethtool priv flags to control flow director at runtime.

Neerav changes several i40e error messages to debug-only, since the
messages were printed even when there was no functional issue and were
meant for debugging only.

Catherine changes the i40e driver to make only X722 support 100M SGMII,
since it is the only device to actually support it.

Anjali modifies the i40e/i40evf drivers to add writeback on ITR offload
support for X722, since the device has a way to work around the
descriptor writeback issue.

Mitch cleans up obsolete code and also reduces the i40evf init time by
shortening the delays in the init task, which aids performance in
load/unload tests and mitigates DMAR errors in VF enable/disable tests.

Shannon modifies i40e to allow flow director sideband when the device
is in MFP mode and only has one partition enabled, since we still have
plenty of interrupts for managing the flow director activity. Shannon
also cleans up flow director ATR control in debugfs, since the priv flag
has been added to our ethtool interface, and makes several general
cleanups of redundant or unnecessary i40e code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 9 Sep 2015 20:37:33 +0000 (13:37 -0700)]
ixgbe: Check for setup_internal_link method

Only call the internal_setup_link method when it is provided. This
check is required for newer version parts.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Thu, 3 Sep 2015 21:19:02 +0000 (17:19 -0400)]
i40e/i40evf: Bump i40e version to 1.3.28 and i40evf to 1.3.19

Bump.

Change-ID: I8d9a99f320af43960deba8718eee2d6de50eaf46
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 3 Sep 2015 21:19:01 +0000 (17:19 -0400)]
i40evf: speed up init

Shorten the delays in the init task, allowing the VF driver to
initialize faster. This aids performance in load/unload tests and
mitigates DMAR errors in VF enable/disable tests with absurdly short
delays. In the real world, the VF driver will come up more quickly.

The original values were set conservatively based on what we expected
from the firmware in terms of performance. Now that the driver is in use
and we know how well firmware responds to our requests, we can shorten
these delays.

Change-ID: Ibead77d34b19e8170e667c3f58bc14748bbc5bc9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 3 Sep 2015 21:19:00 +0000 (17:19 -0400)]
i40e: remove unnecessary string copy operations

Save a little stack space by replacing unnecessary strncpy() calls
with a simple string pointer.

Change-ID: Id2719d34710bfc273d3bb445fec085cd04276e88
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 3 Sep 2015 21:18:59 +0000 (17:18 -0400)]
i40e: X722 is on the IOSF bus and does not report the PCI bus info

X722 reports Gen 1x1 in the PCI config space because it is on the
IOSF bus, so skip the PCI bus link/speed check.

Change-ID: Icd5f5751dc7fb00dccf0d5dc5a0a644948e7062e
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kevin Scott [Thu, 3 Sep 2015 21:18:58 +0000 (17:18 -0400)]
i40e: Store off PHY capabilities

Store off reported PHY capabilities in link_info structure.

Change-ID: Ife0f037c26983ca985dbf79abf33f8f8791369e8
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 3 Sep 2015 21:18:57 +0000 (17:18 -0400)]
i40e/i40evf: remove redundant declarations of a variable and a function

Remove a variable declaration inside an if block that shadows an
existing declaration at the start of the function.

Also remove a forward function declaration that is no longer needed due
to code re-organization.

Change-ID: I12954668b722718074949c93d74cd20eaacd93e4
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 3 Sep 2015 21:18:56 +0000 (17:18 -0400)]
i40e: remove FD atr control from debugfs

Since the flow-director-atr priv flag was added to our ethtool interface,
we don't need the on/off control in debugfs.

Change-ID: Ib3b599916434ab30ccd40074e71d7a81609b5bb5
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 3 Sep 2015 21:18:55 +0000 (17:18 -0400)]
i40e: allow FD SB if MFP mode only has 1 partition

Even though the device might be in MFP mode, if there's only one partition
enabled, then we still have plenty of interrupts for managing the Flow
Director Sideband activity.  This patch enables FD SB in this case.
This patch also reverses the sense of the conditional in order to remove
the negative logic.

Change-ID: I9edf211a6219fc8d159b4be9964f9fd7f4e00bc0
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 3 Sep 2015 21:18:54 +0000 (17:18 -0400)]
i40e: remove obsolete version check

This version check only applies to very, very old firmware,
that only ran on A0 hardware, which we never shipped and don't
support in this driver anyway. Remove it, before somebody
gets hurt.

Change-ID: I3752d090ff488acf98ee76b075af961e9c968ee4
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 3 Sep 2015 21:18:53 +0000 (17:18 -0400)]
i40e/i40evf: Add WB_ON_ITR offload support

X722 has a way to work around the descriptor writeback (WB) issue;
this offload turns that feature on.

Change-ID: I7ffa67622426bfca5a651417b63e3afcfeb60412
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Thu, 3 Sep 2015 21:18:52 +0000 (17:18 -0400)]
i40e: Remove 100M SGMII unless hw is X722

Only the X722 device now supports 100M SGMII, and nothing supports
100M on 1000Base_T.

Change-ID: I6f44dcd818944edd40041410e6de380f4a359a0c
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Neerav Parikh [Thu, 3 Sep 2015 21:18:50 +0000 (17:18 -0400)]
i40e: Change some messages from info to debug only

Several error messages have been printed even when there is no
functional issue. These messages should be available at debug message
level only.

Change-ID: Id91e47bf942c483563995f30d8705fa53acd5aa3
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 3 Sep 2015 21:18:49 +0000 (17:18 -0400)]
i40e: use priv flags to control flow director

Some customers wish to control our hardware-specific feature called
flow director at runtime.  This patch enables ethtool priv flags to
control this driver/hardware-specific feature.

ethtool --set-priv-flags ethX flow-director-atr off

NOTE: the ethtool ntuple interface controls the flow-director
      sideband rules.
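This is a hedged illustration of how an ethtool private flag can map onto driver state. ethtool exposes private flags as a 32-bit word in which bit N corresponds to the N-th name in the driver's private-flag string set. The bit position, struct and helper names are assumptions made for this sketch. Only the "flow-director-atr" name comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignment: bit 0 <-> string index 0. */
#define PRIV_FLAG_FD_ATR (1u << 0)

/* Name reported to ethtool for each bit, in bit order. */
const char *const priv_flag_names[] = { "flow-director-atr" };

/* Hypothetical driver state controlled by the flag. */
struct adapter {
	bool fd_atr_enabled;
};

/* Build the flags word from driver state. */
uint32_t get_priv_flags(const struct adapter *a)
{
	uint32_t flags = 0;

	if (a->fd_atr_enabled)
		flags |= PRIV_FLAG_FD_ATR;
	return flags;
}

/* Apply a flags word; reject bits the driver does not know about. */
int set_priv_flags(struct adapter *a, uint32_t flags)
{
	if (flags & ~PRIV_FLAG_FD_ATR)
		return -1;
	a->fd_atr_enabled = (flags & PRIV_FLAG_FD_ATR) != 0;
	return 0;
}
```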

Change-ID: Iba156350b07fa2ce66f53ded51739f9a3781fe0e
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Thu, 3 Sep 2015 21:18:48 +0000 (17:18 -0400)]
i40e: Add missing parameter comment to ndo_bridge_setlink

Add nlflags to the function comment for ndo_bridge_setlink.

Change-ID: I34c704f307f2a3f7bac3ca4b44e2a094d3d082d6
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mark Rustad [Wed, 26 Aug 2015 21:10:22 +0000 (14:10 -0700)]
ixgbe: Fix CS4227-related semaphore error on reset failure

If the reset never completes, it is necessary to retake the
semaphore before returning, because the caller will release
the semaphore.
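Here is a minimal sketch of the rule above. It uses a hypothetical model in which the reset routine drops the semaphore while polling. Every return path, including the timeout, retakes the semaphore, so the caller's unconditional release stays balanced. The names are illustrative and are not taken from ixgbe.

```c
#include <stdbool.h>

/* Hypothetical model: polls_until_done < 0 means the reset never
 * completes. */
struct hw {
	bool sem_held;
	int polls_until_done;
};

void sem_take(struct hw *hw) { hw->sem_held = true; }
void sem_release(struct hw *hw) { hw->sem_held = false; }

/* Called with the semaphore held; always returns with it held. */
int reset_with_poll(struct hw *hw, int max_polls)
{
	sem_release(hw); /* let other agents run while we wait */
	for (int i = 0; i < max_polls; i++) {
		if (hw->polls_until_done >= 0 && i >= hw->polls_until_done) {
			sem_take(hw);
			return 0;
		}
	}
	/* Reset never completed: retake the semaphore anyway, because
	 * the caller will release it. */
	sem_take(hw);
	return -1;
}
```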

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 25 Aug 2015 01:08:31 +0000 (18:08 -0700)]
ixgbe: disable LRO by default

This patch disables LRO by default in favor of GRO.

LRO is incompatible with forwarding and is disabled when forwarding
is turned on, which makes the driver's default offloads inconsistent.
LRO can still be enabled via ethtool.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Darin Miller <darin.j.miller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 15 Oct 2015 02:14:50 +0000 (19:14 -0700)]
Merge branch 'mlx-next'

Or Gerlitz says:

====================
Mellanox driver update, Oct 14 2015

This series contains two more patches from Eli, patch from Majd
to support PCI error handlers and a fix from Jack to mlx4 VFs
when probed without a provisioned mac address.

The patch set applies on top of net-next commit bbb300e "Merge branch 'bridge-vlan'"

changes from V0:
  - made the health flag int --> bool to address comment from Dave on patch #1
  - fixed sparse warning noted by the 0-day build tests in patch #2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Wed, 14 Oct 2015 14:43:48 +0000 (17:43 +0300)]
net/mlx4_core: Replace VF zero mac with random mac in mlx4_core

By design, when no default MAC addresses are set in the Hypervisor for VFs,
the VFs are passed zero-macs. When such a MAC is received by the VF, it
generates a random MAC address and registers that MAC address
with the Hypervisor.

This random mac generation is currently done in the mlx4_en module.
There is a problem, though, if the mlx4_ib module is loaded by a VF before
the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced
zero-mac and register that zero-mac as part of QP1 initialization.

Having a zero-mac in the port's MAC table creates problems for a
Baseboard Management Controller (BMC). The BMC occasionally sends packets
with a zero-mac destination MAC. If there is a zero-mac present in the
port's MAC table, the FW will send such BMC packets to the host driver
rather than to the wire, and the BMC will stop working.

To address this problem, we move the replacement of zero-mac addresses
with random-mac addresses to mlx4_slave_cap(), which is part of the VF
driver startup and runs before mlx4_ib and mlx4_en are activated.
As a result, the driver never registers zero-mac addresses in the port
MAC table.

In addition, when mlx4_en does initialize the net device, it needs to set
the NET_ADDR_RANDOM flag in the netdev structure if the address was
randomly generated. This is done so that udev on the VM does not create
a new device name after each VF probe (VM boot and such). To accomplish this,
we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces
a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en
initializes the net-device.
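This hedged C sketch shows the replacement step, assuming a hypothetical per-port struct that carries a "random MAC" flag as described above. The random address is built the same way as the kernel's eth_random_addr(): random bytes, then the multicast bit is cleared and the locally-administered bit is set. The text does not say whether mlx4_core uses that exact helper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical per-port state: the MAC plus a flag recording that it
 * was randomly generated (later used to mark the netdev address). */
struct port_info {
	uint8_t mac[ETH_ALEN];
	bool mac_is_random;
};

bool mac_is_zero(const uint8_t *mac)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(mac, zero, ETH_ALEN) == 0;
}

/* Random unicast, locally administered address. */
void random_mac(uint8_t *mac)
{
	for (int i = 0; i < ETH_ALEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= 0xfe; /* clear multicast bit */
	mac[0] |= 0x02; /* set locally-administered bit */
}

/* Replace a zero MAC before any consumer (IB or Ethernet) sees it. */
void fixup_vf_mac(struct port_info *p)
{
	if (mac_is_zero(p->mac)) {
		random_mac(p->mac);
		p->mac_is_random = true;
	}
}
```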

Fix was suggested by Matan Barak <matanb@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Wed, 14 Oct 2015 14:43:47 +0000 (17:43 +0300)]
net/mlx5_core: Wait for FW readiness on startup

On device initialization, wait until firmware indicates that it is done
with initialization before proceeding to initialize the device.

Also update initialization segment layout to match driver/firmware
interface definitions.
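Here is a minimal sketch of a bounded wait for firmware readiness. Several details are assumptions made for illustration: the register layout (a busy bit in bit 31 of an initialization-segment word), the timeout values and the function names. The fake reader exists only to exercise the loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: bit 31 reads as 1 while firmware is still initializing. */
#define FW_INITIALIZING_BIT (1u << 31)

typedef uint32_t (*read_init_word_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms);

/* Poll until the busy bit clears or the budget runs out.
 * Returns 0 when ready, -1 on timeout. */
int wait_fw_ready(read_init_word_fn rd, sleep_ms_fn sleep_ms, void *ctx,
		  unsigned int timeout_ms, unsigned int step_ms)
{
	for (unsigned int waited = 0; ; waited += step_ms) {
		if (!(rd(ctx) & FW_INITIALIZING_BIT))
			return 0;
		if (waited >= timeout_ms)
			return -1;
		sleep_ms(step_ms);
	}
}

/* Fake reader: reports "busy" until the counter reaches zero. */
uint32_t fake_read(void *ctx)
{
	int *reads_left = ctx;

	if (*reads_left > 0) {
		(*reads_left)--;
		return FW_INITIALIZING_BIT;
	}
	return 0;
}

void no_sleep(unsigned int ms) { (void)ms; }
```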

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Wed, 14 Oct 2015 14:43:46 +0000 (17:43 +0300)]
net/mlx5_core: Add pci error handlers to mlx5_core driver

This patch implements the pci_error_handlers for mlx5_core, which allow
the driver to recover from PCI errors.

Once a PCI error is detected, mlx5_pci_err_detected is called
and it:
1) Marks the device to be in 'Internal Error' state.
2) Dispatches an event to mlx5_ib to flush all the outstanding CQEs
with error.
3) Returns all ongoing commands with error.
4) Unloads the driver.

Afterwards, the FW is reset and mlx5_pci_slot_reset is called; it
enables the device and restores its PCI state.

If the latter succeeds, mlx5_pci_resume is called, and it loads the SW
stack.
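The sequence above can be modelled as three callbacks. This is a simplified, hypothetical sketch. The result codes only echo the names of the kernel's PCI_ERS_RESULT_* values, and step 2 (flushing CQEs to mlx5_ib) is not modelled.

```c
#include <stdbool.h>

/* Hypothetical result codes, named after PCI_ERS_RESULT_*. */
enum ers_result { ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

struct dev {
	bool internal_error;
	int outstanding_cmds;
	bool loaded;
	bool enabled;
};

enum ers_result err_detected(struct dev *d)
{
	d->internal_error = true; /* 1) mark 'Internal Error' state */
	/* 2) CQE flush event to the IB layer: not modelled */
	d->outstanding_cmds = 0;  /* 3) complete ongoing commands with error */
	d->loaded = false;        /* 4) unload */
	d->enabled = false;
	return ERS_NEED_RESET;
}

/* After the FW reset: re-enable the device and restore PCI state. */
enum ers_result slot_reset(struct dev *d, bool enable_ok)
{
	if (!enable_ok)
		return ERS_DISCONNECT;
	d->enabled = true;
	return ERS_RECOVERED;
}

/* Only reached if slot_reset() succeeded: reload the SW stack. */
void resume_stack(struct dev *d)
{
	d->loaded = true;
}
```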

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Wed, 14 Oct 2015 14:43:45 +0000 (17:43 +0300)]
net/mlx5_core: Fix internal error detection conditions

The detection of a fatal condition has been updated to take into account
the state reported by the device, or an all-ones read of the firmware
version, which indicates that the device is not accessible.
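Here is a minimal sketch of the two-part check, using a hypothetical state encoding. The all-ones test relies on common behavior: reads from a PCI device that is no longer accessible return 0xffffffff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the state reported by the device. */
enum dev_state { DEV_STATE_OK = 0, DEV_STATE_ERR = 1 };

bool device_is_fatal(enum dev_state reported, uint32_t fw_ver_word)
{
	if (reported == DEV_STATE_ERR)
		return true;                 /* device reports an error */
	return fw_ver_word == 0xffffffffu;   /* device fell off the bus */
}
```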

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Oct 2015 13:16:49 +0000 (06:16 -0700)]
tcp: avoid spurious SYN flood detection at listen() time

At listen() time, there is a small window where the listener is visible
with a zero backlog, triggering a spurious "Possible SYN flooding on port"
message.

Nothing prevents us from setting the correct backlog.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Oct 2015 12:58:38 +0000 (05:58 -0700)]
tcp/dccp: fix potential NULL deref in __inet_inherit_port()

As we no longer hold the listener lock in the fast path, it is possible
that a child is created right after the listener freed its bound port,
if a close() is done while incoming packets are processed.

__inet_inherit_port() must detect this and return an error,
so that the caller can free the child earlier.
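Here is a simplified sketch of the required check, with hypothetical types and locking omitted. If the listener has already dropped its bound-port bucket, the function reports an error instead of dereferencing NULL. It leaves the child untouched so the caller can free it.

```c
#include <stddef.h>

/* Hypothetical simplified types: a listener points at its bound-port
 * bucket, and close() clears that pointer. */
struct bind_bucket { int port; int owners; };
struct demo_sock { struct bind_bucket *bind_hash; };

/* Returns 0 on success, or -1 if the listener already released its
 * port; on -1 the caller must free the child. */
int inherit_port(const struct demo_sock *listener, struct demo_sock *child)
{
	struct bind_bucket *tb = listener->bind_hash;

	if (tb == NULL)
		return -1; /* raced with close(): nothing to inherit */
	tb->owners++;
	child->bind_hash = tb;
	return 0;
}
```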

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Axel Lin [Wed, 14 Oct 2015 10:30:48 +0000 (18:30 +0800)]
net: phy: aquantia/teranetics: Convert to use module_phy_driver macro

Use module_phy_driver macro to simplify the code a bit.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lipeng [Wed, 14 Oct 2015 02:28:57 +0000 (10:28 +0800)]
net: hisilicon net: fix a bug about led

This patch fixes a bug in the hns driver: the link LED is on at startup,
even though the Ethernet port is down at that time. The LED status needs
to be reset during the init sequence.

Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karen Xie [Wed, 14 Oct 2015 00:13:59 +0000 (17:13 -0700)]
cxgb4i: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytes

This helps improve the latency of small packets.

Signed-off-by: Rakesh Ranjan <rakesh@chelsio.com>
Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Oct 2015 01:36:58 +0000 (18:36 -0700)]
Merge tag 'linux-can-next-for-4.4-20151013' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-09-17

this is a pull request of 4 patches for net-next/master.

Two patches are by Gerhard Bertelsmann, fixing some problems in the
sun4i driver. The patch by Arnd Bergmann stops using timeval for the
CAN broadcast manager. The last patch by Alexandre Belloni removes the
otherwise unused struct at91_can_data from the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
yankejian [Tue, 13 Oct 2015 01:53:45 +0000 (09:53 +0800)]
net: hisilicon: supports promisc mode

This patch adds support for setting promiscuous mode. The queue is
configured during the init sequence when the device is in promiscuous
mode, and promiscuous mode can be enabled or disabled by the upper-level
user.

Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 14 Oct 2015 12:25:53 +0000 (14:25 +0200)]
Revert "ipv4/icmp: redirect messages can use the ingress daddr as source"

Revert the commit e2ca690b657f ("ipv4/icmp: redirect messages
can use the ingress daddr as source"), which tried to introduce a more
suitable behaviour for ICMP redirect messages generated by VRRP routers.
However RFC 5798 section 8.1.1 states:

    The IPv4 source address of an ICMP redirect should be the address
    that the end-host used when making its next-hop routing decision.

while that commit used the destination address of the packet that
triggered the redirect, which does not match the above and in most cases
leads to no redirect packets being generated.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Oct 2015 12:53:48 +0000 (05:53 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-13

This series contains updates to i40e, i40evf, ixgbe and fm10k.

Carolyn cleans up ndo_bridge_getlink() by flagging a parameter as
__always_unused, since it is never used.  Adds a member to the nvm_info
struct to store OEM version info to be output either by OID or ethtool.

Neerav cleans up a remaining bit shift to use BIT() macro.

Mitch fixes the i40evf driver to properly handle calls to its
ndo_set_mac_address() method.  It did not properly check whether the
override would be allowed by the PF driver, and it never removed the old
address from its filter list.  Removed a redundant call to
i40e_enable_vf_mappings() in i40e_alloc_vfs(), since i40e_reset_vf()
already calls it.  Fixed a possible panic in some circumstances where
the firmware may fail to allocate a VSI for a VF, by checking the return
value from i40e_alloc_vf_res() and not configuring the device further if
it failed.

Greg fixes the parsing of CEE App TLVs so the caller does not have to
consider whether the App came from a CEE or IEEE DCBx negotiation.

Shannon moves the device ids into a standalone file due to the desire
to write user-land drivers (and other requests) without needing the rest
of the include files.

Catherine adds the ability to save the module information from
get_phy_capabilities() to be used to determine which speeds the module
supports.  Also cleaned up the PHY structure by removing unused members
and added the ability to store the PHY capabilities reported by the
firmware.

Emil modifies ixgbe to ensure that flow control packets initiated by the
VF are dropped and reported as spoofed.

Jacob fixes the fm10k driver to use snprintf() instead of sprintf() to
avoid a possible buffer overflow.  Also fixed the use of an enum as a
boolean by checking for the actual value of NETREG_UNINITIALIZED, in case
it ever changes from the current value of zero.

v2: Dropped patch 11 of the original series, which added functions that
    were never used.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Tue, 25 Aug 2015 00:02:00 +0000 (17:02 -0700)]
fm10k: do not use enum as boolean

Check for the actual value NETREG_UNINITIALIZED in case it ever changes
from the current value of zero.
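This small illustration of the difference uses stand-in names; the real enumerator is NETREG_UNINITIALIZED. Testing the enum as a boolean silently depends on the uninitialized value being zero. Comparing against the named value does not.

```c
#include <stdbool.h>

/* Stand-in for a registration-state enum; values are illustrative. */
enum reg_state { REG_UNINITIALIZED = 0, REG_REGISTERED, REG_UNREGISTERING };

/* Fragile: only correct while REG_UNINITIALIZED happens to be 0. */
bool was_registered_fragile(enum reg_state s) { return s; }

/* Robust: keeps working if the enumerators are ever renumbered. */
bool was_registered(enum reg_state s) { return s != REG_UNINITIALIZED; }
```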

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Aug 2015 00:01:58 +0000 (17:01 -0700)]
fm10k: use snprintf() instead of sprintf() to avoid buffer overflow
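
This hedged sketch shows the bounded formatting pattern. The buffer's purpose and the format_queue_name() helper are hypothetical. snprintf() never writes more than size bytes, including the terminating NUL. It returns the length the untruncated string would have had, so the caller can detect truncation.

```c
#include <stdio.h>

/* Formats "<prefix>-<index>" into buf without overrunning it.
 * Returns 0 on success, -1 on an encoding error or truncation. */
int format_queue_name(char *buf, size_t size, const char *prefix, int index)
{
	int n = snprintf(buf, size, "%s-%d", prefix, index);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```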

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 20 Aug 2015 22:31:20 +0000 (15:31 -0700)]
ixgbe: add flow control ethertype to the anti-spoofing filter

This patch makes sure that flow control packets initiated by the VF are
dropped and reported as spoofed.

Flow control packets can be used to limit throughput, or as a DoS
attack, when generated from a VF. Flow control is not supported per VF,
hence any pause frames generated from a VF are considered malicious.

Also cleaned up indentation and some redundant comments.
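This is a software model of the check described above, for illustration only; ixgbe performs it with a hardware anti-spoofing filter. The sketch assumes untagged frames. It matches the IEEE 802.3 MAC Control ethertype 0x8808, which carries PAUSE frames. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_TYPE_MAC_CONTROL 0x8808 /* carries PAUSE (flow control) */

/* Hypothetical per-VF counters. */
struct vf_stats {
	unsigned long spoofed;
};

/* Returns true if a frame sent by a VF must be dropped. The ethertype
 * sits big-endian in bytes 12-13 of an untagged Ethernet frame. */
bool vf_tx_should_drop(const uint8_t *frame, unsigned int len,
		       struct vf_stats *st)
{
	if (len < 14)
		return false;
	uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype == ETH_TYPE_MAC_CONTROL) {
		st->spoofed++; /* flow control is not supported per VF */
		return true;
	}
	return false;
}
```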

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Mon, 31 Aug 2015 23:54:55 +0000 (19:54 -0400)]
i40e/i40evf: Bump i40e version to 1.3.25 and i40evf to 1.3.17

Bump.

Change-ID: If3cd42f6c1b9546beed60faf9c79faab35216f58
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Tue, 1 Sep 2015 15:36:30 +0000 (11:36 -0400)]
i40e/i40evf: Refactor PHY structure and add phy_capabilities enum

Remove unused members in the PHY structure and add a new member to store
all the capabilities the PHY has as reported by the FW. This information
will help us determine what speeds the device is capable of when link is
down.

Also add an enum to decode the PHY types the NVM is capable of.
Use the phy_types variable to determine what phy types are possible
when link is down instead of device id as it will be more accurate.

On a backplane device we do not support changing any settings; however,
we should display all the phy_types we are capable of. So if we see a
backplane device ID, set supported and advertised purely based on the
phy_types variable.

Change-ID: Ia75d560f1fcd30c54cbfb7458690c5867559a930
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Mon, 31 Aug 2015 23:54:53 +0000 (19:54 -0400)]
i40e/i40evf: Add module_types and update_link_info

Add a module_types variable to the link_info struct to save the module
information from get_phy_capabilities. This information can be used to
determine which speeds the module supports.

Also add a new function update_link_info which updates the module_types
parameter and then calls get_link_info. This function should be called
in place of get_link_info so that the module_types variable stays
up-to-date with the rest of the link information.

The EAS table does not reflect the values that are actually returned,
so these values are instead based on the Ethernet compliance codes
specified in table 33 of SFF-8436, which have proven accurate.

Use the new variable in ethtool to differentiate between a 10G/1G dual
speed fiber module and a 10G only module.
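Here is a sketch of the dual-speed test. The bit assignments for module_types are hypothetical; the real values derive from the SFF-8436 compliance codes and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical module_types bits. */
#define MOD_10G_SR (1u << 0)
#define MOD_10G_LR (1u << 1)
#define MOD_1G_SX  (1u << 8)
#define MOD_1G_LX  (1u << 9)

#define MOD_10G_MASK (MOD_10G_SR | MOD_10G_LR)
#define MOD_1G_MASK  (MOD_1G_SX | MOD_1G_LX)

/* A module reporting both a 10G and a 1G code is dual-speed; one that
 * reports only 10G codes is 10G-only. */
bool module_is_dual_speed(uint32_t module_types)
{
	return (module_types & MOD_10G_MASK) && (module_types & MOD_1G_MASK);
}
```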

Change-ID: Ib7585cce321319c10ce15180054c41a6cbd41389
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 31 Aug 2015 23:54:50 +0000 (19:54 -0400)]
i40e/i40evf: split device ids into a separate file

So that userland drivers (and other requested users) can work without
needing the rest of the include files, the device ids are pulled out
into a standalone file.

Change-ID: Ic0b047dbf9d4b0891892309c1f2079f56d9b60e8
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Mon, 31 Aug 2015 23:54:49 +0000 (19:54 -0400)]
i40e: update fw version text string per previous product formats

This patch moves the internal fw version and fw api version info to be
output in probe.  The nvm version, etrack and oem version info are now
configured for output via ethtool -i.

Change-ID: I05d490093a7137dbefcdef263d014d1e5c9e83d0
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 31 Aug 2015 23:54:48 +0000 (19:54 -0400)]
i40e: don't panic on VSI allocation failure

In some circumstances, the firmware may fail to allocate a VSI for a VF.
When this happens, the driver does not react well to the bad news and
has a panic attack.

To fix this problem, check the return value from i40e_alloc_vf_res and
don't try to configure the device further if it failed. Additionally,
explicitly clear the INIT bit when we free VF resources, so that this
bit will be in the proper state in the failure case, and won't blow up
elsewhere.
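This minimal sketch of the fix uses hypothetical names. It checks the allocation result and stops configuring on failure. It also clears the INIT bit explicitly when freeing, so a stale bit cannot survive the error path.

```c
#include <stdbool.h>

#define VF_STATE_INIT (1u << 0) /* hypothetical state bit */

struct vf {
	unsigned int state;
	bool has_vsi;
	bool configured;
};

/* Stand-in for the firmware VSI allocation; fails when fw_ok is false. */
int alloc_vf_res(struct vf *vf, bool fw_ok)
{
	if (!fw_ok)
		return -1;
	vf->has_vsi = true;
	vf->state |= VF_STATE_INIT;
	return 0;
}

void free_vf_res(struct vf *vf)
{
	vf->has_vsi = false;
	vf->state &= ~VF_STATE_INIT; /* explicitly clear INIT */
}

int setup_vf(struct vf *vf, bool fw_ok)
{
	if (alloc_vf_res(vf, fw_ok) != 0) {
		free_vf_res(vf); /* leave a consistent state behind */
		return -1;       /* and do not configure further */
	}
	vf->configured = true;
	return 0;
}
```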

Change-ID: I6a20ce2b59c3458fd832032e88fa28cd42500189
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 31 Aug 2015 23:54:47 +0000 (19:54 -0400)]
i40e: remove redundant call

This function call isn't needed here; the same function is already
called by i40e_reset_vf.

Change-ID: I96ccbf91b752965c9e28fe895d4c7d4c46e3ba44
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Bowers [Mon, 31 Aug 2015 23:54:46 +0000 (19:54 -0400)]
i40e: Convert CEE App TLV selector to IEEE selector

Changes the parsing of CEE App TLVs to fill in the App selector in struct
i40e_dcbx_config with the IEEE App selector so the caller doesn't have to
consider whether the App came from a CEE or IEEE DCBX negotiation.
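This hedged sketch shows the normalization. CEE App TLVs use selector 0 for Ethertype and 1 for TCP/UDP port. IEEE 802.1Qaz uses 1 for Ethertype and 2 for TCP/SCTP port. The macro names are assumptions of this sketch, as is passing unknown selectors through unchanged.

```c
#include <stdint.h>

#define CEE_APP_SEL_ETHTYPE  0x0
#define CEE_APP_SEL_TCPIP    0x1
#define IEEE_APP_SEL_ETHTYPE 0x1
#define IEEE_APP_SEL_TCPIP   0x2

/* Map a CEE selector to its IEEE equivalent so callers need not track
 * which DCBX negotiation produced the App entry. */
uint8_t cee_to_ieee_selector(uint8_t cee_sel)
{
	switch (cee_sel) {
	case CEE_APP_SEL_ETHTYPE:
		return IEEE_APP_SEL_ETHTYPE;
	case CEE_APP_SEL_TCPIP:
		return IEEE_APP_SEL_TCPIP;
	default:
		return cee_sel; /* unknown: pass through unchanged */
	}
}
```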

Change-ID: Ia7d9d664cde04d2ebcc9822fd22e4929c6edab3a
Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Mon, 31 Aug 2015 23:54:45 +0000 (19:54 -0400)]
i40e/i40evf: Add info to nvm info struct for OEM version data

This patch adds a member to the nvm_info struct for oem_ver info to be
output either by OID or ethtool.

Change-ID: I1e5d513ae67622e2af17042924fdb4b5d6d85366
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 31 Aug 2015 23:54:44 +0000 (19:54 -0400)]
i40evf: properly handle ndo_set_mac_address calls

The driver was not correctly handling calls to its ndo_set_mac_address
method. It did not properly check to see if the override would be
allowed by the PF driver, and never removed the old address from its
filter list.

Add a new flag to the adapter struct which is set if the MAC address is
assigned by the PF. Check this flag and don't allow the MAC address to
be changed if it is set. Search for and properly remove the filter
for the old MAC address when the new one is set.
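This simplified sketch of the new behavior uses a hypothetical adapter struct and filter list. It refuses the change when the PF assigned the address. Otherwise it swaps the old address's filter for the new one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_FILTERS 8

struct adapter {
	uint8_t mac[ETH_ALEN];
	bool mac_set_by_pf; /* set when the PF assigned the address */
	uint8_t filters[MAX_FILTERS][ETH_ALEN];
	int nfilters;
};

static int find_filter(const struct adapter *a, const uint8_t *mac)
{
	for (int i = 0; i < a->nfilters; i++)
		if (memcmp(a->filters[i], mac, ETH_ALEN) == 0)
			return i;
	return -1;
}

/* Returns 0 on success, -1 if the PF owns the address or the filter
 * list is full. */
int set_mac(struct adapter *a, const uint8_t *new_mac)
{
	if (a->mac_set_by_pf)
		return -1; /* PF-assigned: do not override */

	int old = find_filter(a, a->mac);
	if (old >= 0) {
		/* Remove the old filter by moving the last entry into its slot. */
		a->nfilters--;
		memmove(a->filters[old], a->filters[a->nfilters], ETH_ALEN);
	} else if (a->nfilters == MAX_FILTERS) {
		return -1;
	}
	memcpy(a->filters[a->nfilters++], new_mac, ETH_ALEN);
	memcpy(a->mac, new_mac, ETH_ALEN);
	return 0;
}
```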

Change-ID: I817bf620c869c5a80e6a7eab65c9cbad1dc89799
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Neerav Parikh [Mon, 31 Aug 2015 23:54:43 +0000 (19:54 -0400)]
i40e: Use BIT() macro for priority map parsing

Replace one left over (1 << up) in the i40e_dcb.c file with the BIT()
macro.

Change-ID: I39492a400a2cee5ac566143a5b436cc478bea0db
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Mon, 31 Aug 2015 23:54:42 +0000 (19:54 -0400)]
i40e: Make it clear a parameter is never used

Flag the filter_mask parameter as __always_unused in the
ndo_bridge_getlink function.

Change-ID: Ifc1e99c7fb84bcbf81cf7b0ac891ad8ca956ffb2
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 31 Aug 2015 23:54:41 +0000 (19:54 -0400)]
i40e/i40evf: Add new link status defines

Add the new Port link status bit and rename the link status to function
link status.

Change-ID: I71289327ae62638ce967b6ad40114caf998b6dab
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ahern [Mon, 12 Oct 2015 20:54:38 +0000 (13:54 -0700)]
net: vrf: Documentation update, ip commands

Add ip commands with examples for creating VRF devices, enslaving interfaces
and dumping VRF-focused data (address, neighbors, routes).

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Mon, 12 Oct 2015 08:19:07 +0000 (01:19 -0700)]
mISDN: use kstrdup() in dsp_pipeline_build

Use kstrdup() instead of strlen-kmalloc-strcpy. Remove the unneeded NULL
test, since kstrdup() performs it internally. Remove the zero-length
string test, since the caller of dsp_pipeline_build already performs it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Oct 2015 00:12:54 +0000 (17:12 -0700)]
tcp/dccp: fix behavior of stale SYN_RECV request sockets

When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets
become stale, meaning the 3WHS cannot complete.

But the current behavior is wrong:
incoming packets finding such stale sockets are dropped.

Instead, we need to clean up the request socket and perform another
lookup:
- Incoming ACK will give a RST answer,
- SYN rtx might find another listener if available.
- We expedite cleanup of request sockets and old listener socket.

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years ago - can: at91: remove at91_can_data
Alexandre Belloni [Thu, 8 Oct 2015 14:56:07 +0000 (16:56 +0200)]
can: at91: remove at91_can_data

struct at91_can_data was used to pass a callback to the driver, allowing it
to switch the transceiver on and off. As all at91 boards now use DT, it
is no longer used, so remove the structure.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
9 years ago - can: avoid using timeval for uapi
Arnd Bergmann [Wed, 30 Sep 2015 11:26:42 +0000 (13:26 +0200)]
can: avoid using timeval for uapi

The CAN subsystem communicates with user space using a bcm_msg_head
header, which contains two timestamps. This is problematic for
multiple reasons:

a) The structure layout is currently incompatible between 64-bit
   user space and 32-bit user space, and cannot work in compat
   mode (other than x32).

b) The timeval structure layout will change in 32-bit user
   space when we fix the y2038 overflow problem by redefining
   time_t to 64-bit, making new 32-bit user space incompatible
   with the current kernel interface.
   Cars last a long time and often use old kernels, so the actual
   users of this code are the most likely ones to migrate to y2038-safe
   user space.

This tries to work around part of the problem by changing the
publicly visible user interface in the header, but not the binary
interface. Fortunately, the values passed around in the structure
are relative times and do not actually suffer from the y2038
overflow, so 32-bit is enough here.

We replace the use of 'struct timeval' with a newly defined
'struct bcm_timeval' that uses the exact same binary layout
as before and that still suffers from problem a) but not problem
b).

The downside of this approach is that any user space program
that currently assigns a timeval structure to these members
rather than writing the tv_sec/tv_usec portions individually
will suffer a compile-time error when built with an updated
kernel header. Fixing this error makes it work fine with old
and new headers though.
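
A minimal userspace C sketch of the source-compatibility point above. struct bcm_timeval is written out here with the same two long members as the updated UAPI header, so the example compiles without kernel headers; set_bcm_interval() is a hypothetical helper, not part of any API. Copying tv_sec and tv_usec individually compiles against both old and new headers, whereas assigning a whole struct timeval only compiled while the member itself was a struct timeval.

```c
#include <sys/time.h>

/* Same layout as struct bcm_timeval in the updated <linux/can/bcm.h>:
 * two longs, matching the binary layout of the old struct timeval members. */
struct bcm_timeval {
	long tv_sec;
	long tv_usec;
};

/*
 * Hypothetical helper: fill an interval field from a struct timeval.
 * Whole-struct assignment such as "*ival = *tv;" is a type error here,
 * because struct timeval and struct bcm_timeval are distinct types.
 * Per-field copies work whether the member is a timeval or a bcm_timeval.
 */
void set_bcm_interval(struct bcm_timeval *ival, const struct timeval *tv)
{
	ival->tv_sec = tv->tv_sec;
	ival->tv_usec = tv->tv_usec;
}
```

Keeping long members preserves the existing binary interface, which is why this sketch still has problem a) on mixed 32/64-bit setups while avoiding problem b).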

We could address problem a) by using '__u32' or 'int' members
rather than 'long', but that would have a more significant
downside in also breaking support for all existing 64-bit user
binaries that might be using this interface, which is likely
not acceptable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-can@vger.kernel.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
9 years ago - can: sun4i: fix MODULE_DESCRIPTION
Gerhard Bertelsmann [Fri, 25 Sep 2015 16:58:39 +0000 (18:58 +0200)]
can: sun4i: fix MODULE_DESCRIPTION

This patch changes the description of the module.

Signed-off-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
9 years ago - can: sun4i: fix arbitration lost error reporting
Gerhard Bertelsmann [Fri, 25 Sep 2015 16:58:38 +0000 (18:58 +0200)]
can: sun4i: fix arbitration lost error reporting

This patch fixes a bug in arbitration lost error reporting.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
9 years ago - Merge branch 'bridge-vlan'
David S. Miller [Tue, 13 Oct 2015 11:58:04 +0000 (04:58 -0700)]
Merge branch 'bridge-vlan'

Nikolay Aleksandrov says:

====================
bridge: vlan: cleanups & fixes (part 3)

Patch 01 converts the vlgrp member to use RCU: it was already used in a
similar way, so it is better to make that official and use all the
available RCU instrumentation. Patch 02 fixes a bug where the vlan_list
could be traversed without RTNL or RCU held, which could lead to using
freed entries. Patch 03 removes some redundant code that is no longer
needed. Patch 04 fixes a bug reported by Ido Schimmel about the
vlan_flush order and switchdev devices by moving vlan_flush back to
where it was.

v2: patches 03 and 04 are new; the second synchronize_rcu() could not be
avoided since the rhtable destruction can sleep
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years ago - bridge: vlan: move back vlan_flush
Nikolay Aleksandrov [Mon, 12 Oct 2015 19:47:05 +0000 (21:47 +0200)]
bridge: vlan: move back vlan_flush

Ido Schimmel reported a problem with switchdev devices caused by the
changed order of del_nbp() operations, more specifically the move of
nbp_vlan_flush(), which deletes all VLANs and frees vlgrp, to after the
rx_handler has been unregistered. To fix this, move vlan_flush back to
where it was, and make it destroy the rhtable only after setting vlgrp
to NULL and waiting for a grace period, so that no one can still see it.
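
The ordering this fix relies on (unpublish the pointer, wait until no reader can still be using it, and only then free) is sketched below in C11. This is not RCU: the reader counter and busy-wait are a deliberately crude stand-in for rcu_read_lock() and synchronize_rcu(), and struct port_model, struct vlan_group_model, reader_count_vlans() and vlan_flush_sketch() are all hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a VLAN group and the port that points to it. */
struct vlan_group_model {
	int num_vlans;
};

struct port_model {
	_Atomic(struct vlan_group_model *) vlgrp;
	atomic_int active_readers;    /* crude substitute for an RCU read side */
};

/* Reader: announce itself before dereferencing vlgrp, leave afterwards. */
int reader_count_vlans(struct port_model *p)
{
	struct vlan_group_model *vg;
	int n = 0;

	atomic_fetch_add(&p->active_readers, 1);
	vg = atomic_load(&p->vlgrp);  /* seq_cst: sees NULL once unpublished */
	if (vg)
		n = vg->num_vlans;
	atomic_fetch_sub(&p->active_readers, 1);
	return n;
}

/* Flush: unpublish first, wait for readers to drain, then free. */
void vlan_flush_sketch(struct port_model *p)
{
	struct vlan_group_model *vg = atomic_exchange(&p->vlgrp, NULL);

	/* Stand-in for a grace period: spin until no reader is inside. */
	while (atomic_load(&p->active_readers) > 0)
		;
	free(vg);                     /* no reader can still hold vg */
}
```

Freeing before the pointer is cleared, or before the wait, would let a concurrent reader dereference freed memory; the sequence above is the same unpublish, wait, free pattern the commit message describes.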

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years ago - bridge: vlan: drop unnecessary flush code
Nikolay Aleksandrov [Mon, 12 Oct 2015 19:47:04 +0000 (21:47 +0200)]
bridge: vlan: drop unnecessary flush code

As Ido Schimmel pointed out, the vlan_vid_del() code in nbp_vlan_flush()
is unnecessary (it is actually a remnant of the old vlan code), so we can
remove it.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>