Tom Herbert [Tue, 7 Jun 2016 23:09:44 +0000 (16:09 -0700)]
ila: Perform only one translation in forwarding path
When setting up ILA in a router we noticed that the the encapsulation
is invoked twice: once in the route input path and again upon route
output. To resolve this we add a flag set_csum_neutral for the
ila_update_ipv6_locator. If this flag is set and the checksum
neutral bit is also set we assume that checksum-neutral translation
has already been performed and take no further action. The
flag is set only in ila_output path. The flag is not set for ila_input and
ila_xlat.
Tested:
Used 3 netns to set to emulate a router and two hosts. The router
translates SIR addresses between the two destinations in other two netns.
Verified ping and netperf are functional.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pau Espin Pedrol [Tue, 7 Jun 2016 14:30:34 +0000 (16:30 +0200)]
tcp: accept RST if SEQ matches right edge of right-most SACK block
RFC 5961 advises to only accept RST packets containing a seq number
matching the next expected seq number instead of the whole receive
window in order to avoid spoofing attacks.
However, this situation is not optimal in the case SACK is in use at the
time the RST is sent. I recently run into a scenario in which packet
losses were high while uploading data to a server, and userspace was
willing to frequently terminate connections by sending a RST. In
this case, the ACK sent on the receiver side (rcv_nxt) is frozen waiting
for a lost packet retransmission and SACK blocks are used to let the
client continue uploading data. At some point later on, the client sends
the RST (snd_nxt), which matches the next expected seq number of the
right-most SACK block on the receiver side which is going forward
receiving data.
In this scenario, as RFC 5961 defines, the RST SEQ doesn't match the
frozen main ACK at receiver side and thus gets dropped and a challenge
ACK is sent, which gets usually lost due to network conditions. The main
consequence is that the connection stays alive for a while even if it
made sense to accept the RST. This can get really bad if lots of
connections like this one are created in few seconds, allocating all the
resources of the server easily.
For security reasons, not all SACK blocks are checked (there could be a
big amount of SACK blocks => acceptable SEQ numbers). Furthermore, it
wouldn't make sense to check for RST in blocks other than the right-most
received one because the sender is not expected to be sending new data
after the RST. For simplicity, only up to the 4 most recently updated
SACK blocks (selective_acks[4] field) are compared to find the
right-most block, as usually those are the ones with bigger probability
to contain it.
This patch was tested in a 3.18 kernel and probed to improve the
situation in the scenario described above.
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 7 Jun 2016 12:04:16 +0000 (15:04 +0300)]
qed: potential overflow in qed_cxt_src_t2_alloc()
In the current code "ent_per_page" could be more than "conn_num" making
"conn_num" negative after the subtraction. In the next iteration
through the loop then the negative is treated as a very high positive
meaning we don't put a limit on "ent_num". It could lead to memory
corruption.
Fixes: dbb799c39717 ('qed: Initialize hardware for new protocols')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Jun 2016 07:25:38 +0000 (00:25 -0700)]
Merge branch 'vrf-local'
David Ahern says:
====================
net: vrf: Add support for local traffic to local addresses
Add support for locally originated traffic to VRF-local addresses,
be it addresses on enslaved devices or addresses on the VRF device:
$ ip addr show dev red
33: red: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
inet 1.1.1.1/32 scope global red
valid_lft forever preferred_lft forever
inet6 1111:1::1/128 scope global
valid_lft forever preferred_lft forever
$ ip addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
$ ping -c1 -I red 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms
$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms
--- 2100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms
$ ping6 -c1 -I red 1111::1
PING 1111::1(1111::1) from 1111:1::1 red: 56 data bytes
64 bytes from 1111::1: icmp_seq=1 ttl=64 time=0.187 ms
--- 1111::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms
This change also enables use of loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8
$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 7 Jun 2016 03:50:40 +0000 (20:50 -0700)]
net: vrf: ipv6 support for local traffic to local addresses
Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4 a local dst is set on the skb and the packet is
reinserted with a call to netif_rx. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:
$ ip addr show dev eth1
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
valid_lft forever preferred_lft forever
$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms
ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input and
output functions; IPv6's does not. VRF does not need to duplicate the Rx path
so just export the ipv6 input function.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 7 Jun 2016 03:50:39 +0000 (20:50 -0700)]
net: vrf: ipv4 support for local traffic to local addresses
Add support for locally originated traffic to VRF-local addresses. If
destination device for an skb is the loopback or VRF device then set
its dst to a local version of the VRF cached dst_entry and call netif_rx
to insert the packet onto the rx queue - similar to what is done for
loopback. This patch handles IPv4 support; follow on patch handles IPv6.
With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:
$ ip addr show dev eth1
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
valid_lft forever preferred_lft forever
$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
This patch also enables use of IPv4 loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8
$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 7 Jun 2016 03:50:38 +0000 (20:50 -0700)]
net: vrf: Minor refactoring for local address patches
Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions and collapse vrf_send_v4_prep into
vrf_process_v4_outbound.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 6 Jun 2016 23:06:02 +0000 (16:06 -0700)]
gue: Implement direction IP encapsulation
This patch implements direct encapsulation of IPv4 and IPv6 packets
in UDP. This is done a version "1" of GUE and as explained in I-D
draft-ietf-nvo3-gue-03.
Changes here are only in the receive path, fou with IPxIPx already
supports the transmit side. Both the normal receive path and
GRO path are modified to check for GUE version and check for
IP version in the case that GUE version is "1".
Tested:
IPIP with direct GUE encap
1 TCP_STREAM
4530 Mbps
200 TCP_RR
1297625 tps
135/232/444 90/95/99% latencies
IP4IP6 with direct GUE encap
1 TCP_STREAM
4903 Mbps
200 TCP_RR
1184481 tps
149/253/473 90/95/99% latencies
IP6IP6 direct GUE encap
1 TCP_STREAM
5146 Mbps
200 TCP_RR
1202879 tps
146/251/472 90/95/99% latencies
SIT with direct GUE encap
1 TCP_STREAM
6111 Mbps
200 TCP_RR
1250337 tps
139/241/467 90/95/99% latencies
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Jun 2016 23:37:14 +0000 (16:37 -0700)]
Merge branch 'net-sched-fast-stats'
Eric Dumazet says:
====================
net: sched: faster stats gathering
A while back, I sent one RFC patch using lockless stats gathering
on 64bit arches.
This patch series does it more cleanly, using a seqcount.
Since qdisc/class stats are written at dequeue() time,
we can ask the dequeue to change the seqcount, so that
stats readers can avoid taking the root qdisc lock,
and instead the typical read_seqcount_{begin|retry} guarded
loop.
This does not change fast path costs, as the seqcount
increments are not more expensive than the bit manipulation,
and allows readers to not freeze the fast path anymore.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jun 2016 16:37:16 +0000 (09:37 -0700)]
net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :
For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()
In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.
[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jun 2016 16:37:15 +0000 (09:37 -0700)]
net_sched: transform qdisc running bit into a seqcount
Instead of using a single bit (__QDISC___STATE_RUNNING)
in sch->__state, use a seqcount.
This adds lockdep support, but more importantly it will allow us
to sample qdisc/class statistics without having to grab qdisc root lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Jun 2016 23:18:20 +0000 (16:18 -0700)]
Merge branch 'be2net-noncrit-fixes'
Sathya Perla says:
====================
be2net: patch set
Hi David, the following patch set contains three non-critical fixes that
can go into the net-next tree.
Patch 1 fixes the logic for provisioning queue pairs on VFs to take into
account the limit on number of TXQs too as in some profiles the number
of TXQs is less than that of RXQs.
Patch 2 enables WoL support from shutdown on Skyhawk.
Patch 3 enhances the logic for provisioning queue pairs on VFs on
SR-IOV over multi-partition configs. Each PF (partition) on a port has to
compute the number of RSS tables it's VFs can use.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Mon, 6 Jun 2016 11:22:10 +0000 (07:22 -0400)]
be2net: Fix provisioning of RSS for VFs in multi-partition configurations
Currently, we do not distribute queue resources to enable RSS for VFs
in multi-channel/partition configurations.
Fix this by having each PF(SRIOV capable) calculate it's share of the
15 RSS Policy Tables available per port before provisioning resources for
all the VFs.
This proportional share calculation is done based on division of the
PF's MAX VFs with the Total MAX VFs on that port. It also needs to
learn about the no: of NIC PFs on the port and subtract that from
the 15 RSS Policy Tables on the port.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sriharsha Basavapatna [Mon, 6 Jun 2016 11:22:09 +0000 (07:22 -0400)]
be2net: Enable Wake-On-LAN from shutdown for Skyhawk
Skyhawk does support wake-up from ACPI shutdown state - S5, provided the
platform supports it (like Auxiliary power source etc). The changes listed
below are done to fix this.
1) There's no need to defer the HW configuration of WOL to be_suspend().
Remove this in be_suspend() and move it to be_set_wol() ethtool function
so it is configured directly in the context of ethtool. This automatically
takes care of the shutdown case.
2) The driver incorrectly uses WOL_CAP field in the FW response to
get_acpi_wol_cap() command, to determine if WOL is enabled. Instead the
driver must rely on the macaddr field in the response to infer WOL state.
3) In be_get_config() during init, if we find that WOL is enabled in FW,
call pci_enable_wake() to enable pmcsr.pme_en bit. This is needed to
support persistent WOL configuration provided by the FW in some platforms.
4) Remove code in be_set_wol() that writes to PCICFG_PM_CONTROL_OFFSET
to set pme_en bit; pci_enable_wake() sets that.
Fixes: 028991e49 ("Enabling Wake-on-LAN is not supported in S5 state")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Mon, 6 Jun 2016 11:22:08 +0000 (07:22 -0400)]
be2net: use max-TXQs limit too while provisioning VF queue pairs
When the PF driver provisions resources for VFs, it currently only looks
at max RSS queues available to calculate the number of VF queue pairs.
This logic breaks when there are less number of TX-queues than RSS-queues.
This patch fixes this problem by using the max-TXQs available in the
PF-pool in the calculations. As a part of this change the
be_calculate_vf_qs() routine is renamed as be_calculate_vf_res() and the
code that calculates limits on other related resources is moved here to
contain all resource calculation code inside one routine.
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Mon, 6 Jun 2016 06:30:02 +0000 (14:30 +0800)]
drivers/net: support hdlc function for QE-UCC
The driver add hdlc support for Freescale QUICC Engine.
It support NMSI and TSA mode.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Mon, 6 Jun 2016 06:30:01 +0000 (14:30 +0800)]
fsl/qe: Add QE TDM lib
QE has module to support TDM, some other protocols
supported by QE are based on TDM.
add a qe-tdm lib, this lib provides functions to the protocols
using TDM to configurate QE-TDM.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Mon, 6 Jun 2016 06:30:00 +0000 (14:30 +0800)]
fsl/qe: Make regs resouce_size_t
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Mon, 6 Jun 2016 06:29:59 +0000 (14:29 +0800)]
fsl/qe: setup clock source for TDM mode
Add tdm clock configuration in both qe clock system and ucc
fast controller.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Mon, 6 Jun 2016 06:29:58 +0000 (14:29 +0800)]
fsl/qe: add rx_sync and tx_sync for TDM mode
Rx_sync and tx_sync are used by QE-TDM mode,
add them to struct ucc_fast_info.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Sun, 5 Jun 2016 14:41:32 +0000 (10:41 -0400)]
net sched: indentation and other OCD stylistic fixes
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
David S. Miller [Tue, 7 Jun 2016 22:53:44 +0000 (15:53 -0700)]
Merge branch 'sch-action-tstamp'
Jamal Hadi Salim says:
====================
net sched action timestamp improvements
Various aggregations of duplicated code, fixes and introduction of firstused
timestamp
v2: add const for source time info per suggestion from Cong
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Mon, 6 Jun 2016 10:32:55 +0000 (06:32 -0400)]
net sched actions: aggregate dumping of actions timeinfo
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Mon, 6 Jun 2016 10:32:54 +0000 (06:32 -0400)]
net sched actions: introduce timestamp for firsttime use
Useful to know when the action was first used for accounting
(and debugging)
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal Hadi Salim [Mon, 6 Jun 2016 10:32:53 +0000 (06:32 -0400)]
net sched: actions use tcf_lastuse_update for consistency
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Sun, 5 Jun 2016 14:11:18 +0000 (17:11 +0300)]
net/sched: cls_flower: Introduce support in SKIP SW flag
In order to make a filter processed only by hardware, skip_sw flag
should be supplied. This is an addition to the already existing skip_hw
flag (filter will be processed by software only). If no flag is
specified, filter will be processed by both software and hardware.
If only hardware offloaded filters exist, fl_classify() will return
without doing anything.
A following userspace patch will be sent once kernel patch is accepted.
Example:
tc filter add dev enp0s9 protocol ip prio 20 parent ffff: \
flower \
ip_proto 6 \
indev enp0s9 \
skip_sw \
action skbedit mark 0x1234
Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Jun 2016 22:40:12 +0000 (15:40 -0700)]
Merge branch 'qed-iov-fw-reqs'
Yuval Mintz says:
====================
qed: IOV series - relax firmware requirements
In order for VFs to work, current implementation demands that the VF's
requried storm firmware would be exactly the version that was loaded by
the PF, which is a very harsh requirement.
This patch series is intended to relax this -
the recently submitted firmware is intended to be forward/backward
compatible in its fastpath [slowpath is configured by PF on behalf of VF],
and so VFs would only be required of having the same major faspath HSI in
order to work.
Most of the other patches in this series extend current forward
compatibilty of driver to reduce chance of breaking PF/VF compatibility
in the future. A few are unrelated IOV changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 5 Jun 2016 10:11:16 +0000 (13:11 +0300)]
qed: PF to reply to unknown messages
If a future VF would send the PF an unknown message, the PF today would
not send a reply. This would have 2 bad effects:
a. VF would have to timeout on the request.
b. If VF were to send an additional message to PF, firmware would mark
it as malicious.
Instead, if there's some valid reply-address on the message - let the PF
answer and tell the VF it doesn't know the message.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 5 Jun 2016 10:11:15 +0000 (13:11 +0300)]
qed: PF enforce MAC limitation of VFs
The only limitation relating to MACs the PF enforce today on its VFs
is in case it has a forced-unicast MAC address for them, in which case
they can't configure other unicast addresses.
Specifically, the PF isn't enforcing the number of MAC addresse a VF can
configure regardless of the nubmer of such filters agreed upon by PF and
VF during the acquisition process.
PF's shadow-config is now extended to also contain information about its
VFs' unicast addresses configuration, allowing such enforcement.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 5 Jun 2016 10:11:14 +0000 (13:11 +0300)]
qed: Move doorbell calculation from VF to PF
Today, the VF is aware of its queues context-ids, and calculates the
doorbell address when opening its queues on its own.
The configuration of doorbells in HW can sometime in the future be changed
by the PF [hw has several configurable features that might affect doorbell
addresses, e.g., dpm support], this would break compatibility with older
VFs as their calculated doorbell addresses would be incorrect for such a
configuration.
In order to avoid such a backward compatibility failure, let the PF make
the calculation of the doorbell offset based on the context-id, and pass
that to the VF.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 5 Jun 2016 10:11:13 +0000 (13:11 +0300)]
qed: Make PF more robust against malicious VF
There are several requests the VF can make toward the PF which the driver
would pass to firmware without checking the validity first - specifically,
opening queues and updating vports. Such configurations might cause the
firmware to assert.
This adds validation of the legality of said configurations on the PF side
before passing it onward via ramrod to firmware.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 5 Jun 2016 10:11:12 +0000 (13:11 +0300)]
qed: PF-VF resource negotiation
One of the goals of the vf's first message to the PF [acquire]
is to learn about the number of resources available to it [macs, vlans,
etc.]. This is done via negotiation - the VF requires a set of resources,
which the PF either approves or disaproves and sends a smaller set of
resources as alternative. In this later case, the VF is then expected to
either abort the probe or re-send the acquire message with less
required resources.
While this infrastructure exists since the initial submision of qed
SRIOV support, it's in fact completely inoperational - PF isn't really
looking into the resources the VF has asked for and is never going to
reply to the VF that it lacks resources.
This patch addresses this flow, fixing it and allowing the PF and VF
to actually agree on a set of resources.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 5 Jun 2016 10:11:11 +0000 (13:11 +0300)]
qed: Relax VF firmware requirements
Current driver require an exact match between VF and PF storm firmware;
Any difference would fail the VF acquire message, causing the VF probe
to be aborted.
While there's still dependencies between the two, the recent FW submission
has relaxed the match requirement - instead of an exact match, there's now
a 'fastpath' HSI major/minor scheme, where VFs and PFs that match in their
major number can co-exist even if their minor is different.
In order to accomadate this change some changes in the vf-start init flow
had to be made, as the VF start ramrod now has to be sent only after PF
learns which fastpath HSI its VF is requiring.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 5 Jun 2016 03:02:28 +0000 (20:02 -0700)]
net: get rid of spin_trylock() in net_tx_action()
Note: Tom Herbert posted almost same patch 3 months back, but for
different reasons.
The reasons we want to get rid of this spin_trylock() are :
1) Under high qdisc pressure, the spin_trylock() has almost no
chance to succeed.
2) We loop multiple times in softirq handler, eventually reaching
the max retry count (10), and we schedule ksoftirqd.
Since we want to adhere more strictly to ksoftirqd being waked up in
the future (https://lwn.net/Articles/687617/), better avoid spurious
wakeups.
3) calls to __netif_reschedule() dirty the cache line containing
q->next_sched, slowing down the owner of qdisc.
4) RT kernels can not use the spin_trylock() here.
With help of busylock, we get the qdisc spinlock fast enough, and
the trylock trick brings only performance penalty.
Depending on qdisc setup, I observed a gain of up to 19 % in qdisc
performance (
1016600 pps instead of 853400 pps, using prio+tbf+fq_codel)
("mpstat -I SCPU 1" is much happier now)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 1 Jun 2016 05:56:33 +0000 (01:56 -0400)]
vhost_net: stop polling socket during rx processing
We don't stop rx polling socket during rx processing, this will lead
unnecessary wakeups from under layer net devices (E.g
sock_def_readable() form tun). Rx will be slowed down in this
way. This patch avoids this by stop polling socket during rx
processing. A small drawback is that this introduces some overheads in
light load case because of the extra start/stop polling, but single
netperf TCP_RR does not notice any change. In a super heavy load case,
e.g using pktgen to inject packet to guest, we get about ~8.8%
improvement on pps:
before: ~
1240000 pkt/s
after: ~
1350000 pkt/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaktipriya Shridhar [Sat, 4 Jun 2016 15:24:00 +0000 (20:54 +0530)]
net: ethernet: cavium: liquidio: request_manager: Remove create_workqueue
alloc_workqueue replaces deprecated create_workqueue().
A dedicated workqueue has been used since the workitem viz
(&db_wq->wk.work which maps to check_db_timeout) is involved
in normal device operation. WQ_MEM_RECLAIM has been set to guarantee
forward progress under memory pressure, which is a requirement here.
Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary.
flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaktipriya Shridhar [Sat, 4 Jun 2016 14:51:40 +0000 (20:21 +0530)]
net: ethernet: cavium: liquidio: response_manager: Remove create_workqueue
alloc_workqueue replaces deprecated create_workqueue().
A dedicated workqueue has been used since the workitem viz
(&cwq->wk.work which maps to oct_poll_req_completion) is involved
in normal device operation. WQ_MEM_RECLAIM has been set to guarantee
forward progress under memory pressure, which is a requirement here.
Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary.
flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty. Hence the call to flush_workqueue() has been dropped.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Conole [Fri, 3 Jun 2016 20:57:12 +0000 (16:57 -0400)]
virtio-net: Add initial MTU advice feature
This commit adds the feature bit and associated mtu device entry for the
virtio network device. When a virtio device comes up, it checks the
feature bit for the VIRTIO_NET_F_MTU feature. If such feature bit is
enabled, the driver will read the advised MTU and use it as the initial
value.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jun 2016 22:58:34 +0000 (15:58 -0700)]
net: Revert vrf-local changes.
This reverts commit
2fb7ea455d57e22110c54fc2de0656b6f744263c.
It results in build errors because ip6_input is not a
symbol exported to modules.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jun 2016 22:19:07 +0000 (15:19 -0700)]
Merge branch 'vrf-local'
David Ahern says:
====================
net: vrf: Add support for local traffic to local addresses
Add support for locally originated traffic to VRF-local addresses,
be it addresses on enslaved devices or addresses on the VRF device:
$ ip addr show dev red
33: red: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
inet 1.1.1.1/32 scope global red
valid_lft forever preferred_lft forever
inet6 1111:1::1/128 scope global
valid_lft forever preferred_lft forever
$ ip addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
$ ping -c1 -I red 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms
$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms
--- 2100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms
$ ping6 -c1 -I red 1111::1
PING 1111::1(1111::1) from 1111:1::1 red: 56 data bytes
64 bytes from 1111::1: icmp_seq=1 ttl=64 time=0.187 ms
--- 1111::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms
This change also enables use of loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8
$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 2 Jun 2016 20:15:12 +0000 (13:15 -0700)]
net: vrf: ipv6 support for local traffic to local addresses
Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4 a local dst is set on the skb and the packet is
reinserted with a call to netif_rx. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:
$ ip addr show dev eth1
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
valid_lft forever preferred_lft forever
$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms
ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input and
output functions; IPv6's does not. VRF does not need to duplicate the Rx path
so just export the ipv6 input function.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 2 Jun 2016 20:15:11 +0000 (13:15 -0700)]
net: vrf: ipv4 support for local traffic to local addresses
Add support for locally originated traffic to VRF-local addresses. If
destination device for an skb is the loopback or VRF device then set
its dst to a local version of the VRF cached dst_entry and call netif_rx
to insert the packet onto the rx queue - similar to what is done for
loopback. This patch handles IPv4 support; follow on patch handles IPv6.
With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:
$ ip addr show dev eth1
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
valid_lft forever preferred_lft forever
$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
This patch also enables use of IPv4 loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8
$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 2 Jun 2016 20:15:10 +0000 (13:15 -0700)]
net: vrf: Minor refactoring for local address patches
Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions. If the packet is destined to a local
address the header is retained since the packet is sent back to netif_rx.
Collapse vrf_send_v4_prep into vrf_process_v4_outbound.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jun 2016 03:16:50 +0000 (23:16 -0400)]
Merge branch 'hv_netvsc-cleanups'
Vitaly Kuznetsov says:
====================
hv_netvsc: cleanup after untangling the pointer mess
Changes since v1:
- resend when net-next is open [David Miller]
- rebased to current net-next.
After we made traveling through our internal structures explicit it became
obvious that some functions take arguments they don't need just to do
redundant pointer travel and get to what they really need while their
callers already have the required information.
This is just a cleanup series with no functional changes intended. It
doesn't pretend to be complete, additional cleanup of other functions may
follow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:51:02 +0000 (17:51 +0200)]
hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()
The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:51:01 +0000 (17:51 +0200)]
hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()
We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:51:00 +0000 (17:51 +0200)]
hv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}()
Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:50:59 +0000 (17:50 +0200)]
hv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers
Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Fri, 3 Jun 2016 15:50:58 +0000 (17:50 +0200)]
hv_netvsc: remove redundant assignment in netvsc_recv_callback()
net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Fri, 27 May 2016 15:53:52 +0000 (17:53 +0200)]
net: disable fragment reassembly if high_thresh is zero
Before commit
6d7b857d541e ("net: use lib/percpu_counter API for
fragmentation mem accounting"), setting the reassembly high threshold
to 0 prevented fragment reassembly as first fragment would be always
evicted before second could be added to the queue. While inefficient,
some users apparently relied on this method.
Since the commit mentioned above, a percpu counter is used for
reassembly memory accounting and high batch size avoids taking slow path
in most common scenarios. As a result, a whole full sized packet can be
reassembled without the percpu counter's main counter changing its value
so that even with high_thresh set to 0, fragmented packets can be still
reassembled and processed.
Add explicit check preventing reassembly if high threshold is zero.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Jun 2016 01:32:46 +0000 (21:32 -0400)]
Merge branch 'hns-acpi'
Kejian Yan says:
====================
net: hns: add support of ACPI
This series adds HNS support of acpi. The routine will call some ACPI
helper functions, like acpi_dev_found() and acpi_evaluate_dsm(), which
are not included in other cases. In order to make system compile
successfully in other cases except ACPI, it needs to add relative stub
functions to linux/acpi.h. And we use device property functions instead
of serial helper functions to suport both DT and ACPI cases. And then
add the supports of ACPI for HNS.
change log:
v3->v4:
mii-id gets from dev-name instead of address
v2->v3:
1. add Review-by: Andy Shevchenko
2. fix the potential memory leak
v1 -> v2:
1. use acpi_dev_found() instead of acpi_match_device_ids() to check if
it is a acpi node.
2. use is_of_node() instead of IS_ENABLED() to check if it is a DT node.
3. split the patch("add support of acpi for hns-mdio") into two patches:
3.1 Move to use fwnode_handle
3.2 Add ACPI
4. add the patch which subject is dsaf misc operation method
5. fix the comments by Andy Shevchenko
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:21 +0000 (10:55 +0800)]
net: hns: net: hns: enet adds support of acpi
Enet needs to get configration parameter by acpi. This patch
adds support of ACPI for enet. The configuration parameter will
be configed in BIOS.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:20 +0000 (10:55 +0800)]
net: hns: implement the miscellaneous operation by asl
The miscellaneous operation is implemented in BIOS, the kernel can call
_DSM method help to call the implementation in ACPI case. Here is a patch
to do that.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:19 +0000 (10:55 +0800)]
net: hns: register phy device in each mac initial sequence
In ACPI case, there is no interface to register phy device to mdio-bus.
Phy device has to be registered itself to mdio-bus, and then enet can
get the phy device's info so that it can config the phy-device to help
to trasmit and receive data.
HNS hardware topology is as below. The MDIO controller may control several
PHY-devices, and each PHY-device connects to a MAC device. PHY-devices
will register when each mac find PHY device in initial sequence.
cpu
|
|
-------------------------------------------
| | |
| | |
| dsaf |
MDIO | MDIO
| --------------------------- |
| | | | | |
| | | | | |
| MAC MAC MAC MAC |
| | | | | |
---- |-------- |-------- | | --------
|| || || ||
PHY PHY PHY PHY
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:18 +0000 (10:55 +0800)]
net: hns: dsaf adds support of acpi
Dsaf needs to get configuration parameter by ACPI, so this patch add
support of ACPI.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:17 +0000 (10:55 +0800)]
net: hns: add dsaf misc operation method
The misc operation for different hw platform may be different, if using
current implementation, it will add a new branch on each function for
every new hw platform, so we add a method for this operation.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:16 +0000 (10:55 +0800)]
net: hns: add uniform interface for phy connection
As device_node is only used by DT case, HNS needs to treat the other
cases including ACPI. It needs to use uniform ways to handle both of
DT and ACPI. This patch chooses phy_device, and of_phy_connect and
of_phy_attach are only used by DT case. It needs to use uniform interface
to handle that sequence by both DT and ACPI.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:15 +0000 (10:55 +0800)]
net: hns: enet specify a reference to dsaf by fwnode_handle
As device_node is only used by DT case, it is expected to find uniform
ways. So fwnode_handle is the suitable method.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:14 +0000 (10:55 +0800)]
net: hns: use platform_get_irq instead of irq_of_parse_and_map
As irq_of_parse_and_map is only used by DT case, it is excepted to use
a uniform interface. So it is used platform_get_irq() instead.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:13 +0000 (10:55 +0800)]
net: hns: use device_* APIs instead of of_* APIs
OF series functions can be used only for DT case. Use unified
device property function instead to support both DT and ACPI.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:12 +0000 (10:55 +0800)]
net: hisilicon: add support of acpi for hns-mdio
hns-mdio needs to register itself to mii-bus. The info of the device can
be read by both DT and ACPI.
HNS tries to call Linux PHY driver to help access PHY-devices, the HNS
hardware topology is as below. The MDIO controller may control several
PHY-devices, and each PHY-device connects to a MAC device. The MDIO will
be registered to mdiobus, then PHY-devices will register when each mac
find PHY device.
cpu
|
|
-------------------------------------------
| | |
| | |
| dsaf |
MDIO | MDIO
| --------------------------- |
| | | | | |
| | | | | |
| MAC MAC MAC MAC |
| | | | | |
---- |-------- |-------- | | --------
|| || || ||
PHY PHY PHY PHY
And the driver can handle reset sequence by _RST method in DSDT in ACPI
case.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:11 +0000 (10:55 +0800)]
net: hisilicon: cleanup to prepare for other cases
Hns-mdio only supports DT case now. do some cleanup to prepare
for introducing other cases later, no functional change.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:10 +0000 (10:55 +0800)]
ACPI: bus: add stub acpi_evaluate_dsm() to linux/acpi.h
acpi_evaluate_dsm() will be used to handle the _DSM method in ACPI case.
It will be compiled in non-ACPI case, but the function is in acpi_bus.h
and acpi_bus.h can only be used in ACPI case, so this patch add the stub
function to linux/acpi.h to make compiled successfully in non-ACPI cases.
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Fri, 3 Jun 2016 02:55:09 +0000 (10:55 +0800)]
ACPI: bus: add stub acpi_dev_found() to linux/acpi.h
acpi_dev_found() will be used to detect if a given ACPI device is in the
system. It will be compiled in non-ACPI case, but the function is in
acpi_bus.h and acpi_bus.h can only be used in ACPI case, so this patch add
the stub function to linux/acpi.h to make compiled successfully in
non-ACPI cases.
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 4 Jun 2016 21:29:55 +0000 (14:29 -0700)]
Merge branch 'dsa-new-binding'
Andrew Lunn says:
====================
New DSA bind, switches as devices
The interesting patches here are the last three. They implement a new
binding for DSA, which removes a few limitations of the current DSA
binding. In particular, it allows switches to be true Linux devices.
These devices can be on any type of bus, unlike the old DSA binding
which assumes MDIO. See the commit log for more details. The second to
last patch modifies an existing boards device tree to use the new
binding, giving a good example of how switches can be true MDIO
devices. The last patch documents the new binding.
Thanks go to Florian and Vivien for reviewing, testing and bug fixing
these patches.
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Since V1:
* Add lots of reviewed-by's
* Fix rtable comment
* dsa2: Clear cpu port mask in dsa_cpu_port_unapply()
* dsa2: Only set dsa_port_mask when port successfully configured
* dsa: clear {dsa|cpu}_port_mask on destroy
Since RFC:
* Split the mv88e6xxx MDIO refactor into a rename patch and a refactor
patch.
* Extend commit message with comment about wrong of_node_put()
* Fix destroy of cpu and dsa ports.
* Rename _DSA_TAG_LAST to DSA_TAG_LAST and add a comment.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:09 +0000 (21:17 +0200)]
net: dsa: Document new binding
Add the new binding to the documentation of the existing binding.
Mark the old binding as deprecated.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:08 +0000 (21:17 +0200)]
arm: dt: vf610-zii-devel-b: Make use of new DSA binding
Hang the three switches of the three MDIO busses using the new DSA
binding. Also, make use of the mdio-bus and explicitly list the phys
on one device. This is not required, but good for testing.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:07 +0000 (21:17 +0200)]
net: dsa: Add new binding implementation
The existing DSA binding has a number of limitations and problems. The
main problem is that it cannot represent a switch as a linux device,
hanging off some bus. It is limited to one CPU port. The DSA platform
device is artificial, and does not really represent hardware.
Implement a new binding which can be embedded into any type of node on
a bus to represent one switch device, and its links to other switches.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:06 +0000 (21:17 +0200)]
net: dsa: mv88e6xxx: Refactor MDIO so driver registers mdio bus
Have the switch driver register its own MDIO bus. This allows for an
mdio property in the device tree, with child nodes for phys, which
can be referenced via phandles, etc.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:05 +0000 (21:17 +0200)]
net: dsa: mv88e6xxx: Rename _phy_ to _mdio_
The switch implements a generic MDIO bus, which could host more than
PHYs. It is conventional to use _mdio_ or _mii_ in the function name,
so rename them. Also postfix make the historically first read/write
function with _direct, to help distinguish it from _indirect and _ppu.
While touching these functions, remove some of the _ prefixes, which
we are deprecating.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:04 +0000 (21:17 +0200)]
net: dsa: Make mdio bus optional
The switch may want to instantiate its own MDIO bus. Only do it
centrally if the switch has not already created one, and the read op
is implemented.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:03 +0000 (21:17 +0200)]
net: dsa: Refactor selection of tag ops into a function
Replace the two switch statements with an array lookup, and store the
result in the dsa tree structure. The drivers no longer need to know
the selected tag protocol, so remove it from the dsa switch structure.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:02 +0000 (21:17 +0200)]
net: dsa: mv88e6xxx: Only support EDSA tagging
The merged driver no longer offers the option to use DSA tagging. So
remove the code to setup the switch to do DSA tagging and hard code
the use of EDSA.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>y
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:01 +0000 (21:17 +0200)]
net: dsa: Split up creating/destroying of DSA and CPU ports
Refactor the code to setup a single DSA/CPU port into a function of
its own, and export it, so it can be used by the new binding.
Similarly, refactor the destroy code into a function. When destroying
the ports, don't put the of node. They should be released at the end
along with the normal ports.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:17:00 +0000 (21:17 +0200)]
net: dsa: Copy the routing table into the switch structure
The new binding will not have a chip data structure, it will place the
routing directly into the switch structure. To enable backwards
compatibility, copy the routing from the chip data into the switch
structure.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:16:59 +0000 (21:16 +0200)]
net: dsa: Remove dynamic allocate of routing table
With a maximum of four switches, the size of the routing table is the
same as the pointer to it. Removing it makes the code simpler.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:16:58 +0000 (21:16 +0200)]
net: dsa: Move port device node into port structure
Move the port device node structure into the port structure, from the
chip data. This information is needed in the next step of implementing
the new binding.
The chip data structure is used while parsing the whole old binding,
before the individual switch structures exist. With the new bindings,
this is reversed, the switches exist first, and the interconnections
between the switches is derived from the individual switch
bindings. Thus this chip data structure becomes unneeded.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
eviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:16:57 +0000 (21:16 +0200)]
net: dsa: Add a ports structure and use it in the switch structure
There are going to be more per-port members added to the switch
structure. So add a port structure and move the netdev into it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:16:56 +0000 (21:16 +0200)]
net: dsa: tag_{e}dsa.c: Remove dependency on platform data
The platform data nr_chips is used when validating a received packet,
to ensure it comes from a know switch chip. The number of possible
switches is limited to DSA_MAX_SWITCHES, so use this as the first
validation step. The new binding allows holes in the dst->ds[] array,
so also ensure ensure there is a valid dsa_switch for this packet.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:16:55 +0000 (21:16 +0200)]
net: dsa: slave: Remove MDIO address from switch MDIO bus name
The DSA layer should no longer assume the switch is connected to an
MDIO bus. As a result, we cannot use the address on the MDIO bus when
forming the name of the switches internal MDIO bus for its builtin and
possibly external PHYs. The switch index is sufficient to make the
name unique, so drop the MDIO address.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Sat, 4 Jun 2016 19:16:54 +0000 (21:16 +0200)]
net: dsa: mv88e6xxx: fix circular lock in PPU work
Lock debugging shows that there is a possible circular lock in the PPU
work code. Switch the lock order of smi_mutex and ppu_mutex to fix this.
Here's the full trace:
[ 4.341325] ======================================================
[ 4.347519] [ INFO: possible circular locking dependency detected ]
[ 4.353800] 4.6.0 #4 Not tainted
[ 4.357039] -------------------------------------------------------
[ 4.363315] kworker/0:1/328 is trying to acquire lock:
[ 4.368463] (&ps->smi_mutex){+.+.+.}, at: [<
8049c758>] mv88e6xxx_reg_read+0x30/0x54
[ 4.376313]
[ 4.376313] but task is already holding lock:
[ 4.382160] (&ps->ppu_mutex){+.+...}, at: [<
8049cac0>] mv88e6xxx_ppu_reenable_work+0x28/0xd4
[ 4.390772]
[ 4.390772] which lock already depends on the new lock.
[ 4.390772]
[ 4.398963]
[ 4.398963] the existing dependency chain (in reverse order) is:
[ 4.406461]
[ 4.406461] -> #1 (&ps->ppu_mutex){+.+...}:
[ 4.410897] [<
806d86bc>] mutex_lock_nested+0x54/0x360
[ 4.416606] [<
8049a800>] mv88e6xxx_ppu_access_get+0x28/0x100
[ 4.422906] [<
8049b778>] mv88e6xxx_phy_read+0x90/0xdc
[ 4.428599] [<
806a4534>] dsa_slave_phy_read+0x3c/0x40
[ 4.434300] [<
804943ec>] mdiobus_read+0x68/0x80
[ 4.439481] [<
804939d4>] get_phy_device+0x58/0x1d8
[ 4.444914] [<
80493ed0>] mdiobus_scan+0x24/0xf4
[ 4.450078] [<
8049409c>] __mdiobus_register+0xfc/0x1ac
[ 4.455857] [<
806a40b0>] dsa_probe+0x860/0xca8
[ 4.460934] [<
8043246c>] platform_drv_probe+0x5c/0xc0
[ 4.466627] [<
804305a0>] driver_probe_device+0x118/0x450
[ 4.472589] [<
80430b00>] __device_attach_driver+0xac/0x128
[ 4.478724] [<
8042e350>] bus_for_each_drv+0x74/0xa8
[ 4.484235] [<
804302d8>] __device_attach+0xc4/0x154
[ 4.489755] [<
80430cec>] device_initial_probe+0x1c/0x20
[ 4.495612] [<
8042f620>] bus_probe_device+0x98/0xa0
[ 4.501123] [<
8042fbd0>] deferred_probe_work_func+0x4c/0xd4
[ 4.507328] [<
8013a794>] process_one_work+0x1a8/0x604
[ 4.513030] [<
8013ac54>] worker_thread+0x64/0x528
[ 4.518367] [<
801409e8>] kthread+0xec/0x100
[ 4.523201] [<
80108f30>] ret_from_fork+0x14/0x24
[ 4.528462]
[ 4.528462] -> #0 (&ps->smi_mutex){+.+.+.}:
[ 4.532895] [<
8015ad5c>] lock_acquire+0xb4/0x1dc
[ 4.538154] [<
806d86bc>] mutex_lock_nested+0x54/0x360
[ 4.543856] [<
8049c758>] mv88e6xxx_reg_read+0x30/0x54
[ 4.549549] [<
8049cad8>] mv88e6xxx_ppu_reenable_work+0x40/0xd4
[ 4.556022] [<
8013a794>] process_one_work+0x1a8/0x604
[ 4.561707] [<
8013ac54>] worker_thread+0x64/0x528
[ 4.567053] [<
801409e8>] kthread+0xec/0x100
[ 4.571878] [<
80108f30>] ret_from_fork+0x14/0x24
[ 4.577139]
[ 4.577139] other info that might help us debug this:
[ 4.577139]
[ 4.585159] Possible unsafe locking scenario:
[ 4.585159]
[ 4.591093] CPU0 CPU1
[ 4.595631] ---- ----
[ 4.600169] lock(&ps->ppu_mutex);
[ 4.603693] lock(&ps->smi_mutex);
[ 4.609742] lock(&ps->ppu_mutex);
[ 4.615790] lock(&ps->smi_mutex);
[ 4.619314]
[ 4.619314] *** DEADLOCK ***
[ 4.619314]
[ 4.625256] 3 locks held by kworker/0:1/328:
[ 4.629537] #0: ("events"){.+.+..}, at: [<
8013a704>] process_one_work+0x118/0x604
[ 4.637288] #1: ((&ps->ppu_work)){+.+...}, at: [<
8013a704>] process_one_work+0x118/0x604
[ 4.645653] #2: (&ps->ppu_mutex){+.+...}, at: [<
8049cac0>] mv88e6xxx_ppu_reenable_work+0x28/0xd4
[ 4.654714]
[ 4.654714] stack backtrace:
[ 4.659098] CPU: 0 PID: 328 Comm: kworker/0:1 Not tainted 4.6.0 #4
[ 4.665286] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[ 4.671748] Workqueue: events mv88e6xxx_ppu_reenable_work
[ 4.677174] Backtrace:
[ 4.679674] [<
8010d354>] (dump_backtrace) from [<
8010d5a0>] (show_stack+0x20/0x24)
[ 4.687252] r6:
80fb3c88 r5:
80fb3c88 r4:
80fb4728 r3:
00000002
[ 4.693003] [<
8010d580>] (show_stack) from [<
803b45e8>] (dump_stack+0x24/0x28)
[ 4.700246] [<
803b45c4>] (dump_stack) from [<
80157398>] (print_circular_bug+0x208/0x32c)
[ 4.708361] [<
80157190>] (print_circular_bug) from [<
8015a630>] (__lock_acquire+0x185c/0x1b80)
[ 4.716982] r10:
9ec22a00 r9:
00000060 r8:
8164b6bc r7:
00000040 r6:
00000003 r5:
8163a5b4
[ 4.724905] r4:
00000003 r3:
9ec22de8
[ 4.728537] [<
80158dd4>] (__lock_acquire) from [<
8015ad5c>] (lock_acquire+0xb4/0x1dc)
[ 4.736378] r10:
60000013 r9:
00000000 r8:
00000000 r7:
00000000 r6:
9e5e9c50 r5:
80e618e0
[ 4.744301] r4:
00000000
[ 4.746879] [<
8015aca8>] (lock_acquire) from [<
806d86bc>] (mutex_lock_nested+0x54/0x360)
[ 4.754976] r10:
9e5e9c1c r9:
80e616c4 r8:
9f685ea0 r7:
0000001b r6:
9ec22a00 r5:
8163a5b4
[ 4.762899] r4:
9e5e9c1c
[ 4.765477] [<
806d8668>] (mutex_lock_nested) from [<
8049c758>] (mv88e6xxx_reg_read+0x30/0x54)
[ 4.774008] r10:
80e60c5b r9:
80e616c4 r8:
9f685ea0 r7:
0000001b r6:
00000004 r5:
9e5e9c10
[ 4.781930] r4:
9e5e9c1c
[ 4.784507] [<
8049c728>] (mv88e6xxx_reg_read) from [<
8049cad8>] (mv88e6xxx_ppu_reenable_work+0x40/0xd4)
[ 4.793907] r7:
9ffd5400 r6:
9e5e9c68 r5:
9e5e9cb0 r4:
9e5e9c10
[ 4.799659] [<
8049ca98>] (mv88e6xxx_ppu_reenable_work) from [<
8013a794>] (process_one_work+0x1a8/0x604)
[ 4.809059] r9:
80e616c4 r8:
9f685ea0 r7:
9ffd5400 r6:
80e0a1c8 r5:
9f5f2e80 r4:
9e5e9cb0
[ 4.816910] [<
8013a5ec>] (process_one_work) from [<
8013ac54>] (worker_thread+0x64/0x528)
[ 4.825010] r10:
9f5f2e80 r9:
00000008 r8:
80e0dc80 r7:
80e0a1fc r6:
80e0a1c8 r5:
9f5f2e98
[ 4.832933] r4:
80e0a1c8
[ 4.835510] [<
8013abf0>] (worker_thread) from [<
801409e8>] (kthread+0xec/0x100)
[ 4.842827] r10:
00000000 r9:
00000000 r8:
00000000 r7:
8013abf0 r6:
9f5f2e80 r5:
9ec15740
[ 4.850749] r4:
00000000
[ 4.853327] [<
801408fc>] (kthread) from [<
80108f30>] (ret_from_fork+0x14/0x24)
[ 4.860557] r7:
00000000 r6:
00000000 r5:
801408fc r4:
9ec15740
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 4 Jun 2016 19:16:52 +0000 (21:16 +0200)]
net: dsa: slave: chip data is optional, don't dereference NULL
The new binding does not make use of dsa_chip_data, a.k.a cd. When
retrieving the size of the EEPROM attached to a switch, don't assume
there is a cd attached to the switch structure.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 4 Jun 2016 05:56:28 +0000 (22:56 -0700)]
net: Add docbook description for 'mtu' arg to skb_gso_validate_mtu()
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 4 Jun 2016 05:53:26 +0000 (22:53 -0700)]
sctp: Fix warning in sctp_packet_transmit_chunk()
size_t objects should be printed with %Z printf format.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sat, 4 Jun 2016 05:20:16 +0000 (08:20 +0300)]
qed: Fix next-ptr chains for BE / 32-bit
Commit
a91eb52abb50 ("qed: Revisit chain implementation") contains an
incorrect implementation for BE platforms, as device's regpairs containing
addresses are LE and they're not converted correctly when read back.
In addition, it raises a compilation warning for 32-bit platforms where
dma_addr_t is a 32-bit variable.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 4 Jun 2016 00:08:45 +0000 (20:08 -0400)]
Merge branch 'qed-roce-iscsi'
Yuval Mintz says:
====================
qed: RocE & iSCSI infrastructure
We plan on sending 2 new protocol drivers in the imminent future -
both our RoCE [qedr] and iSCSI [qedi] drivers. As both submissions
would be rather massive and in order to avoid collisions between them,
the common infrastructure on the qed side was prepared as an independent
patch-series to be sent ahead of those 2 submissions.
This patch series introduces in QED 2 new 'ids' - one for iscsi and
one for roce. It then goes and adds logic required for configuring
said protocols in HW. Notice it *doesn't* actually add any client using
said ids, but rather only the infrastructure to allow their later usage.
What this patch doesn't contain is the slowpath protocol-configuration
toward the firmware. I.e., it contains register-setting logic, memory
allocations, etc., but not actual flow-related configuration specific
to the protocl. Those would be sent as part of the protocol driver
submissions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Fri, 3 Jun 2016 11:35:35 +0000 (14:35 +0300)]
qed: Initialize hardware for new protocols
RoCE and iSCSI would require some added/changed hw configuration in order
to properly run; The biggest single change being the requirement of
allocating and mapping host memory for several HW blocks that aren't being
used by qede [SRC, QM, TM, etc.].
In addition, whereas qede is only using context memory for HW blocks, the
new protocol would also require task memories to be added.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Fri, 3 Jun 2016 11:35:34 +0000 (14:35 +0300)]
qed: Add iscsi/rdma personalities
This patch adds in the ecore 2 new personalities in addition to
QED_PCI_ETH - QED_PCI_ISCSI and QED_PCI_ETH_ROCE.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Fri, 3 Jun 2016 11:35:33 +0000 (14:35 +0300)]
qed: Add common HSI for new protocols
This adds the qed portion of the RoCE & iSCSI firmware HSI,
as well as adding several new common HSI files which would be required
by both qed and qed* protocols.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Fri, 3 Jun 2016 11:35:32 +0000 (14:35 +0300)]
qed: Revisit chain implementation
RoCE driver is going to need a 32-bit chain [current chain implementation
for qed* currently supports only 16-bit producer/consumer chains].
This patch adds said support, as well as doing other slight tweaks and
modifications to qed's chain API.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Thu, 2 Jun 2016 22:37:08 +0000 (01:37 +0300)]
net: ethernet: ti: cpsw: remove unused priv lock
There is no reason in this lock. At least for now.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 2 Jun 2016 19:08:52 +0000 (12:08 -0700)]
rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB
Use the more common kernel logging style and reduce object size.
The logging message prefix changes from a mixture of
"RxRPC:" and "RXRPC:" to "af_rxrpc: ".
$ size net/rxrpc/built-in.o*
text data bss dec hex filename
64172 1972 8304 74448 122d0 net/rxrpc/built-in.o.new
67512 1972 8304 77788 12fdc net/rxrpc/built-in.o.old
Miscellanea:
o Consolidate the ASSERT macros to use a single pr_err call with
decimal and hexadecimal output and a stringified #OP argument
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Thu, 2 Jun 2016 19:02:04 +0000 (12:02 -0700)]
hv_netvsc: Fix VF register on vlan devices
Added a condition to avoid vlan devices with same MAC registering
as VF.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Jun 2016 23:37:26 +0000 (19:37 -0400)]
Merge branch 'sctp-gso'
Marcelo Ricardo Leitner says:
====================
sctp: Add GSO support
This patchset adds sctp GSO support.
Performance tests indicates that increases throughput by 10% if using
bigger chunk sizes, specially if bigger than MTU. For small chunks, it
doesn't help much if not using heavy firewall rules.
For small chunks it will probably be of more use once we get something
like MSG_MORE as David Laight had suggested.
overall changes:
v1->v2:
Added support for receiving GSO frames on SCTP stack, as requested by
Dave Miller.
v2->v3:
Consider sctphdr size in skb_gso_transport_seglen()
rebased due to
5c7cdf339af5 ("gso: Remove arbitrary checks for
unsupported GSO")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:44 +0000 (15:05 -0300)]
sctp: improve debug message to also log curr pkt and new chunk size
This is useful for debugging packet sizes.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:43 +0000 (15:05 -0300)]
sctp: Add GSO support
SCTP has this pecualiarity that its packets cannot be just segmented to
(P)MTU. Its chunks must be contained in IP segments, padding respected.
So we can't just generate a big skb, set gso_size to the fragmentation
point and deliver it to IP layer.
This patch takes a different approach. SCTP will now build a skb as it
would be if it was received using GRO. That is, there will be a cover
skb with protocol headers and children ones containing the actual
segments, already segmented to a way that respects SCTP RFCs.
With that, we can tell skb_segment() to just split based on frag_list,
trusting its sizes are already in accordance.
This way SCTP can benefit from GSO and instead of passing several
packets through the stack, it can pass a single large packet.
v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
single GSO packets that fills cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to
5c7cdf339af5 ("gso: Remove arbitrary checks for
unsupported GSO")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:42 +0000 (15:05 -0300)]
sctp: delay as much as possible skb_linearize
This patch is a preparation for the GSO one. In order to successfully
handle GSO packets on rx path we must not call skb_linearize, otherwise
it defeats any gain GSO may have had.
This patch thus delays as much as possible the call to skb_linearize,
leaving it to sctp_inq_pop() moment. For that the sanity checks
performed now know how to deal with fragments.
One positive side-effect of this is that if the socket is backlogged it
will have the chance of doing it on backlog processing instead of
during softirq.
With this move, it's evident that a check for non-linearity in
sctp_inq_pop was ineffective and is now removed. Note that a similar
check is performed a bit below this one.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:41 +0000 (15:05 -0300)]
skbuff: introduce skb_gso_validate_mtu
skb_gso_network_seglen is not enough for checking fragment sizes if
skb is using GSO_BY_FRAGS as we have to check frag per frag.
This patch introduces skb_gso_validate_mtu, based on the former, which
will wrap the use case inside it as all calls to skb_gso_network_seglen
were to validate if it fits on a given TMU, and improve the check.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:40 +0000 (15:05 -0300)]
sk_buff: allow segmenting based on frag sizes
This patch allows segmenting a skb based on its frags sizes instead of
based on a fixed value.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Thu, 2 Jun 2016 18:05:39 +0000 (15:05 -0300)]
skbuff: export skb_gro_receive
sctp GSO requires it and sctp can be compiled as a module, so we need to
export this function.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>