Brenden Blanco [Tue, 19 Jul 2016 19:16:46 +0000 (12:16 -0700)]
bpf: add bpf_prog_add api for bulk prog refcnt
A subsystem may need to store many copies of a bpf program, each
deserving its own reference. Rather than requiring the caller to loop
one by one (with possible mid-loop failure), add a bulk bpf_prog_add
api.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Jul 2016 03:49:18 +0000 (20:49 -0700)]
Merge branch 'ncsi'
Gavin Shan says:
====================
NCSI Support
This series rebases on David's linux-net git repo ("master" branch). It's
to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c. The
implementation is based on NCSI spec (version: 1.1.0):
https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
As the following figure shows and defined in NCSI spec:
* The NC-SI (aka NCSI) is defined as the interface between a (Base)
Management Controller (BMC) and one or multiple Network Interface
Controlers (NIC) on host side. The interface is responsible for providing
external network connectivity for BMC.
* Each BMC can connect to multiple packages, up to 8. Each package can have
multiple channels, up to 32. Every package and channel are identified by
3-bits and 5-bits in NCSI packet.
* NCSI packet, encapsulated in ethernet frame, has 0x88F8 in the protocol
field. The destination MAC address should be 0xFF's while the source MAC
address can be arbitrary one.
* NCSI packets are classified to command, response, AEN (Asynchronous Event Notification).
Commands are sent from BMC to host (NIC) for configuration and
information retrival. Responses, corresponding to commands, are sent from
host to BMC for confirmation and requested information. One command should
have one and only one response. AEN is sent from host to BMC for notification
(e.g. link down on active channel) so that BMC can take appropriate action.
+------------------+ +----------------------------------------------+
| | | Host |
| BMC | | |
| | | +-------------------+ +-------------------+ |
| +---------+ | | | Package-A | | Package-B | |
| | | | | +---------+---------+ +-------------------+ |
| |ftgmac100| | | | Channel | Channel | | Channel | Channel | |
+----+----+----+---+ +-+---------+---------+--+---------+---------+-+
| | |
| | |
+-----------------------------+----------------------+
The series of patches is highlighted as:
The design for the patchset is highlighted as below:
* The network driver uses 3 interfaces exported from NCSI stack:
ncsi_register_dev() - Register (create) a associated NCSI device.
ncsi_start_dev() - Bring up the NCSI device.
ncsi_unregister_dev() - Destroy the registered NCSI device.
* There are several data structures introduced for different objects:
struct ncsi_dev - NCSI device seen by network device driver.
struct ncsi_dev_priv - NCSI device seen by NCSI stack.
struct ncsi_package - NCSI package which can have multiple channels.
struct ncsi_channel - NCSI channel.
* The NCSI stack is driven by workqueue and state machine internally.
* The all available NCSI packages and channels are enumerated (probed) on
the first call to ncsi_start_dev(). The NCSI topology won't change until
the NCSI device is destroyed.
* All available channels will be brought up When the hardware arbitration
is enabled. Otherwise, only one channel is selected as active one. The
NCSI internal is driven by state machine with help of a workqueue. In
the meanwhile, there are 3 states for each channel which can be put into
a queue requesting for configuration or suspending. Channels in the queue
with inactive state set will be configured (bringup) while channels in
the queue with active state will be suspended (teardown). The request
configuration or suspending is being applied on the channel if it's in
invisible state.
* Failover, another inactive channel is selected as active, can happen when
the hardware arbitration is disabled. The failover can be caused by timeout
on link monitor and AEN.
* NCSI stack should be configurable through netlink or another mechanism, it's
not implemented in this patchset. It's something TBD.
* The first NIC driver that is aware of NCSI: drivers/net/ethernet/faraday/ftgmac100.c
Changelog
=========
v2 -> v3:
* Include (one line) change in include/uapi/linux/if_ether.h to fix build
error.
v1 -> v2:
* Support NCSI spec v1.1.0 (3 more commands and 4 hardware arbitration
modes added).
* Enable AEN packets according to the supported list.
* Introduce NCSI channel states and processing queue in order to support
the hardware arbitration.
* The hardware arbitration is supported (tested with emulated environment).
* Introduce link monitor with GLS (Get Link Status) command/response as part
of the error handling defined in NCSI spec.
* Support IPv6 address discovery when CONFIG_IPV6 is enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:25 +0000 (11:54 +1000)]
net/faraday: Mask PHY interrupt with NCSI mode
Bogus PHY interrupts are observed. This masks the PHY interrupt
when the interface works in NCSI mode as there is no attached
PHY under the circumstance.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:24 +0000 (11:54 +1000)]
net/faraday: Match driver according to compatible property
This matches the driver with devices compatible with "faraday,ftgmac100"
declared in the device tree. Originally, device's name from device
tree for it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:23 +0000 (11:54 +1000)]
net/faraday: Support NCSI mode
This makes ftgmac100 driver support NCSI mode. The NCSI is enabled
on the interface if property "use-nc-si" or "use-ncsi" is found from
the device node in device tree.
* No PHY device is used when NCSI mode is enabled.
* The NCSI device (struct ncsi_dev) is created when probing the
device while it's enabled/started when the interface is brought
up.
* Hardware IP checksum dosn't work when NCSI mode is enabled. It
is disabled on enabled NCSI.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:22 +0000 (11:54 +1000)]
net/faraday: Read MAC address from chip
The device is assigned with random MAC address. It isn't reasonable.
An valid MAC address might have been provided by (uboot) firmware by
device-tree or in chip. It's reasonable to use it to maintain consistency.
This uses the MAC address from device-tree or that in the chip if it's
valid. Otherwise, a random MAC address is given as before.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:21 +0000 (11:54 +1000)]
net/faraday: Helper functions to create or destroy MDIO interface
This introduces two helper functions to create or destroy MDIO
interface. No logical changes introduced except the proper MDIO
names are given when having more than one MDIO bus.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:20 +0000 (11:54 +1000)]
net/ncsi: NCSI AEN packet handler
This introduces NCSI AEN packet handlers that result in (A) the
currently active channel is reconfigured; (B) Currently active
channel is deconfigured and disabled, another channel is chosen
as active one and configured. Case (B) won't happen if hardware
arbitration has been enabled, the channel that was in active
state is suspended simply.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:19 +0000 (11:54 +1000)]
net/ncsi: Package and channel management
This manages NCSI packages and channels:
* The available packages and channels are enumerated in the first
time of calling ncsi_start_dev(). The channels' capabilities are
probed in the meanwhile. The NCSI network topology won't change
until the NCSI device is destroyed.
* There in a queue in every NCSI device. The element in the queue,
channel, is waiting for configuration (bringup) or suspending
(teardown). The channel's state (inactive/active) indicates the
futher action (configuration or suspending) will be applied on the
channel. Another channel's state (invisible) means the requested
action is being applied.
* The hardware arbitration will be enabled if all available packages
and channels support it. All available channels try to provide
service when hardware arbitration is enabled. Otherwise, one channel
is selected as the active one at once.
* When channel is in active state, meaning it's providing service, a
timer started to retrieve the channe's link status. If the channel's
link status fails to be updated in the determined period, the channel
is going to be reconfigured. It's the error handling implementation
as defined in NCSI spec.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:18 +0000 (11:54 +1000)]
net/ncsi: NCSI response packet handler
The NCSI response packets are sent to MC (Management Controller)
from the remote end. They are responses of NCSI command packets
for multiple purposes: completion status of NCSI command packets,
return NCSI channel's capability or configuration etc.
This defines struct to represent NCSI response packets and introduces
function ncsi_rcv_rsp() which will be used to receive NCSI response
packets and parse them.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:17 +0000 (11:54 +1000)]
net/ncsi: NCSI command packet handler
The NCSI command packets are sent from MC (Management Controller)
to remote end. They are used for multiple purposes: probe existing
NCSI package/channel, retrieve NCSI channel's capability, configure
NCSI channel etc.
This defines struct to represent NCSI command packets and introduces
function ncsi_xmit_cmd(), which will be used to transmit NCSI command
packet according to the request. The request is represented by struct
ncsi_cmd_arg.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Tue, 19 Jul 2016 01:54:16 +0000 (11:54 +1000)]
net/ncsi: Resource management
NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implement functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.
* The user (e.g. netdev driver) dereference NCSI device by
"struct ncsi_dev", which is embedded to "struct ncsi_dev_priv".
The later one is used by NCSI stack internally.
* Every NCSI device can have multiple packages simultaneously, up
to 8 packages. It's represented by "struct ncsi_package" and
identified by 3-bits ID.
* Every NCSI package can have multiple channels, up to 32. It's
represented by "struct ncsi_channel" and identified by 5-bits ID.
* Every NCSI channel has version, statistics, various modes and
filters. They are represented by "struct ncsi_channel_version",
"struct ncsi_channel_stats", "struct ncsi_channel_mode" and
"struct ncsi_channel_filter" separately.
* Apart from AEN (Asynchronous Event Notification), the NCSI stack
works in terms of command and response. This introduces "struct
ncsi_req" to represent a complete NCSI transaction made of NCSI
request and response.
link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Jul 2016 02:42:02 +0000 (19:42 -0700)]
Merge branch 'dsa-mv88e6xxx-g2-cleanup-stp'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: Global2 cleanup and STP
The Marvell switches registers are organized in distinct internal SMI
devices, such as PHY, Port, Global 1 or Global 2 registers sets.
Since not all chips support every registers sets or have slightly
differences in them (such as old
88E6060 or new
88E6390 likely to be
supported soon), make the setup code clearer now by removing a few
family checks and adding flags to describe the Global 2 registers map.
This patchset enables basic STP support and bridging on most chips when
getting rid of a few inconsistencies in chip descriptions (patch 1) and
add bridge Ageing Time support to DSA and the mv88e6xxx driver.
Changes v2 -> v3:
- rename mv88e6xxx_update_write to mv88e6xxx_update
- set fastest ageing time in use in the chip for multiple bridges,
tested with a few printk
Changes v1 -> v2:
- add a write helper for pointer-data Update registers
- add ageing time support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:40 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: add support for DSA ageing time
Implement the DSA driver function to configure the bridge ageing time.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:39 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: add G1 helper for ageing time
All Marvell switch chips from (
88E6060 to
88E6390) have a ATU Control
register containing bits 11:4 to configure an ATU Age Time quotient.
However the coefficient used to calculate the ATU Age Time vary with the
models. E.g.
88E6060,
88E6352 and
88E6390 use respectively 16, 15 and
3.75 seconds.
Add a age_time_coeff to the info structure to handle this and a Global 1
helper to set the default age time of 5 minutes in the setup code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:38 +0000 (20:45 -0400)]
net: dsa: support switchdev ageing time attr
Add a new function for DSA drivers to handle the switchdev
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.
The ageing time is passed as milliseconds.
Also because we can have multiple logical bridges on top of a physical
switch and ageing time are switch-wide, call the driver function with
the fastest ageing time in use on the chip instead of the requested one.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:37 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: add cap for IRL
Add capability flags to describe the presence of Ingress Rate Limit unit
registers and an helper function to clear it.
In the meantime, fix a few harmless issues:
- 6185 and 6095 don't have such registers (reserved)
- the previous code didn't wait for the IRL operation to complete
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:36 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: add cap for Priority Override
Add flags and helpers to describe the presence of Priority Override
Table (POT) related registers and simplify the setup of Global 2.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:35 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: add cap for PVT
Add flags to describe the presence of Cross-chip Port VLAN Table (PVT)
related registers and simplify the setup of Global 2.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:34 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: rework Switch MAC setter
Switches such as
88E6185 as 3 Switch MAC registers in Global 1. Newer
chips such as
88E6352 have freed these registers in favor of an indirect
access in a Switch MAC/WoL/WoF register in Global 2.
Explicit this difference with G1 and G2 helpers and flags.
Also, note that this indirect access is a single-register which doesn't
require to wait for the operation to complete (like Switch MAC, Trunk
Mapping, etc.), in contrary to multi-registers indirect accesses with
several operations and a busy bit.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:33 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: add cap for MGMT Enables bits
Some switches provide a Rsvd2CPU mechanism used to choose which of the
16 reserved multicast destination addresses matching 01:80:c2:00:00:0x
should be considered as MGMT and thus forwarded to the CPU port.
Other switches extend this mechanism to also configure as MGMT the
additional 16 reserved multicast addresses matching 01:80:c2:00:00:2x.
This mechanism is exposed via two registers in Global 2, and an Rsvd2CPU
enable bit in the management register.
Newer chip (such as
88E6390) has replaced these registers with a new
indirect MGMT mechanism in Global 1.
The patch adds two MV88E6XXX_FLAG_G2_MGMT_EN_{0,2}X flags to describe
the presence of these Global 2 registers. If
88E6390 support is added, a
MV88E6XXX_FLAG_G1_MGMT_CTRL flag will be needed to setup Rsvd2CPU.
Note: all switches still support in parallel the ATU Load operation with
an MGMT Entry State to forward such frames in a less convenient way.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:32 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: extract trunk mapping
The Trunk Mask and Trunk Mapping registers are two Global 2 indirect
accesses to trunking configuration.
Add helpers for these tables and simplify the Global 2 setup.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:31 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: extract device mapping
The Device Mapping register is an indirect table access.
Provide helpers to access this table and explicit the checking of the
new DSA_RTABLE_NONE routing table value.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:30 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: split setup of Global 1 and 2
Separate the setup of Global 1 and Global 2 internal SMI devices and add
a flag to describe the presence of this second registers set.
Also rearrange the G1 setup in the registers order.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Jul 2016 00:45:29 +0000 (20:45 -0400)]
net: dsa: mv88e6xxx: remove basic function flags
All 88E6xxx Marvell switches (even the old not supported yet
88E6060)
have at least an ATU, per-port STP states and VLAN map, to run basic
switch functions such as Spanning Tree and port based VLANs.
Get rid of the related MV88E6XXX_FLAG_{ATU,PORTSTATE,VLANTABLE} flags,
as they are defaults to every chip.
This enables STP on 6185 and removes many inconsistencies on others.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Mon, 18 Jul 2016 22:50:58 +0000 (15:50 -0700)]
kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug
kernel/trace/bpf_trace.c: In function 'bpf_event_output':
kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in initializer
kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
kernel/trace/bpf_trace.c:312: warning: (near initialization for 'raw.frag.<anonymous>')
Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event output")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Lutomirski [Mon, 18 Jul 2016 22:34:49 +0000 (15:34 -0700)]
virtio-net: Remove more stack DMA
VLAN and MQ control was doing DMA from the stack. Fix it.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 18 Jul 2016 20:02:47 +0000 (13:02 -0700)]
bnxt_en: Remove locking around txr->dev_state
txr->dev_state was not consistently manipulated with the acquisition of
the per-queue lock, after further inspection the lock does not seem
necessary, either the value is read as BNXT_DEV_STATE_CLOSING or 0.
Reported-by: coverity (CID 1339583)
Fixes: c0c050c58d840 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Jul 2016 23:40:22 +0000 (16:40 -0700)]
Merge branch 'frag-udp-tunneled-skbs'
Shmulik Ladkani says:
====================
net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso'
Currently IP fragmentation of GSO segments that exceed dst mtu is
considered only in the ipv4 forwarding case.
There are cases where GSO skbs that are bridged and then udp-tunneled
may have gso_size exceeding the egress device mtu.
It makes sense to fragment them, as in the non GSOed code path.
The exact cases where this behavior is needed is described and addressed
in the 2nd patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Mon, 18 Jul 2016 11:49:34 +0000 (14:49 +0300)]
net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
Given:
- tap0 and vxlan0 are bridged
- vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)
Assume GSO skbs arriving from tap0 having a gso_size as determined by
user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).
After encapsulation these skbs have skb_gso_network_seglen that exceed
eth0's ip_skb_dst_mtu.
These skbs are accidentally passed to ip_finish_output2 AS IS.
Alas, each final segment (segmented either by validate_xmit_skb or by
hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain networks.
This behavior is not aligned with the NON-GSO case:
Assume a non-gso 1500-sized IP packet arrives from tap0. After
encapsulation, the vxlan datagram is fragmented normally at the
ip_finish_output-->ip_fragment code path.
The expected behavior for the GSO case would be segmenting the
"gso-oversized" skb first, then fragmenting each segment according to
dst mtu, and finally passing the resulting fragments to ip_finish_output2.
'ip_finish_output_gso' already supports this "Slowpath" behavior,
according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
forwarding (not set in the bridged case).
In order to support the bridged case, we'll mark skbs arriving from an
ingress interface that get udp-encaspulated as "allowed to be fragmented",
causing their network_seglen to be validated by 'ip_finish_output_gso'
(and fragment if needed).
Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
gso and non-gso cases), which serves users wishing to forbid
fragmentation at the udp tunnel endpoint.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Mon, 18 Jul 2016 11:49:33 +0000 (14:49 +0300)]
net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags
This flag indicates whether fragmentation of segments is allowed.
Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by
either ip_forward or ipmr_forward).
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Jul 2016 23:29:41 +0000 (16:29 -0700)]
Merge branch 'bnxt_en-NS2-Nitro'
Michael Chan says:
====================
bnxt_en: Add support for NS2 Nitro.
This series adds support for the embedded version of the
ethernet controller (Nitro) in the North Star 2 SoC. There are a number
of features not supported and a software workaround for a hardware rx
bug is required for Nitro A0. Please review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 18 Jul 2016 11:15:25 +0000 (07:15 -0400)]
bnxt_en: Add BCM58700 PCI device ID for NS2 Nitro.
A bridge device in NS2 has the same device ID as the ethernet controller.
Add check to avoid probing the bridge device.
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 18 Jul 2016 11:15:24 +0000 (07:15 -0400)]
bnxt_en: Workaround Nitro A0 RX hardware bug (part 4).
Allocate special vnic for dropping packets not matching the RX filters.
First vnic is for normal RX packets and the driver will drop all
packets on the 2nd vnic.
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 18 Jul 2016 11:15:23 +0000 (07:15 -0400)]
bnxt_en: Workaround Nitro A0 hardware RX bug (part 3).
Allocate napi for special vnic, packets arriving on this
napi will simply be dropped and the buffers will be replenished back
to the HW.
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 18 Jul 2016 11:15:22 +0000 (07:15 -0400)]
bnxt_en: Workaround Nitro A0 hardware RX bug (part 2).
The hardware is unable to drop rx packets not matching the RX filters. To
workaround it, we create a special VNIC and configure the hardware to
direct all packets not matching the filters to it. We then setup the
driver to drop packets received on this VNIC.
This patch creates the infrastructure for this VNIC, reserves a
completion ring, and rx rings. Only shared completion ring mode is
supported. The next 2 patches add a NAPI to handle packets from this
VNIC and the setup of the VNIC.
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 18 Jul 2016 11:15:21 +0000 (07:15 -0400)]
bnxt_en: Workaround Nitro A0 hardware RX bug (part 1).
Nitro A0 has a hardware bug in the rx path. The workaround is to create
a special COS context as a path for non-RSS (non-IP) packets. Without this
workaround, the chip may stall when receiving RSS and non-RSS packets.
Add infrastructure to allow 2 contexts (RSS and CoS) per VNIC. Allocate
and configure the CoS context for Nitro A0.
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Sreedharan [Mon, 18 Jul 2016 11:15:20 +0000 (07:15 -0400)]
bnxt_en: Add basic support for Nitro in North Star 2.
Nitro is the embedded version of the ethernet controller in the North
Star 2 SoC. Add basic code to recognize the chip ID and disable
the features (ntuple, TPA, ring and port statistics) not supported on
Nitro A0.
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Jul 2016 23:05:57 +0000 (16:05 -0700)]
Merge branch 'marvell-phy'
Charles-Antoine Couret says:
====================
Marvell phy: fiber interface configuration
Another patchset to manage correctly the fiber link for some concerned Marvell's
phy like
88E1512.
This patchset fixed the commit log for the third and last commits and a comment
in the first commit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Charles-Antoine Couret [Tue, 19 Jul 2016 09:13:13 +0000 (11:13 +0200)]
Marvell phy: add functions to suspend and resume both interfaces: fiber and copper links.
These functions used standards registers in a different page
for both interfaces: copper and fiber.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Charles-Antoine Couret <charles-antoine.couret@nexvision.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Charles-Antoine Couret [Tue, 19 Jul 2016 09:13:12 +0000 (11:13 +0200)]
Marvell phy: add configuration of autonegociation for fiber link.
To be correctly initilized, the fiber interface needs
to be configured via autonegociation registers which use
some customs options or registers.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Charles-Antoine Couret <charles-antoine.couret@nexvision.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Charles-Antoine Couret [Tue, 19 Jul 2016 09:13:11 +0000 (11:13 +0200)]
Marvell phy: add field to get errors from fiber link.
Add support for the fiber receiver error counter in the
statistics. Rename the current counter which is for copper errors to
phy_receive_errors_copper, so it is easy to distinguish copper from
fiber.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Charles-Antoine Couret <charles-antoine.couret@nexvision.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Charles-Antoine Couret [Tue, 19 Jul 2016 09:13:10 +0000 (11:13 +0200)]
Marvell phy: check link status in case of fiber link.
For concerned phy, the fiber link is checked before the copper link.
According to datasheet, the link which is up is enabled.
If both links are down, copper link would be used.
To detect fiber link status, we used the real time status
because of troubles with the copper method.
Tested with Marvell
88E1512.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Charles-Antoine Couret <charles-antoine.couret@nexvision.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Jul 2016 23:01:34 +0000 (16:01 -0700)]
Merge branch 'renesas-dma-channel'
Sergei Shtylyov says:
====================
Fix DMA channel misreporting for the Renesas Ethernet drivers
Here's a set of 2 patches against DaveM's 'net.git' repo fixing up the DMA
channel reporting by 'ifconfig'...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 17 Jul 2016 13:09:19 +0000 (16:09 +0300)]
sh_eth: fix DMA channel misreporting
Currently 'ifconfig' for the Ethernet devices handled by this driver shows
"DMA chan: ff" while the driver doesn't use any DMA channels. Not assigning
a value to 'net_device::dma' causes 'ifconfig' to correctly not report a
DMA channel.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 17 Jul 2016 13:08:43 +0000 (16:08 +0300)]
ravb: fix DMA channel misreporting
Currently 'ifconfig' for the Ethernet devices handled by this driver shows
"DMA chan: ff" while the driver doesn't use any DMA channels. Not assigning
a value to 'net_device::dma' causes 'ifconfig' to correctly not report a
DMA channel.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Jaillet [Sun, 17 Jul 2016 07:03:24 +0000 (09:03 +0200)]
drivers: atm: nicstar: Use the correct function to free some resources
In 'get_scq', 'dma_alloc_coherent' has been used to allocate some
resources, so we need to free them using 'dma_free_coherent' instead
of 'kfree'.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Jaillet [Sun, 17 Jul 2016 06:15:50 +0000 (08:15 +0200)]
net: ti: cpmac: Use the correct function to free some resources.
In 'cpmac_open', 'dma_alloc_coherent' has been used to allocate some
resources, so we need to free them using 'dma_free_coherent' instead
of 'kfree'.
Also, we don't need to free these resources if the allocation has failed.
So I have slighly modified the goto label in this case.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Tue, 19 Jul 2016 03:02:59 +0000 (11:02 +0800)]
macvtap: correctly free skb during socket destruction
We should use kfree_skb() instead of kfree() to free an skb.
Fixes: 362899b8725b ("macvtap: switch to use skb array")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 17 Jul 2016 21:30:46 +0000 (23:30 +0200)]
net: ethernet: marvell: pxa168_eth: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 17 Jul 2016 21:30:45 +0000 (23:30 +0200)]
net: ethernet: marvell: pxa168_eth: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 16 Jul 2016 23:10:15 +0000 (01:10 +0200)]
net: ethernet: adi: bfin_mac: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
There was a check on CAP_NET_ADMIN in bfin_mac_ethtool_setsettings,
but this check is already done in dev_ethtool, so no need to repeat
it before calling the generic function.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 16 Jul 2016 23:10:14 +0000 (01:10 +0200)]
net: ethernet: adi: bfin_mac: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaktipriya Shridhar [Sat, 16 Jul 2016 08:23:28 +0000 (13:53 +0530)]
dwc_eth_qos: Remove deprecated create_singlethread_workqueue
alloc_workqueue replaces deprecated create_singlethread_workqueue().
A dedicated workqueue has been used since the workitem viz
lp->txtimeout_reinit is involved in reinitialization if a TX timeout
occurs, which is necessary to guarantee forward progress in packet
processing. As a network device can be used during memory reclaim, the
workqueue needs forward progress guarantee under memory pressure.
WQ_MEM_RECLAIM has been set to ensure this.
Since there is only a single work item, explicit concurrency limit is
unnecessary here.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 15 Jul 2016 23:15:55 +0000 (01:15 +0200)]
bpf: bpf_event_entry_gen's alloc needs to be in atomic context
Should have been obvious, only called from bpf() syscall via map_update_elem()
that calls bpf_fd_array_map_update_elem() under RCU read lock and thus this
must also be in GFP_ATOMIC, of course.
Fixes: 3b1efb196eee ("bpf, maps: flush own entries on perf map release")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 15 Jul 2016 19:40:02 +0000 (16:40 -0300)]
sctp: fix GSO for IPv6
commit
90017accff61 ("sctp: Add GSO support") didn't register SCTP GSO
offloading for IPv6 and yet didn't put any restrictions on generating
GSO packets while in IPv6, which causes all IPv6 GSO'ed packets to be
silently dropped.
The fix is to properly register the offload this time.
Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 15 Jul 2016 19:38:19 +0000 (16:38 -0300)]
sctp: recvmsg should be able to run even if sock is in closing state
Commit
d46e416c11c8 missed to update some other places which checked for
the socket being TCP-style AND Established state, as Closing state has
some overlapping with the previous understanding of Established.
Without this fix, one of the effects is that some already queued rx
messages may not be readable anymore depending on how the association
teared down, and sending may also not be possible if peer initiated the
shutdown.
Also merge two if() blocks into one condition on sctp_sendmsg().
Cc: Xin Long <lucien.xin@gmail.com>
Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 13:25:36 +0000 (15:25 +0200)]
net: usb: ax88172x: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 17 Jul 2016 04:33:12 +0000 (21:33 -0700)]
Merge branch 'hisilicon-mdio-femac'
Dongpo Li says:
====================
Add Hisilicon MDIO bus driver and FEMAC driver
This patch set adds a Hisilicon MDIO bus driver and
a Fast Ethernet MAC(FEMAC) driver.
We also abstract a general interface "of_phy_get_and_connect"
for PHY connect. User will have no bother with getting
"phy-mode" and "phy-handle" any more.
Changes in v1:
- Pass private data structure instead of struct mii_bus
in MDIO read and write operation.
- Return the error which devm_clk_get() gives when MDIO probe.
- Leave the clock unprepared and disabled on error when MDIO probe.
- Abstract a general interface "of_phy_get_and_connect" for PHY connect.
- Remove the "_reset" suffixes in "reset-names" property.
- Enable tx per-packet interrupt when tx fifo full.
- Remove pointless compatible and add SoC specific compatible.
- Declare only one clock in MAC dts documentation.
- Add standard unit suffixes for "phy-reset-delays".
- Use a smaller NAPI poll weight 16 for our Fast Ethernet MAC.
- Use phy_ethtool_{get|set}_link_ksettings for ethtool ops.
- Use phydev from struct net_device in MAC driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dongpo Li [Fri, 15 Jul 2016 08:26:35 +0000 (16:26 +0800)]
net: hisilicon: Add Fast Ethernet MAC driver
This patch adds the Hisilicon Fast Ethernet MAC(FEMAC) driver.
The FEMAC supports max speed 100Mbps and has been used in many
Hisilicon SoC.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Jiancheng Xue <xuejiancheng@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dongpo Li [Fri, 15 Jul 2016 08:26:34 +0000 (16:26 +0800)]
of_mdio: Abstract a general interface for phy connect
Abstract a general interface "of_phy_get_and_connect"
for PHY connect. User will have no bother with getting
"phy-mode" and "phy-handle" any more.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Jiancheng Xue <xuejiancheng@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dongpo Li [Fri, 15 Jul 2016 08:26:33 +0000 (16:26 +0800)]
net: Add MDIO bus driver for the Hisilicon FEMAC
This patch adds a separate driver for the MDIO interface of the
Hisilicon Fast Ethernet MAC.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Jiancheng Xue <xuejiancheng@hisilicon.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Fri, 15 Jul 2016 08:12:15 +0000 (10:12 +0200)]
net: cpsw: make TI_CPSW_PHY_SEL invisible
TI_CPSW_PHY_SEL depended on TI_CPSW and was selected by the latter. So
there is no reason to have this symbol visible.
A further optimisation would be to put the code for both symbols into a
single module which would allow to not export at least cpsw_phy_sel()
and simplify the module load process.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Fri, 15 Jul 2016 02:38:25 +0000 (10:38 +0800)]
wan/fsl_ucc_hdlc: rewrite error handling to make it clearer
It was used err_xxx for labeled statement, it is
not easy to understand, now use free_xxx for labeled
statement.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhao Qiang [Fri, 15 Jul 2016 02:38:24 +0000 (10:38 +0800)]
wan/fsl_ucc_hdlc: remove reduplicative freed memory 'uhdlc_priv'
'uhdlc_priv' has freed twice, drop the first one.
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 14 Jul 2016 16:28:27 +0000 (19:28 +0300)]
net: ipmr/ip6mr: add support for keeping an entry age
In preparation for hardware offloading of ipmr/ip6mr we need an
interface that allows to check (and later update) the age of entries.
Relying on stats alone can show activity but not actual age of the entry,
furthermore when there're tens of thousands of entries a lot of the
hardware implementations only support "hit" bits which are cleared on
read to denote that the entry was active and shouldn't be aged out,
these can then be naturally translated into age timestamp and will be
compatible with the software forwarding age. Using a lastuse entry doesn't
affect performance because the members in that cache line are written to
along with the age.
Since all new users are encouraged to use ipmr via netlink, this is
exported via the RTA_EXPIRES attribute.
Also do a minor local variable declaration style adjustment - arrange them
longest to shortest.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Shrijeet Mukherjee <shm@cumulusnetworks.com>
CC: Satish Ashok <sashok@cumulusnetworks.com>
CC: Donald Sharp <sharpd@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kristian Evensen [Thu, 14 Jul 2016 08:23:03 +0000 (10:23 +0200)]
rndis_host: Set valid random MAC on buggy devices
Some devices of the same type all export the same, random MAC address. This
behavior has been seen on the ZTE MF910, MF823 and MF831, and there are
probably more devices out there. Fix this by generating a valid random MAC
address if we read a random MAC from device.
Also, changed the memcpy() to ether_addr_copy(), as pointed out by
checkpatch.
Suggested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 17 Jul 2016 02:57:38 +0000 (19:57 -0700)]
Merge branch 'bridge-rx-simplify-fwd-consolidate'
Nikolay Aleksandrov says:
====================
net: bridge: simplify receive path and consolidate forwarding paths
This set tries to simplify the receive and forwarding paths. Patch 01 is
a trivial style adjustment, patch 02 removes one conditional from the
unicast fast path, patch 03 removes another conditional and more imporantly
removes the skb0/skb2 ambiguity about locally receiving the skb and
switches to a boolean called "local_rcv".
Patch 04 is the most important change which consolidates the forwarding
paths for locally originated and forwarded packets into __br_forward. This
allows us to remove the function pointers giving a minor performance boost,
more importantly it makes it much easier to reason about the forwarding
path and reduces the code duplication that was needed when making changes.
Also it allows the receive path to fully setup the environment prior to
calling any forwarding functions (i.e. to properly set unicast, local_rcv
and search for unicast/mcast dst).
Functionally everything should stay the same after this set.
I've done basic tests with unicast/multicast/broadcast Tx/Rx. Please
review carefully.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:10:02 +0000 (06:10 +0300)]
net: bridge: remove _deliver functions and consolidate forward code
Before this patch we had two flavors of most forwarding functions -
_forward and _deliver, the difference being that the latter are used
when the packets are locally originated. Instead of all this function
pointer passing and code duplication, we can just pass a boolean noting
that the packet was locally originated and use that to perform the
necessary checks in __br_forward. This gives a minor performance
improvement but more importantly consolidates the forwarding paths.
Also add a kernel doc comment to explain the exported br_forward()'s
arguments.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:10:01 +0000 (06:10 +0300)]
net: bridge: drop skb2/skb0 variables and use a local_rcv boolean
Currently if the packet is going to be received locally we set skb0 or
sometimes called skb2 variables to the original skb. This can get
confusing and also we can avoid one conditional on the fast path by
simply using a boolean and passing it around. Thanks to Roopa for the
name suggestion.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:10:00 +0000 (06:10 +0300)]
net: bridge: rearrange flood vs unicast receive paths
This patch removes one conditional from the unicast path by using the fact
that skb is NULL only when the packet is multicast or is local.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Thu, 14 Jul 2016 03:09:59 +0000 (06:09 +0300)]
net: bridge: minor style adjustments in br_handle_frame_finish
Trivial style changes in br_handle_frame_finish.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Sailer [Sat, 16 Jul 2016 02:04:34 +0000 (04:04 +0200)]
tcp_timer.c: Add kernel-doc function descriptions
This adds kernel-doc style descriptions for 6 functions and
fixes 1 typo.
Signed-off-by: Richard Sailer <richard@weltraumpflege.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 10:39:02 +0000 (12:39 +0200)]
net: ethernet: ti: cpmac: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
There was a check on CAP_NET_ADMIN in cpmac_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 10:39:01 +0000 (12:39 +0200)]
net: ethernet: ti: cpmac: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 10:05:12 +0000 (12:05 +0200)]
net: ethernet: amd: au1000_eth: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
There was a check on CAP_NET_ADMIN in au1000_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 10:05:11 +0000 (12:05 +0200)]
net: ethernet: amd: au1000_eth: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 08:36:21 +0000 (10:36 +0200)]
net: ethernet: smsc9420: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 08:36:20 +0000 (10:36 +0200)]
net: ethernet: smsc9420: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 07:59:12 +0000 (09:59 +0200)]
net: ethernet: ethoc: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 15 Jul 2016 07:59:11 +0000 (09:59 +0200)]
net: ethernet: ethoc: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Thu, 14 Jul 2016 21:44:53 +0000 (23:44 +0200)]
net: ethernet: pasemi_mac: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Thu, 14 Jul 2016 21:44:52 +0000 (23:44 +0200)]
net: ethernet: pasemi_mac: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Thu, 14 Jul 2016 17:45:58 +0000 (19:45 +0200)]
net: ethernet: xilinx: axienet: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Thu, 14 Jul 2016 17:45:57 +0000 (19:45 +0200)]
net: ethernet: xilinx: axienet: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Thu, 14 Jul 2016 13:20:47 +0000 (15:20 +0200)]
net: ethernet: tc35815: use phy_ethtool_{get|set}_link_ksettings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Thu, 14 Jul 2016 13:20:46 +0000 (15:20 +0200)]
net: ethernet: tc35815: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Fri, 15 Jul 2016 21:55:20 +0000 (23:55 +0200)]
net: fixup for tracepoint napi:napi_poll
The recent change to tracepoint napi:napi_poll changed the order of
the parameters that perf scripts sees, the printk was correct. The
problem was that the new parameters (work and budget) were pushed
in front of dev_name.
The new parameters obviously need to be appended to keep backward
compatible.
Fixes: 1db19db7f5ff ("net: tracepoint napi:napi_poll add work and budget")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 15 Jul 2016 07:46:31 +0000 (03:46 -0400)]
macvtap: switch to use skb array
This patch switch to use skb array instead of sk_receive_queue to
avoid spinlock contentions. Tests shows about 21% improvements for
guest rx pps:
Before:
1472731 pkts/s
After:
1786289 pkts/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 15 Jul 2016 07:46:30 +0000 (03:46 -0400)]
macvtap: avoid hash calculating for single queue
We decide the rxq through calculating its hash which is not necessary
if we only have one rx queue. So this patch skip this and just return
queue 0. Test shows 22% improving on guest rx pps.
Before:
1201504 pkts/s
After:
1472731 pkts/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Jul 2016 21:23:56 +0000 (14:23 -0700)]
Merge branch 'bpf-event-output-helper-improvements'
Daniel Borkmann says:
====================
BPF event output helper improvements
This set adds improvements to the BPF event output helper to
support non-linear data sampling, here specifically, for skb
context. For details please see individual patches. The set
is based against net-next tree.
v1 -> v2:
- Integrated and adapted Peter's diff into patch 1, updated
the remaining ones accordingly. Thanks Peter!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 14 Jul 2016 16:08:05 +0000 (18:08 +0200)]
bpf: avoid stack copy and use skb ctx for event output
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.
We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 14 Jul 2016 16:08:04 +0000 (18:08 +0200)]
bpf, perf: split bpf_perf_event_output
Split the bpf_perf_event_output() helper as a preparation into
two parts. The new bpf_perf_event_output() will prepare the raw
record itself and test for unknown flags from BPF trace context,
where the __bpf_perf_event_output() does the core work. The
latter will be reused later on from bpf_event_output() directly.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 14 Jul 2016 16:08:03 +0000 (18:08 +0200)]
perf, events: add non-linear data support for raw records
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.
If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.
This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.
The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].
[1] http://thread.gmane.org/gmane.linux.network/421294
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 14 Jul 2016 14:47:01 +0000 (15:47 +0100)]
rxrpc: checking for IS_ERR() instead of NULL
The rxrpc_lookup_peer() function returns NULL on error, it never returns
error pointers.
Fixes: 8496af50eb38 ('rxrpc: Use RCU to access a peer's service connection tree')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philipp Zabel [Thu, 14 Jul 2016 14:29:43 +0000 (16:29 +0200)]
net: phy: micrel: Add KSZ8041FTL fiber mode support
We can't detect the FXEN (fiber mode) bootstrap pin, so configure
it via a boolean device tree property "micrel,fiber-mode".
If it is enabled, auto-negotiation is not supported.
The only available modes are 100base-fx (full duplex and half duplex).
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 14 Jul 2016 11:16:53 +0000 (14:16 +0300)]
wan/fsl_ucc_hdlc: info leak in uhdlc_ioctl()
There is a 2 byte struct whole after line.loopback so we need to clear
that out to avoid disclosing stack information.
Fixes: c19b6d246a35 ('drivers/net: support hdlc function for QE-UCC')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Jul 2016 18:36:58 +0000 (11:36 -0700)]
Merge branch 'rds-enable-mprds'
Sowmini Varadhan says:
====================
RDS: TCP: Enable mprds for rds-tcp
The third, and final, installment for mprds-tcp changes.
In Patch 3 of this set, if the transport support t_mp_capable,
we hash outgoing traffic across multiple paths. Additionally, even if
the transport is MP capable, we may be peering with some node that does
not support mprds, or supports a different number of paths. This
necessitates RDS control plane changes so that both peers agree
on the number of paths to be used for the rds-tcp connection.
Patch 3 implements all these changes, which are documented in patch 5
of the series.
Patch 1 of this series is a bug fix for a race-condition
that has always existed, but is now more easily encountered with mprds.
Patch 2 is code refactoring. Patches 4 and 5 are Documentation updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:05 +0000 (03:51 -0700)]
Documentation: RDS: Document Multipath RDS (mprds)
Document the design of mprds, covering a brief description
of the motivation, data-structures and modifications to the
RDS control plane.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 14 Jul 2016 10:51:04 +0000 (03:51 -0700)]
Documentation: RDS: updates for SO_RDS_TRANSPORT socket option
Update the documentation to describe the changes added by
commit
8ba38460f363 ("net/rds Add getsockopt support for SO_RDS_TRANSPORT")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>