Eran Ben Elisha [Wed, 28 Mar 2018 10:26:50 +0000 (13:26 +0300)]
net/mlx5e: Report all channels with min RX WQEs timeout
Report all channels which got timeout on posting the minimal number of
RX WQEs and not only the first one. Avoid busy wait on every channel,
when one of the RQs check got timeout, poll once for the remaining RQs.
In addition, add channel index to log when failed to get min RX WQEs
This info is needed in order to debug in case of dysfunctional channel.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 5 Apr 2018 12:50:36 +0000 (15:50 +0300)]
net/mlx5e: Support offloaded TC flows with no matches on headers
For example:
tc filter add dev ens2f0_0 parent ffff: flower skip_sw action drop
Note that for eswitch flows, we still always match on the source port.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 5 Apr 2018 12:42:06 +0000 (15:42 +0300)]
net/mlx5e: Get the required HW match level while parsing TC flow matches
Introduce levels of matching on headers of offloaded flows
(none, L2, L3, L4) that follow the inline mode levels.
This is pre-step for us to offload flows without any
matches on headers.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 5 Apr 2018 12:33:52 +0000 (15:33 +0300)]
net/mlx5e: Properly order min inline mode setup while parsing TC matches
Set the initial value to none instead of L2, and set to L2
where the previous initial value was assumed. Make sure to
parse L2 matches before L3 matches and L3 before L4.
This is a pre-step to get the match level for more purposes
other than the validating the needed vs. actual inline level.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 5 Apr 2018 11:58:25 +0000 (14:58 +0300)]
net/mlx5e: Use local actions var while processing offloaded TC flow actions
Use local actions variable while parsing the actions of offloaded TC flow.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 5 Apr 2018 11:55:39 +0000 (14:55 +0300)]
net/mlx5e: Return success when TC offloaded fdb actions parsed ok
Reaching here, means we didn't err anywhere, so lets just
return success.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Thu, 5 Apr 2018 11:46:02 +0000 (14:46 +0300)]
net/mlx5e: Avoid redundant zeroing of offloaded TC flow attributes
This is not needed as the attributes are zeroed out on allocation.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 8 Apr 2018 06:59:22 +0000 (09:59 +0300)]
net/mlx5e: Clean static checker complaints on TC offload and VF reps code
Clean warning/check complaints made by checkpatch on en_{tc,rep}.c
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Or Gerlitz [Sun, 8 Apr 2018 06:45:32 +0000 (09:45 +0300)]
net/mlx5e: Remove double defined DMAC header re-write element
The firmware DMAC_47_16 header re-write token was defined twice,
clean it up.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Thu, 23 Nov 2017 12:39:22 +0000 (14:39 +0200)]
net/mlx5e: Use bool as return type for mlx5e_xdp_handle
Function returns boolean values, use bool instead of int.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Thu, 23 Nov 2017 12:29:25 +0000 (14:29 +0200)]
net/mlx5e: Use u8 instead of int for LRO number of segments
Range of LRO number of segments fits in u8.
Also, bring initialization and declaration together to
save code.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Roi Dayan [Mon, 13 Nov 2017 16:07:50 +0000 (18:07 +0200)]
net/mlx5e: Skip redundant checks when providing NUD lastuse feedback
It's redundant to continue the loop if we found one flow whose lastuse value
being newer than the last one we reported, since this is enough for us to
trigger a NUD update (neigh_event_send()).
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 7 Sep 2017 19:30:58 +0000 (22:30 +0300)]
net/mlx5e: Remove redundant vport context vlan update
In delete vlan flow an extra call to mlx5e_vport_context_update_vlans
was added by mistake, remove it.
Fixes: 86d722ad2c3b ("net/mlx5: Use flow steering infrastructure for mlx5_en")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Arjun Vynipadath [Mon, 14 May 2018 07:54:43 +0000 (13:24 +0530)]
cxgb4: do not fail vf instatiation in slave mode
We no longer require a check for cxgb4 to be MASTER
when configuring SRIOV, It was required when we had
module parameter to instantiate vf.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 14 May 2018 06:40:44 +0000 (09:40 +0300)]
mlxsw: spectrum_span: Support LAG under mirror-to-gretap
When resolving a path that the packet will take after being encapsulated
in mirror-to-gretap scenarios, one of the devices en route could be a
LAG. In that case, mirror to first up slave that corresponds to a front
panel port.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hernán Gonzalez [Sun, 13 May 2018 23:33:49 +0000 (20:33 -0300)]
net: ethernet: ti: Use ERR_CAST instead of ERR_PTR(PTR_ERR())
Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)).
drivers/net/ethernet/ti/cpts.c:567:9-16: WARNING: ERR_CAST can be used with cpts->refclk
Generated by: scripts/coccinelle/api/err_cast.cocci
Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Sun, 13 May 2018 20:44:27 +0000 (17:44 -0300)]
sched: cls: enable verbose logging
Currently, when the rule is not to be exclusively executed by the
hardware, extack is not passed along and offloading failures don't
get logged. The idea was that hardware failures are okay because the
rule will get executed in software then and this way it doesn't confuse
unware users.
But this is not helpful in case one needs to understand why a certain
rule failed to get offloaded. Considering it may have been a temporary
failure, like resources exceeded or so, reproducing it later and knowing
that it is triggering the same reason may be challenging.
The ultimate goal is to improve Open vSwitch debuggability when using
flower offloading.
This patch adds a new flag to enable verbose logging. With the flag set,
extack will be passed to the driver, which will be able to log the
error. As the operation itself probably won't fail (not because of this,
at least), current iproute will already log it as a Warning.
The flag is generic, so it can be reused later. No need to restrict it
just for HW offloading. The command line will follow the syntax that
tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.
For example:
# ./tc qdisc add dev p7p1 ingress
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
flower verbose \
src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
Warning: TC offload is disabled on net device.
# echo $?
0
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
flower \
src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
# echo $?
0
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 May 2018 19:06:55 +0000 (15:06 -0400)]
Merge branch 'stmmac-dwmac-sun8i-Support-R40'
Chen-Yu Tsai says:
====================
net: stmmac: dwmac-sun8i: Support R40
This is a resend of the patches for net-next split out from my R40
Ethernet support v2 series, as requested by David Miller. The arm-soc
bits will follow, once I rework the A64 system controller compatible.
Patches 1, 2, and 3 clean up the dwmac-sun8i binding.
Patch 4 adds device tree binding for Allwinner R40's Ethernet
controller.
Patch 5 converts regmap access of the syscon region in the dwmac-sun8i
driver to regmap_field, in anticipation of different field widths on
the R40.
Patch 6 introduces custom plumbing in the dwmac-sun8i driver to fetch
a regmap from another device, by looking up said device via a phandle,
then getting the regmap associated with that device.
Patch 7 adds support for different or absent TX/RX delay chain ranges
to the dwmac-sun8i driver.
Patch 8 adds support for the R40's ethernet controller.
Excerpt from original cover letter:
Changes since v1:
- Default to fetching regmap from device pointed to by syscon phandle,
and falling back to syscon API if that fails.
- Dropped .syscon_from_dev field in device data as a result of the
previous change.
- Added a large comment block explaining the first change.
- Simplified description of syscon property in sun8i-dwmac binding.
- Regmap now only exposes the EMAC/GMAC register, but retains the
offset within its address space.
- Added patches for A64, which reuse the same sun8i-dwmac changes.
This series adds support for the DWMAC based Ethernet controller found
on the Allwinner R40 SoC. The controller is either a DWMAC clone or
DWMAC core with its registers rearranged. This is already supported by
the dwmac-sun8i driver. The glue layer control registers, unlike other
sun8i family SoCs, is not in the system controller region, but in the
clock control unit, like with the older A20 and A31 SoCs.
While we reuse the bindings for dwmac-sun8i using a syscon phandle
reference, we need some custom plumbing for the clock driver to export
a regmap that only allows access to the GMAC register to the dwmac-sun8i
driver. An alternative would be to allow drivers to register custom
syscon devices with their own regmap and locking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:25 +0000 (03:14 +0800)]
net: stmmac: dwmac-sun8i: Add support for GMAC on Allwinner R40 SoC
The Allwinner R40 SoC has the EMAC controller supported by dwmac-sun8i.
It is named "GMAC", while EMAC refers to the 10/100 Mbps Ethernet
controller supported by sun4i-emac. The controller is the same, but
the R40 has the glue layer controls in the clock control unit (CCU),
with a reduced RX delay chain, and no TX delay chain.
This patch adds support for it using the framework laid out by previous
patches to map the differences.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:24 +0000 (03:14 +0800)]
net: stmmac: dwmac-sun8i: Support different ranges for TX/RX delay chains
On the R40 SoC, the RX delay chain only has a range of 0~7 (hundred
picoseconds), instead of 0~31. Also the TX delay chain is completely
absent.
This patch adds support for different ranges by adding per-compatible
maximum values in the variant data. A maximum of 0 indicates that the
delay chain is not supported or absent.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:23 +0000 (03:14 +0800)]
net: stmmac: dwmac-sun8i: Allow getting syscon regmap from external device
On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU
address space. Using a standard syscon to access it provides no
coordination with the CCU driver for register access. Neither does
it prevent this and other drivers from accessing other, maybe critical,
clock control registers. On other SoCs, the register is in the "system
control" address space, which might also contain controls for mapping
SRAM to devices or the CPU. This hardware has the same issues.
Instead, for these types of setups, we let the device containing the
control register create a regmap tied to it. We can then get the device
from the existing syscon phandle, and retrieve the regmap with
dev_get_regmap().
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:22 +0000 (03:14 +0800)]
net: stmmac: dwmac-sun8i: Use regmap_field for syscon register access
On the Allwinner R40, the "GMAC clock" register is located in the CCU
block, at a different register address than the other SoCs that have
it in the "system control" block.
This patch converts the use of regmap to regmap_field for mapping and
accessing the syscon register, so we can have the register address in
the variants data, and not in the actual register manipulation code.
This patch only converts regmap_read() and regmap_write() calls to
regmap_field_read() and regmap_field_write() calls. There are some
places where it might make sense to switch to regmap_field_update_bits(),
but this is not done here to keep the patch simple.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:21 +0000 (03:14 +0800)]
dt-bindings: net: dwmac-sun8i: Add binding for GMAC on Allwinner R40 SoC
The Allwinner R40 SoC has the EMAC controller supported by dwmac-sun8i.
It is named "GMAC", while EMAC refers to the 10/100 Mbps Ethernet
controller supported by sun4i-emac. The controller is the same, but
the R40 has the glue layer controls in the clock control unit (CCU),
with a reduced RX delay chain, and no TX delay chain.
This patch adds the R40 specific bits to the dwmac-sun8i binding.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:20 +0000 (03:14 +0800)]
dt-bindings: net: dwmac-sun8i: simplify description of syscon property
The syscon property is used to point to the device that holds the glue
layer control register known as the "EMAC (or GMAC) clock register".
We do not need to explicitly list what compatible strings are needed, as
this information is readily available in the user manuals. Also the
"syscon" device type is more of an implementation detail. There are many
ways to access a register not in a device's address range, the syscon
interface being the most generic and unrestricted one.
Simplify the description so that it says what it is supposed to
describe.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:19 +0000 (03:14 +0800)]
dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical order
The A83T syscon compatible was appended to the syscon compatibles list,
instead of inserted in to preserve the ordering.
Move it to the proper place to keep the list sorted.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Sun, 13 May 2018 19:14:18 +0000 (03:14 +0800)]
dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptions
The clock delay chains found in the glue layer for dwmac-sun8i are only
used with RGMII PHYs. They are not intended for non-RGMII PHYs, such as
MII external PHYs or the internal PHY. Also, a recent SoC has a smaller
range of possible values for the delay chain.
This patch reformats the delay chain section of the device tree binding
to make it clear that the delay chains only apply to RGMII PHYs, and
make it easier to add the R40-specific bits later.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 May 2018 18:49:40 +0000 (14:49 -0400)]
Merge branch 'dsa-mv88e6xxx-remove-Global-1-setup'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: remove Global 1 setup
The mv88e6xxx driver is still writing arbitrary registers at setup time,
e.g. priority override bits. Add ops for them and provide specific setup
functions for priority and stats before getting rid of the erroneous
mv88e6xxx_g1_setup code, as previously done with Global 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 11 May 2018 21:16:36 +0000 (17:16 -0400)]
net: dsa: mv88e6xxx: add a stats setup function
Now that the Global 1 specific setup function only setup the statistics
unit, kill it in favor of a mv88e6xxx_stats_setup function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 11 May 2018 21:16:35 +0000 (17:16 -0400)]
net: dsa: mv88e6xxx: add IEEE and IP mapping ops
All Marvell switch families except
88E6390 have direct registers in
Global 1 for IEEE and IP priorities override mapping. The
88E6390 uses
indirect tables instead.
Add .ieee_pri_map and .ip_pri_map ops to distinct that and call them
from a mv88e6xxx_pri_setup helper. Only non-6390 are concerned ATM.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 11 May 2018 21:16:34 +0000 (17:16 -0400)]
net: dsa: mv88e6xxx: use helper for 6390 histogram
The Marvell
88E6390 model has its histogram mode bits moved in the
Global 1 Control 2 register. Use the previously introduced
mv88e6xxx_g1_ctl2_mask helper to set them.
At the same time complete the documentation of the said register.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 May 2018 18:28:59 +0000 (14:28 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-05-14
This series contains updates to virtchnl, i40e and i40evf.
Bruce cleans up whitespace and unnecessary parentheses in virtchnl.
Jake does a number of stat cleanups in the i40e driver, including
cleanup of code indentation, whitespace issues, remove duplicate stats,
fix grammar in code comment and general spring cleaning of the
statistics code.
Patryk fixes an issue where we recalculate vectors left and vectors
wanted but do not take into account the reduced number of queue pairs
per VSI.
Harshitha adds tx_busy stat to ethtool stats to track the number of
times we return NETDEV_TX_BUSY to the stack during transmit.
Paweł fixes a potential system crash when unloading the VF driver after
a hardware reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 May 2018 17:46:05 +0000 (13:46 -0400)]
Merge branch 'kernel-add-support-to-collect-hardware-logs-in-crash-recovery-kernel'
Rahul Lakkireddy says:
====================
kernel: add support to collect hardware logs in crash recovery kernel
On production servers running variety of workloads over time, kernel
panic can happen sporadically after days or even months. It is
important to collect as much debug logs as possible to root cause
and fix the problem, that may not be easy to reproduce. Snapshot of
underlying hardware/firmware state (like register dump, firmware
logs, adapter memory, etc.), at the time of kernel panic will be very
helpful while debugging the culprit device driver.
This series of patches add new generic framework that enable device
drivers to collect device specific snapshot of the hardware/firmware
state of the underlying device in the crash recovery kernel. In crash
recovery kernel, the collected logs are added as elf notes to
/proc/vmcore, which is copied by user space scripts for post-analysis.
The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:
1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.
2. vmcore module allocates the buffer with requested size. It adds
an elf note and invokes the device driver's registered callback
function.
3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.
The device specific hardware/firmware logs can be seen as elf notes
with note type 0x700, as shown below:
Displaying notes found at file offset 0x00001000 with length 0x040032c0:
Owner Data size Description
LINUX 0x02000fec Unknown note type: (0x00000700)
LINUX 0x02000fec Unknown note type: (0x00000700)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
VMCOREINFO 0x00000785 Unknown note type: (0x00000000)
Patch 1 adds API to vmcore module to allow drivers to register callback
to collect the device specific hardware/firmware logs. The logs will
be added to /proc/vmcore as elf notes.
Patch 2 updates read and mmap logic to append device specific hardware/
firmware logs as elf notes.
Patch 3 shows a cxgb4 driver example using the API to collect
hardware/firmware logs in crash recovery kernel, before hardware is
initialized.
Thanks,
Rahul
---
v8:
- Added missing linux/types.h header include.
- Removed __vmcore_add_device_dump().
v7:
- Removed "CHELSIO" vendor identifier in Elf Note name. Instead,
writing "LINUX".
- Moved vmcoredd_header to new file include/uapi/linux/vmcore.h
- Reworked vmcoredd_header to include Elf Note as part of the header
itself.
- Removed vmcoredd_get_note_size().
- Renamed vmcoredd_write_note() to vmcoredd_write_header().
- Replaced all "unsigned long" with "unsigned int" for device dump
size since max size of Elf Word is u32.
v6:
- Reworked device dump elf note name to contain vendor identifier.
- Added vmcoredd_header that precedes actual dump in the Elf Note.
- Device dump's name is moved inside vmcoredd_header.
- Added "CHELSIO" string as vendor identifier in the Elf Note name
for cxgb4 device dumps.
v5:
- Removed enabling CONFIG_PROC_VMCORE_DEVICE_DUMP by default and
updated help message.
v4:
- Made __vmcore_add_device_dump() static.
- Moved compile check to define vmcore_add_device_dump() to
crash_dump.h to fix compilation when vmcore.c is not compiled in.
- Convert ---help--- to help in Kconfig as indicated by checkpatch.
- Rebased to tip.
v3:
- Dropped sysfs crashdd module.
- Exported dumps as elf notes. Suggested by Eric Biederman
<ebiederm@xmission.com>. Added as patch 2 in this version.
- Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
dump support.
- Moved logic related to adding dumps from crashdd to vmcore module.
- Rename all crashdd* to vmcoredd*.
- Updated comments.
v2:
- Added ABI Documentation for crashdd.
- Directly use octal permission instead of macro.
Changes since rfc v2:
- Moved exporting crashdd from procfs to sysfs. Suggested by
Stephen Hemminger <stephen@networkplumber.org>
- Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
- Replaced all proc API with sysfs API and updated comments.
- Calling driver callback before creating the binary file under
crashdd sysfs.
- Changed binary dump file permission from S_IRUSR to S_IRUGO.
- Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
rfc v2:
- Collecting logs in 2nd kernel instead of during kernel panic.
Suggested by Eric Biederman <ebiederm@xmission.com>.
- Added new crashdd module that exports /proc/crashdd/ containing
driver's registered hardware/firmware logs in patch 1.
- Replaced the API to allow drivers to register their hardware/firmware
log collect routine in crash recovery kernel in patch 1.
- Updated patch 2 to use the new API in patch 1.
====================
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 2 May 2018 09:47:19 +0000 (15:17 +0530)]
cxgb4: collect hardware dump in second kernel
Register callback to collect hardware/firmware dumps in second kernel
before hardware/firmware is initialized. The dumps for each device
will be available as elf notes in /proc/vmcore in second kernel.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 2 May 2018 09:47:18 +0000 (15:17 +0530)]
vmcore: append device dumps to vmcore as elf notes
Update read and mmap logic to append device dumps as additional notes
before the other elf notes. We add device dumps before other elf notes
because the other elf notes may not fill the elf notes buffer
completely and we will end up with zero-filled data between the elf
notes and the device dumps. Tools will then try to decode this
zero-filled data as valid notes and we don't want that. Hence, adding
device dumps before the other elf notes ensure that zero-filled data
can be avoided. This also ensures that the device dumps and the
other elf notes can be properly mmaped at page aligned address.
Incorporate device dump size into the total vmcore size. Also update
offsets for other program headers after the device dumps are added.
Suggested-by: Eric Biederman <ebiederm@xmission.com>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 2 May 2018 09:47:17 +0000 (15:17 +0530)]
vmcore: add API to collect hardware dump in second kernel
The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:
1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.
2. vmcore module allocates the buffer with requested size. It adds
an Elf note and invokes the device driver's registered callback
function.
3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.
Ensure that the device dump buffer size is always aligned to page size
so that it can be mmaped.
Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
generic and reserve NT_VMCOREDD note type to indicate vmcore device
dump.
Suggested-by: Eric Biederman <ebiederm@xmission.com>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paweł Jabłoński [Thu, 10 May 2018 12:59:49 +0000 (05:59 -0700)]
i40evf: Fix a hardware reset support in VF driver
This patch fixes a hardware reset support in VF driver.
It is needed because when a hardware reset is detected
adapter->state is in __I40EVF_RESETTING state before
i40evf_reset_task is called. Without this patch
unloading VF driver after a hardware reset ends
with a system crash.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 10 May 2018 12:59:48 +0000 (05:59 -0700)]
i40e: free the skb after clearing the bitlock
In commit
bbc4e7d273b5 ("i40e: fix race condition with PTP_TX_IN_PROGRESS
bits") we modified the code which handles Tx timestamps so that we would
clear the progress bit as soon as possible.
A later commit
0bc0706b46cd ("i40e: check for Tx timestamp timeouts during
watchdog") introduced similar code for detecting and handling cleanup of
a blocked Tx timestamp. This code did not use the same pattern for cleaning
up the skb.
Update this code to wait to free the skb until after the bit lock is
free, by first setting the ptp_tx_skb to NULL and clearing the lock.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 10 May 2018 12:59:47 +0000 (05:59 -0700)]
i40e: cleanup wording in a header comment
Fix up the English in the header comment for i40e_ptp_tx_hang.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 10 May 2018 12:59:46 +0000 (05:59 -0700)]
i40evf: remove MAX_QUEUES and just use I40EVF_MAX_REQ_QUEUES
We don't really need to have separate definitions for MAX_QUEUES and
I40EVF_MAX_REQ_QUEUES, since we'll always be limited by how many queues
we request anyways. If we haven't enabled requesting the maximum number
of queues, there's no reason to have our call to alloc_etherdev_mq
actually pass the higher value, since we'd never enable those queues
anyways.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Harshitha Ramamurthy [Thu, 10 May 2018 12:59:45 +0000 (05:59 -0700)]
i40e: add tx_busy to ethtool stats
This patch adds the tx_busy stat to the ethtool stats. The tx_busy
stat tracks the number of times we return NETDEV_TX_BUSY to the stack
during transmit.
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Patryk Małek [Thu, 10 May 2018 12:59:44 +0000 (05:59 -0700)]
i40e: Fix recalculation of MSI-X vectors for VMDq
This patch adds a recalculation of number of MSI-X
vectors for VMDq in the case where we have less
vectors available than we would want to reserve for
VMDq.
It fixes the issue where we recalculate vectors left
and vectors wanted but we didn't take into account
the reduced number of queue pairs per VSI.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 10 May 2018 12:59:43 +0000 (05:59 -0700)]
i40e: cleanup whitespace for some ethtool stat definitions
A future patch is going to refactor some of the ethtool statistic code.
To keep the patches easy to review, cleanup some of the indentation used
for macro definitions first.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 10 May 2018 12:59:42 +0000 (05:59 -0700)]
i40e: remove duplicate pfc stats
The pfc related priority stats are already handled separately as these
stats are actually arrays of length I40E_MAX_USER_PRIORITY. Thus,
including them within i40e_gstrings_stats will just duplicate data.
Worse, the sizeof will be incorrect, as it will be the total size of the
stat arrays, which in this case is 8 * sizeof(u64), so we will only copy
the stat contents as if they were a u32.
Since we already correctly handle these stats else where, remove them
from the i40e_gstrings_stats.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 10 May 2018 12:59:41 +0000 (05:59 -0700)]
i40e: calculate ethtool stats size in a separate function
Use a separate function to calculate the number of stats for
a particular device. This helps reduce the clutter in
i40e_get_sset_count().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Thu, 10 May 2018 12:59:40 +0000 (05:59 -0700)]
i40evf: Fix client header define
Fix up the VF client header define, since it is the same as the PF
client header.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Bruce Allan [Thu, 10 May 2018 12:59:39 +0000 (05:59 -0700)]
virtchnl: Whitespace and parenthesis cleanup
Clean up existing instances of unnecessary parentheses in if
statement and change order of conditionals to make it easier to read
The opening /* should be followed by a single space and the closing */
should be preceded with a single space.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anders Roxell [Sun, 13 May 2018 19:48:30 +0000 (21:48 +0200)]
net: ipv4: ipconfig: fix unused variable
When CONFIG_PROC_FS isn't set, variable ipconfig_dir isn't used.
net/ipv4/ipconfig.c:167:31: warning: ‘ipconfig_dir’ defined but not used [-Wunused-variable]
static struct proc_dir_entry *ipconfig_dir;
^~~~~~~~~~~~
Move the declaration of ipconfig_dir inside the CONFIG_PROC_FS ifdef to
fix the warning.
Fixes: c04d2cb2009f ("ipconfig: Write NTP server IPs to /proc/net/ipconfig/ntp_servers")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 12 May 2018 00:53:22 +0000 (20:53 -0400)]
Merge git://git./linux/kernel/git/davem/net
The bpf syscall and selftests conflicts were trivial
overlapping changes.
The r8169 change involved moving the added mdelay from 'net' into a
different function.
A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts. I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 21:14:46 +0000 (14:14 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
Easton.
2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.
3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
Silva.
4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
grateful for this fix.
5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
do appreciate.
6) Use after free in TLS code. Once again we are blessed by the
honorable Eric Dumazet with this fix.
7) Fix regression in TLS code causing stalls on partial TLS records.
This fix is bestowed upon us by Andrew Tomt.
8) Deal with too small MTUs properly in LLC code, another great gift
from Eric Dumazet.
9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
Hangbin Liu we give thanks for this.
10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
Paolo Abeni, he gave us this.
11) Range check coalescing parameters in mlx4 driver, thank you Moshe
Shemesh.
12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
David Howells.
13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
you're the best!
14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
Benerjee saved us!
15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
is now water tight, thanks to Andrey Ignatov.
16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
everywhere!
17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
without you!
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
net sched actions: fix refcnt leak in skbmod
net: sched: fix error path in tcf_proto_create() when modules are not configured
net sched actions: fix invalid pointer dereferencing if skbedit flags missing
ixgbe: fix memory leak on ipsec allocation
ixgbevf: fix ixgbevf_xmit_frame()'s return type
ixgbe: return error on unsupported SFP module when resetting
ice: Set rq_last_status when cleaning rq
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
bonding: send learning packets for vlans on slave
bonding: do not allow rlb updates to invalid mac
net/mlx5e: Err if asked to offload TC match on frag being first
net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
net/mlx5: Free IRQs in shutdown path
rxrpc: Trace UDP transmission failure
rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
rxrpc: Fix the min security level for kernel calls
rxrpc: Fix error reception on AF_INET6 sockets
rxrpc: Fix missing start of call timeout
qed: fix spelling mistake: "taskelt" -> "tasklet"
...
Linus Torvalds [Fri, 11 May 2018 20:56:43 +0000 (13:56 -0700)]
Merge tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These patches fix both a possible corruption during NFSoRDMA MR
recovery, and a sunrpc tracepoint crash.
Additionally, Trond has a new email address to put in the MAINTAINERS
file"
* tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
Change Trond's email address in MAINTAINERS
sunrpc: Fix latency trace point crashes
xprtrdma: Fix list corruption / DMAR errors during MR recovery
Roman Mashak [Fri, 11 May 2018 18:35:33 +0000 (14:35 -0400)]
net sched actions: fix refcnt leak in skbmod
When application fails to pass flags in netlink TLV when replacing
existing skbmod action, the kernel will leak refcnt:
$ tc actions get action skbmod index 1
total acts 0
action order 0: skbmod pipe set smac 00:11:22:33:44:55
index 1 ref 1 bind 0
For example, at this point a buggy application replaces the action with
index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags,
however refcnt gets bumped:
$ tc actions get actions skbmod index 1
total acts 0
action order 0: skbmod pipe set smac 00:11:22:33:44:55
index 1 ref 2 bind 0
$
Tha patch fixes this by calling tcf_idr_release() on existing actions.
Fixes: 86da71b57383d ("net_sched: Introduce skbmod action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 20:36:06 +0000 (13:36 -0700)]
Merge tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"These patches fix two long-standing bugs in the DIO code path, one of
which is a crash trivially triggerable with splice()"
* tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client:
ceph: fix iov_iter issues in ceph_direct_read_write()
libceph: add osd_req_op_extent_osd_data_bvecs()
ceph: fix rsize/wsize capping in ceph_direct_read_write()
Dan Murphy [Fri, 11 May 2018 18:08:19 +0000 (13:08 -0500)]
net: phy: DP83TC811: Introduce support for the DP83TC811 phy
Add support for the DP83811 phy.
The DP83811 supports both rgmii and sgmii interfaces.
There are 2 part numbers for this the DP83TC811R does not
reliably support the SGMII interface but the DP83TC811S will.
There is not a way to differentiate these parts from the
hardware or register set. So this is controlled via the DT
to indicate which phy mode is required. Or the part can be
strapped to a certain interface.
Data sheet can be found here:
http://www.ti.com/product/DP83TC811S-Q1/description
http://www.ti.com/product/DP83TC811R-Q1/description
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 11 May 2018 15:45:32 +0000 (17:45 +0200)]
net: sched: fix error path in tcf_proto_create() when modules are not configured
In case modules are not configured, error out when tp->ops is null
and prevent later null pointer dereference.
Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 20:14:24 +0000 (13:14 -0700)]
Merge tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh
Pull arch/sh fixes from Rich Felker:
"Fixes for critical regressions and a build failure.
The regressions were introduced in 4.15 and 4.17-rc1 and prevented
booting on affected systems"
* tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh:
sh: switch to NO_BOOTMEM
sh: mm: Fix unprotected access to struct device
sh: fix build failure for J2 cpu with SMP disabled
Ganesh Goudar [Fri, 11 May 2018 13:06:16 +0000 (18:36 +0530)]
cxgb4: avoid schedule while atomic
do not sleep while adding or deleting udp tunnel.
Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 11 May 2018 13:05:33 +0000 (18:35 +0530)]
cxgb4: enable inner header checksum calculation
set cntrl bits to indicate whether inner header checksum
needs to be calculated whenever the packet is an encapsulated
packet and enable supported encap features.
Fixes: d0a1299c6bf7 ("cxgb4: add support for vxlan segmentation offload")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Fri, 11 May 2018 13:04:43 +0000 (18:34 +0530)]
cxgb4: Fix {vxlan/geneve}_port initialization
adapter->rawf_cnt was not initialized, thereby
ndo_udp_tunnel_{add/del} was returning immediately
without initializing {vxlan/geneve}_port.
Also initializes mps_encap_entry refcnt.
Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks")
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 11 May 2018 13:07:34 +0000 (18:37 +0530)]
cxgb4: Add new T5 device id
Add 0x50ad device id for new T5 card.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 20:09:04 +0000 (13:09 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's a small memblock accounting problem when freeing the initrd
and a Spectre-v2 mitigation for NVIDIA Denver CPUs which just requires
a match on the CPU ID register.
Summary:
- Mitigate Spectre-v2 for NVIDIA Denver CPUs
- Free memblocks corresponding to freed initrd area"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: capabilities: Add NVIDIA Denver CPU to bp_harden list
arm64: Add MIDR encoding for NVIDIA CPUs
arm64: To remove initrd reserved area entry from memblock
Tonghao Zhang [Fri, 11 May 2018 09:53:12 +0000 (02:53 -0700)]
net: doc: fix spelling mistake: "modrobe.d" -> "modprobe.d"
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Fri, 11 May 2018 09:53:11 +0000 (02:53 -0700)]
bonding: use the skb_get/set_queue_mapping
Use the skb_get_queue_mapping, skb_set_queue_mapping
and skb_rx_queue_recorded for skb queue_mapping in bonding
driver, but not use it directly.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Fri, 11 May 2018 09:53:10 +0000 (02:53 -0700)]
bonding: replace the return value type
The method ndo_start_xmit is defined as returning a
netdev_tx_t, which is a typedef for an enum type,
but the implementation in this driver returns an int.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 20:07:22 +0000 (13:07 -0700)]
Merge tag 'powerpc-4.17-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for an actual regression, the change to the SYSCALL_DEFINE
wrapper broke FTRACE_SYSCALLS for us due to a name mismatch. There's
also another commit to the same code to make sure we match all our
syscalls with various prefixes.
And then just one minor build fix, and the removal of an unused
variable that was removed and then snuck back in due to some rebasing.
Thanks to: Naveen N. Rao"
* tag 'powerpc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix CONFIG_NUMA=n build
powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix
powerpc/trace/syscalls: Update syscall name matching logic
powerpc/64: Remove unused paca->soft_enabled
Linus Torvalds [Fri, 11 May 2018 20:04:35 +0000 (13:04 -0700)]
Merge tag 'trace-v4.17-rc4' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Working on some new updates to trace filtering, I noticed that the
regex_match_front() test was updated to be limited to the size of the
pattern instead of the full test string.
But as the test string is not guaranteed to be nul terminated, it
still needs to consider the size of the test string"
* tag 'trace-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix regex_match_front() to not over compare the test string
William Tu [Fri, 11 May 2018 12:49:47 +0000 (05:49 -0700)]
erspan: auto detect truncated ipv6 packets.
Currently the truncated bit is set only when 1) the mirrored packet
is larger than mtu and 2) the ipv4 packet tot_len is larger than
the actual skb->len. This patch adds another case for detecting
whether ipv6 packet is truncated or not, by checking the ipv6 header
payload_len and the skb->len.
Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 20:01:25 +0000 (16:01 -0400)]
Merge branch 'mlxsw-spectrum_span-Two-minor-adjustments'
Ido Schimmel says:
====================
mlxsw: spectrum_span: Two minor adjustments
Petr says:
This patch set fixes a couple of nits in mlxsw's SPAN implementation:
two counts of inaccurate variable name and one count of unsuitable error
code, fixed, respectively, in patches #1 and #2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 11 May 2018 08:57:31 +0000 (11:57 +0300)]
mlxsw: spectrum_span: Use a more fitting error code
ENOENT is suitable when an item is looked for in a collection and can't
be found. The failure here is actually a depletion of a resource, where
ENOBUFS is the more fitting error code.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 11 May 2018 08:57:30 +0000 (11:57 +0300)]
mlxsw: spectrum_span: Rename misnamed variable l3edev
Calling the variable l3edev was relevant when neighbor lookup was the
last stage in the simulated pipeline. Now that mlxsw handles bridges and
vlan devices as well, calling it "L3" is a misnomer.
Thus in mlxsw_sp_span_dmac(), rename to "dev", because that function is
just a service routine where the distinction between tunnel and egress
device isn't necessary.
In mlxsw_sp_span_entry_tunnel_parms_common(), rename to "edev" to
emphasize that the routine traces packet egress.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 19:57:23 +0000 (15:57 -0400)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-05-11
This series contains fixes to the ice, ixgbe and ixgbevf drivers.
Jeff Shaw provides a fix to ensure rq_last_status gets set, whether or
not the hardware responds with an error in the ice driver.
Emil adds a check for unsupported module during the reset routine for
ixgbe.
Luc Van Oostenryck fixes ixgbevf_xmit_frame() where it was not using the
correct return value (int).
Colin Ian King fixes a potential resource leak in ixgbe, where we were
not freeing ipsec in our cleanup path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 19:55:57 +0000 (15:55 -0400)]
Merge tag 'rxrpc-fixes-
20180510' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fixes
Here are three fixes for AF_RXRPC and two tracepoints that were useful for
finding them:
(1) Fix missing start of expect-Rx-by timeout on initial packet
transmission so that calls will time out if the peer doesn't respond.
(2) Fix error reception on AF_INET6 sockets by using the correct family of
sockopts on the UDP transport socket.
(3) Fix setting the minimum security level on kernel calls so that they
can be encrypted.
(4) Add a tracepoint to log ICMP/ICMP6 and other error reports from the
transport socket.
(5) Add a tracepoint to log UDP sendmsg failure so that we can find out if
transmission failure occurred on the UDP socket.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Fri, 11 May 2018 14:55:09 +0000 (10:55 -0400)]
net sched actions: fix invalid pointer dereferencing if skbedit flags missing
When application fails to pass flags in netlink TLV for a new skbedit action,
the kernel results in the following oops:
[ 8.307732] BUG: unable to handle kernel paging request at
0000000000021130
[ 8.309167] PGD
80000000193d1067 P4D
80000000193d1067 PUD
180e0067 PMD 0
[ 8.310595] Oops: 0000 [#1] SMP PTI
[ 8.311334] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw
[ 8.314190] CPU: 1 PID: 397 Comm: tc Not tainted 4.17.0-rc3+ #357
[ 8.315252] RIP: 0010:__tcf_idr_release+0x33/0x140
[ 8.316203] RSP: 0018:
ffffa0718038f840 EFLAGS:
00010246
[ 8.317123] RAX:
0000000000000001 RBX:
0000000000021100 RCX:
0000000000000000
[ 8.319831] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000021100
[ 8.321181] RBP:
0000000000000000 R08:
000000000004adf8 R09:
0000000000000122
[ 8.322645] R10:
0000000000000000 R11:
ffffffff9e5b01ed R12:
0000000000000000
[ 8.324157] R13:
ffffffff9e0d3cc0 R14:
0000000000000000 R15:
0000000000000000
[ 8.325590] FS:
00007f591292e700(0000) GS:
ffff8fcf5bc40000(0000) knlGS:
0000000000000000
[ 8.327001] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8.327987] CR2:
0000000000021130 CR3:
00000000180e6004 CR4:
00000000001606a0
[ 8.329289] Call Trace:
[ 8.329735] tcf_skbedit_init+0xa7/0xb0
[ 8.330423] tcf_action_init_1+0x362/0x410
[ 8.331139] ? try_to_wake_up+0x44/0x430
[ 8.331817] tcf_action_init+0x103/0x190
[ 8.332511] tc_ctl_action+0x11a/0x220
[ 8.333174] rtnetlink_rcv_msg+0x23d/0x2e0
[ 8.333902] ? _cond_resched+0x16/0x40
[ 8.334569] ? __kmalloc_node_track_caller+0x5b/0x2c0
[ 8.335440] ? rtnl_calcit.isra.31+0xf0/0xf0
[ 8.336178] netlink_rcv_skb+0xdb/0x110
[ 8.336855] netlink_unicast+0x167/0x220
[ 8.337550] netlink_sendmsg+0x2a7/0x390
[ 8.338258] sock_sendmsg+0x30/0x40
[ 8.338865] ___sys_sendmsg+0x2c5/0x2e0
[ 8.339531] ? pagecache_get_page+0x27/0x210
[ 8.340271] ? filemap_fault+0xa2/0x630
[ 8.340943] ? page_add_file_rmap+0x108/0x200
[ 8.341732] ? alloc_set_pte+0x2aa/0x530
[ 8.342573] ? finish_fault+0x4e/0x70
[ 8.343332] ? __handle_mm_fault+0xbc1/0x10d0
[ 8.344337] ? __sys_sendmsg+0x53/0x80
[ 8.345040] __sys_sendmsg+0x53/0x80
[ 8.345678] do_syscall_64+0x4f/0x100
[ 8.346339] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.347206] RIP: 0033:0x7f591191da67
[ 8.347831] RSP: 002b:
00007fff745abd48 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
[ 8.349179] RAX:
ffffffffffffffda RBX:
00007fff745abe70 RCX:
00007f591191da67
[ 8.350431] RDX:
0000000000000000 RSI:
00007fff745abdc0 RDI:
0000000000000003
[ 8.351659] RBP:
000000005af35251 R08:
0000000000000001 R09:
0000000000000000
[ 8.352922] R10:
00000000000005f1 R11:
0000000000000246 R12:
0000000000000000
[ 8.354183] R13:
00007fff745afed0 R14:
0000000000000001 R15:
00000000006767c0
[ 8.355400] Code: 41 89 d4 53 89 f5 48 89 fb e8 aa 20 fd ff 85 c0 0f 84 ed 00
00 00 48 85 db 0f 84 cf 00 00 00 40 84 ed 0f 85 cd 00 00 00 45 84 e4 <8b> 53 30
74 0d 85 d2 b8 ff ff ff ff 0f 8f b3 00 00 00 8b 43 2c
[ 8.358699] RIP: __tcf_idr_release+0x33/0x140 RSP:
ffffa0718038f840
[ 8.359770] CR2:
0000000000021130
[ 8.360438] ---[ end trace
60c66be45dfc14f0 ]---
The caller calls action's ->init() and passes pointer to "struct tc_action *a",
which later may be initialized to point at the existing action, otherwise
"struct tc_action *a" is still invalid, and therefore dereferencing it is an
error as happens in tcf_idr_release, where refcnt is decremented.
So in case of missing flags tcf_idr_release must be called only for
existing actions.
v2:
- prepare patch for net tree
Fixes: 5e1567aeb7fe ("net sched: skbedit action fix late binding")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 19:30:34 +0000 (12:30 -0700)]
Merge tag 'for-linus-4.17-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"One fix for the kernel running as a fully virtualized guest using PV
drivers on old Xen hypervisor versions"
* tag 'for-linus-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Reset VCPU0 info pointer after shared_info remap
Colin Ian King [Wed, 9 May 2018 13:58:48 +0000 (14:58 +0100)]
ixgbe: fix memory leak on ipsec allocation
The error clean up path kfree's adapter->ipsec and should be
instead kfree'ing ipsec. Fix this. Also, the err1 error exit path
does not need to kfree ipsec because this failure path was for
the failed allocation of ipsec.
Detected by CoverityScan, CID#146424 ("Resource Leak")
Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Luc Van Oostenryck [Tue, 24 Apr 2018 13:16:48 +0000 (15:16 +0200)]
ixgbevf: fix ixgbevf_xmit_frame()'s return type
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 20 Apr 2018 00:06:57 +0000 (17:06 -0700)]
ixgbe: return error on unsupported SFP module when resetting
Add check for unsupported module and return the error code.
This fixes a Coverity hit due to unused return status from setup_sfp.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Shaw [Wed, 18 Apr 2018 18:23:27 +0000 (11:23 -0700)]
ice: Set rq_last_status when cleaning rq
Prior to this commit, the rq_last_status was only set when hardware
responded with an error. This leads to rq_last_status being invalid
in the future when hardware eventually responds without error. This
commit resolves the issue by unconditionally setting rq_last_status
with the value returned in the descriptor.
Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous
interrupt")
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Trond Myklebust [Fri, 11 May 2018 18:13:57 +0000 (14:13 -0400)]
Change Trond's email address in MAINTAINERS
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Rob Herring [Fri, 11 May 2018 13:45:59 +0000 (08:45 -0500)]
sh: switch to NO_BOOTMEM
Commit
0fa1c579349f ("of/fdt: use memblock_virt_alloc for early alloc")
inadvertently switched the DT unflattening allocations from memblock to
bootmem which doesn't work because the unflattening happens before
bootmem is initialized. Swapping the order of bootmem init and
unflattening could also fix this, but removing bootmem is desired. So
enable NO_BOOTMEM on SH like other architectures have done.
Fixes: 0fa1c579349f ("of/fdt: use memblock_virt_alloc for early alloc")
Reported-by: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rich Felker <dalias@libc.org>
Linus Torvalds [Fri, 11 May 2018 16:52:01 +0000 (09:52 -0700)]
mmap: introduce sane default mmap limits
The internal VM "mmap()" interfaces are based on the mmap target doing
everything using page indexes rather than byte offsets, because
traditionally (ie 32-bit) we had the situation that the byte offset
didn't fit in a register. So while the mmap virtual address was limited
by the word size of the architecture, the backing store was not.
So we're basically passing "pgoff" around as a page index, in order to
be able to describe backing store locations that are much bigger than
the word size (think files larger than 4GB etc).
But while this all makes a ton of sense conceptually, we've been dogged
by various drivers that don't really understand this, and internally
work with byte offsets, and then try to work with the page index by
turning it into a byte offset with "pgoff << PAGE_SHIFT".
Which obviously can overflow.
Adding the size of the mapping to it to get the byte offset of the end
of the backing store just exacerbates the problem, and if you then use
this overflow-prone value to check various limits of your device driver
mmap capability, you're just setting yourself up for problems.
The correct thing for drivers to do is to do their limit math in page
indices, the way the interface is designed. Because the generic mmap
code _does_ test that the index doesn't overflow, since that's what the
mmap code really cares about.
HOWEVER.
Finding and fixing various random drivers is a sisyphean task, so let's
just see if we can just make the core mmap() code do the limiting for
us. Realistically, the only "big" backing stores we need to care about
are regular files and block devices, both of which are known to do this
properly, and which have nice well-defined limits for how much data they
can access.
So let's special-case just those two known cases, and then limit other
random mmap users to a backing store that still fits in "unsigned long".
Realistically, that's not much of a limit at all on 64-bit, and on
32-bit architectures the only worry might be the GPU drivers, which can
have big physical address spaces.
To make it possible for drivers like that to say that they are 64-bit
clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the
file flags to allow drivers to mark their file descriptors as safe in
the full 64-bit mmap address space.
[ The timing for doing this is less than optimal, and this should really
go in a merge window. But realistically, this needs wide testing more
than it needs anything else, and being main-line is the only way to do
that.
So the earlier the better, even if it's outside the proper development
cycle - Linus ]
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 11 May 2018 16:49:02 +0000 (09:49 -0700)]
Merge tag 'pm-4.17-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two PCI power management regressions from the 4.13 cycle and
one cpufreq schedutil governor bug introduced during the 4.12 cycle,
drop a stale comment from the schedutil code and fix two mistakes in
docs.
Specifics:
- Restore device_may_wakeup() check in pci_enable_wake() removed
inadvertently during the 4.13 cycle to prevent systems from drawing
excessive power when suspended or off, among other things (Rafael
Wysocki).
- Fix pci_dev_run_wake() to properly handle devices that only can
signal PME# when in the D3cold power state (Kai Heng Feng).
- Fix the schedutil cpufreq governor to avoid using UINT_MAX as the
new CPU frequency in some cases due to a missing check (Rafael
Wysocki).
- Remove a stale comment regarding worker kthreads from the schedutil
cpufreq governor (Juri Lelli).
- Fix a copy-paste mistake in the intel_pstate driver documentation
(Juri Lelli).
- Fix a typo in the system sleep states documentation (Jonathan
Neuschäfer)"
* tag 'pm-4.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / PM: Check device_may_wakeup() in pci_enable_wake()
PCI / PM: Always check PME wakeup capability for runtime wakeup support
cpufreq: schedutil: Avoid using invalid next_freq
cpufreq: schedutil: remove stale comment
PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph
PM: docs: sleep-states: Fix a typo ("includig")
Linus Torvalds [Fri, 11 May 2018 16:46:14 +0000 (09:46 -0700)]
Merge tag 'mtd/fixes-for-4.17-rc5' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
- make nand_soft_waitrdy() wait tWB before polling the status REG
- fix BCH write in the the Marvell NAND controller driver
- fix wrong picosec to msec conversion in the Marvell NAND controller
driver
- fix DMA handling in the TI OneNAND controllre driver
* tag 'mtd/fixes-for-4.17-rc5' of git://git.infradead.org/linux-mtd:
mtd: rawnand: Make sure we wait tWB before polling the STATUS reg
mtd: rawnand: marvell: fix command xtype in BCH write hook
mtd: rawnand: marvell: pass ms delay to wait_op
mtd: onenand: omap2: Disable DMA for HIGHMEM buffers
Eric Dumazet [Fri, 11 May 2018 02:07:13 +0000 (19:07 -0700)]
udp: avoid refcount_t saturation in __udp_gso_segment()
For some reason, Willem thought that the issue we fixed for TCP
in commit
7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
tcp_gso_segment()") was not relevant for UDP GSO.
But syzbot found its way.
refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 10261 at lib/refcount.c:78 refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 10261 Comm: syz-executor5 Not tainted 4.17.0-rc3+ #38
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
panic+0x22f/0x4de kernel/panic.c:184
__warn.cold.8+0x163/0x1b3 kernel/panic.c:536
report_bug+0x252/0x2d0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:178 [inline]
do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
RSP: 0018:
ffff880196db6b90 EFLAGS:
00010282
RAX:
0000000000000026 RBX:
00000000ffffff01 RCX:
ffffc900040d9000
RDX:
0000000000004a29 RSI:
ffffffff8160f6f1 RDI:
ffff880196db66f0
RBP:
ffff880196db6c78 R08:
ffff8801b33d6740 R09:
0000000000000002
R10:
ffff8801b33d6740 R11:
0000000000000000 R12:
0000000000000000
R13:
00000000ffffffff R14:
ffff880196db6c50 R15:
0000000000020101
refcount_add+0x1b/0x70 lib/refcount.c:102
__udp_gso_segment+0xaa5/0xee0 net/ipv4/udp_offload.c:272
udp4_ufo_fragment+0x592/0x7a0 net/ipv4/udp_offload.c:301
inet_gso_segment+0x639/0x12b0 net/ipv4/af_inet.c:1342
skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
__skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
skb_gso_segment include/linux/netdevice.h:4050 [inline]
validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122
__dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3579
dev_queue_xmit+0x17/0x20 net/core/dev.c:3620
neigh_direct_output+0x15/0x20 net/core/neighbour.c:1401
neigh_output include/net/neighbour.h:483 [inline]
ip_finish_output2+0xa5f/0x1840 net/ipv4/ip_output.c:229
ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:277 [inline]
ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
dst_output include/net/dst.h:444 [inline]
ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
ip_send_skb+0x40/0xe0 net/ipv4/ip_output.c:1434
udp_send_skb.isra.37+0x5eb/0x1000 net/ipv4/udp.c:825
udp_push_pending_frames+0x5c/0xf0 net/ipv4/udp.c:853
udp_v6_push_pending_frames+0x380/0x3e0 net/ipv6/udp.c:1105
udp_lib_setsockopt+0x59a/0x600 net/ipv4/udp.c:2403
udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1447
sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
__sys_setsockopt+0x1bd/0x390 net/socket.c:1903
__do_sys_setsockopt net/socket.c:1914 [inline]
__se_sys_setsockopt net/socket.c:1911 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: ad405857b174 ("udp: better wmem accounting on gso")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 16:26:29 +0000 (12:26 -0400)]
Merge tag 'mlx5-fixes-2018-05-10' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-05-10
the following series includes some fixes for mlx5 core driver.
Please pull and let me know if there's any problem.
For -stable v4.5
("net/mlx5: E-Switch, Include VF RDMA stats in vport statistics")
For -stable v4.10
("net/mlx5e: Err if asked to offload TC match on frag being first")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 10 May 2018 21:59:43 +0000 (14:59 -0700)]
tcp: switch pacing timer to softirq based hrtimer
linux-4.16 got support for softirq based hrtimers.
TCP can switch its pacing hrtimer to this variant, since this
avoids going through a tasklet and some atomic operations.
pacing timer logic looks like other (jiffies based) tcp timers.
v2: use hrtimer_try_to_cancel() in tcp_clear_xmit_timers()
to correctly release reference on socket if needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 16:18:02 +0000 (09:18 -0700)]
Merge tag 'drm-fixes-for-v4.17-rc5' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"nouveau, amdgpu, i915, vc4, omap, exynos and atomic fixes.
As last week seemed a bit slow, we got a few more fixes this week.
The main stuff is two weeks of fixes for amdgpu, some missing bits of
vega12 atom firmware support were added, and some power management
fixes.
Nouveau got two regression fixes for an DP MST deadlock and a random
oops fix.
i915 got an LVDS panel timeout fix 2 WARN fixes.
exynos fixed a pagefault issue in the mixer driver.
vc4 has an oops fix.
omap had a bunch of uninit var and error-checking fixes. Two atomic
modesetting state fixes.
One minor agp cleanup patch"
* tag 'drm-fixes-for-v4.17-rc5' of git://people.freedesktop.org/~airlied/linux: (30 commits)
drm/amd/pp: Fix performance drop on Fiji
drm/nouveau: Fix deadlock in nv50_mstm_register_connector()
drm/nouveau/ttm: don't dereference nvbo::cli, it can outlive client
agp: uninorth: make two functions static
drm/amd/pp: Refine the output of pp_power_profile_mode on VI
drm/amdgpu: Switch to interruptable wait to recover from ring hang.
drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages
drm/amd/display: Use kvzalloc for potentially large allocations
drm/amd/display: Don't return ddc result and read_bytes in same return value
drm/amd/display: Add get_firmware_info_v3_2 for VG12
drm/amd: Add BIOS smu_info v3_3 required struct def.
drm/amd/display: Add VG12 ASIC IDs
drm/vc4: Fix scaling of uni-planar formats
drm/exynos: hdmi: avoid duplicating drm_bridge_attach
drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log
drm/i915: Correctly populate user mode h/vdisplay with pipe src size during readout
drm/i915: Adjust eDP's logical vco in a reliable place.
drm/bridge/sii8620: add Kconfig dependency on extcon
drm/omap: handle alloc failures in omap_connector
drm/omap: add missing linefeeds to prints
...
David S. Miller [Fri, 11 May 2018 16:03:06 +0000 (12:03 -0400)]
Merge branch 'dsa-Plug-in-PHYLINK-support'
Florian Fainelli says:
====================
net: dsa: Plug in PHYLINK support
This patch series adds PHYLINK support to DSA which is necessary to support more
complex PHY and pluggable modules setups.
Patch series can be found here:
https://github.com/ffainelli/linux/commits/dsa-phylink-v2
This was tested on:
- dsa-loop
- bcm_sf2
- mv88e6xxx
- b53
With a variety of test cases:
- internal & external MDIO PHYs
- MoCA with link notification through interrupt/MMIO register
- built-in PHYs
- ifconfig up/down for several cycles works
- bind/unbind of the drivers
Changes in v2:
- fixed link configuration for mv88e6xxx (Andrew) after introducing polling
This is technically v2 of what was posted back in March 2018, changes from last
time:
- fixed probe/remove of drivers
- fixed missing gpiod_put() for link GPIOs
- fixed polling of link GPIOs (Russell I would need your SoB on the patch you
provided offline initially, added some modifications to it)
- tested across a wider set of platforms
And everything should still work as expected. Please be aware of the following:
- switch drivers (like bcm_sf2) which may have user-facing network ports using
fixed links would need to implement phylink_mac_ops to remain functional.
PHYLINK does not create a phy_device for fixed links, therefore our
call to adjust_link() from phylink_mac_link_{up,down} would not be calling
into the driver. This *should not* affect CPU/DSA ports which are configured
through adjust_link() but have no network devices
- support for SFP/SFF is now possible, but switch drivers will still need some
modifications to properly support those, including, but not limited to using
the correct binding information. This will be submitted on top of this series
Please do test on your respective platforms/switches and let me know if you
find any issues, hopefully everything still works like before.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:37 +0000 (13:17 -0700)]
net: dsa: bcm_sf2: Get rid of PHYLIB functions
Now that we have converted the bcm_sf2 driver to implement PHYLINK MAC
operations, we can remove the PHYLIB callbacks: adjust_link() and
fixed_link_update() which are no longer called by DSA.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:36 +0000 (13:17 -0700)]
net: dsa: Plug in PHYLINK support
Add support for PHYLINK within the DSA subsystem in order to support more
complex devices such as pluggable (SFP) and non-pluggable (SFF) modules, 10G
PHYs, and traditional PHYs. Using PHYLINK allows us to drop some amount of
complexity we had while probing fixed and non-fixed PHYs using Device Tree.
Because PHYLINK separates the Ethernet MAC/port configuration into different
stages, we let switch drivers implement those, and for now, we maintain
functionality by calling dsa_slave_adjust_link() during
phylink_mac_link_{up,down} which provides semantically equivalent steps.
Drivers willing to take advantage of PHYLINK should implement the phylink_mac_*
operations that DSA wraps.
We cannot quite remove the adjust_link() callback just yet, because a number of
drivers rely on that for configuring their "CPU" and "DSA" ports, this is done
dsa_port_setup_phy_of() and dsa_port_fixed_link_register_of() still.
Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2) will need
to implement phylink_mac_ops from now on to preserve functionality, since PHYLINK
*does not* create a phy_device instance for fixed links.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 10 May 2018 20:17:35 +0000 (13:17 -0700)]
net: dsa: mv88e6xxx: add PHYLINK support
Add rudimentary phylink support to mv88e6xxx. This allows the driver
using user ports with fixed links to keep operating normally. User ports
with normal PHYs are not affected since the switch automatically manages
their link parameters. User facing ports which use a SFP/SFF with a
non-fixed link mode might require a call to phylink_mac_change() to
operate properly.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[Andrew: fixed link setting after adding link polling]
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[florian: expand commit message]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:34 +0000 (13:17 -0700)]
net: dsa: Eliminate dsa_slave_get_link()
Since we use PHYLIB to manage the per-port link indication, this will
also be reflected correctly in the network device's carrier state, so we
can use ethtool_op_get_link() instead.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:33 +0000 (13:17 -0700)]
net: dsa: bcm_sf2: Implement phylink_mac_ops
Make the bcm_sf2 driver implement phylink_mac_ops since it needs to
support a wide variety of network interfaces: internal & external MDIO
PHYs, fixed PHYs, MoCA with MMIO link status.
A large amount of what needs to be done already exists under
bcm_sf2_sw_adjust_link() so we are essentially breaking this down into
the necessary operation for PHYLINK to work: mac_config, mac_link_up,
mac_link_down and validate. We can now entirely get rid of most of what
fixed_link_update() provided because only the link information is actually
necessary. We still have to force DUPLEX_FULL for legacy Device Tree bindings
that did not specify that before.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:32 +0000 (13:17 -0700)]
net: dsa: Add PHYLINK switch operations
In preparation for adding support for PHYLINK within DSA, define a number of
operations that we will need and that switch drivers can start implementing.
Proper integration with PHYLINK will follow in subsequent patches.
We start selecting PHYLINK (which implies PHYLIB) in net/dsa/Kconfig
such that drivers can be guaranteed that this dependency is properly
taken care of and can start referencing PHYLINK helper functions without
requiring stubs or anything.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 10 May 2018 20:17:31 +0000 (13:17 -0700)]
net: phy: phylink: Poll link GPIOs
When using a fixed link with a link GPIO, we need to poll that GPIO to
determine link state changes. This is consistent with what fixed_phy.c does.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:30 +0000 (13:17 -0700)]
net: phy: phylink: Release link GPIO
We are not releasing the link GPIO descriptor with gpiod_put() which results in
subsequent probing to get -EBUSY when calling fwnode_get_named_gpiod(). Fix this
by doing the release in phylink_destroy().
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 10 May 2018 20:17:29 +0000 (13:17 -0700)]
net: phy: phylink: Use gpiod_get_value_cansleep()
The GPIO provider for the link GPIO line might require the use of the
_cansleep() API, utilize that. This is safe to do since we run in workqueue
context.
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Ignatov [Thu, 10 May 2018 17:59:34 +0000 (10:59 -0700)]
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed
earlier in
919483096bfe.
* udp_sendmsg one was there since the beginning when linux sources were
first added to git;
* ping_v4_sendmsg one was copy/pasted in
c319b4d76b9e.
Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options
have to be freed if they were allocated previously.
Add label so that future callers (if any) can use it instead of kfree()
before return that is easy to forget.
Fixes: c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Thu, 10 May 2018 11:26:16 +0000 (13:26 +0200)]
mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
Resources are not freed in the reverse order of the allocation.
Labels are also mixed-up.
Fix it and reorder code and labels in the error handling path of
'mlxsw_core_bus_device_register()'
Fixes: ef3116e5403e ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 15:50:41 +0000 (11:50 -0400)]
Merge branch 'bonding-bug-fixes-and-regressions'
Debabrata Banerjee says:
====================
bonding: bug fixes and regressions
Fixes to bonding driver for balance-alb mode, suitable for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Debabrata Banerjee [Wed, 9 May 2018 23:32:11 +0000 (19:32 -0400)]
bonding: send learning packets for vlans on slave
There was a regression at some point from the intended functionality of
commit
f60c3704e87d ("bonding: Fix alb mode to only use first level
vlans.")
Given the return value vlan_get_encap_level() we need to store the nest
level of the bond device, and then compare the vlan's encap level to
this. Without this, this check always fails and learning packets are
never sent.
In addition, this same commit caused a regression in the behavior of
balance_alb, which requires learning packets be sent for all interfaces
using the slave's mac in order to load balance properly. For vlan's
that have not set a user mac, we can send after checking one bit.
Otherwise we need send the set mac, albeit defeating rx load balancing
for that vlan.
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>