Andy Roulin [Thu, 26 Dec 2019 13:41:57 +0000 (05:41 -0800)]
bonding: rename AD_STATE_* to LACP_STATE_*
As the LACP actor/partner state is now part of the uapi, rename the
3ad state defines with LACP prefix. The LACP prefix is preferred over
BOND_3AD as the LACP standard moved to 802.1AX.
Fixes: 826f66b30c2e3 ("bonding: move 802.3ad port state flags to uapi")
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Kou [Thu, 26 Dec 2019 12:29:17 +0000 (12:29 +0000)]
sctp: move trace_sctp_probe_path into sctp_outq_sack
The original patch bringed in the "SCTP ACK tracking trace event"
feature was committed at Dec.20, 2017, it replaced jprobe usage
with trace events, and bringed in two trace events, one is
TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path).
The original patch intended to trigger the trace_sctp_probe_path in
TRACE_EVENT(sctp_probe) as below code,
+TRACE_EVENT(sctp_probe,
+
+ TP_PROTO(const struct sctp_endpoint *ep,
+ const struct sctp_association *asoc,
+ struct sctp_chunk *chunk),
+
+ TP_ARGS(ep, asoc, chunk),
+
+ TP_STRUCT__entry(
+ __field(__u64, asoc)
+ __field(__u32, mark)
+ __field(__u16, bind_port)
+ __field(__u16, peer_port)
+ __field(__u32, pathmtu)
+ __field(__u32, rwnd)
+ __field(__u16, unack_data)
+ ),
+
+ TP_fast_assign(
+ struct sk_buff *skb = chunk->skb;
+
+ __entry->asoc = (unsigned long)asoc;
+ __entry->mark = skb->mark;
+ __entry->bind_port = ep->base.bind_addr.port;
+ __entry->peer_port = asoc->peer.port;
+ __entry->pathmtu = asoc->pathmtu;
+ __entry->rwnd = asoc->peer.rwnd;
+ __entry->unack_data = asoc->unack_data;
+
+ if (trace_sctp_probe_path_enabled()) {
+ struct sctp_transport *sp;
+
+ list_for_each_entry(sp, &asoc->peer.transport_addr_list,
+ transports) {
+ trace_sctp_probe_path(sp, asoc);
+ }
+ }
+ ),
But I found it did not work when I did testing, and trace_sctp_probe_path
had no output, I finally found that there is trace buffer lock
operation(trace_event_buffer_reserve) in include/trace/trace_events.h:
static notrace void \
trace_event_raw_event_##call(void *__data, proto) \
{ \
struct trace_event_file *trace_file = __data; \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
struct trace_event_buffer fbuffer; \
struct trace_event_raw_##call *entry; \
int __data_size; \
\
if (trace_trigger_soft_disabled(trace_file)) \
return; \
\
__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
\
entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
sizeof(*entry) + __data_size); \
\
if (!entry) \
return; \
\
tstruct \
\
{ assign; } \
\
trace_event_buffer_commit(&fbuffer); \
}
The reason caused no output of trace_sctp_probe_path is that
trace_sctp_probe_path written in TP_fast_assign part of
TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the
trace_event_buffer_reserve() when compiler expands Macro,
entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
sizeof(*entry) + __data_size); \
\
if (!entry) \
return; \
\
tstruct \
\
{ assign; } \
so trace_sctp_probe_path finally can not acquire trace_event_buffer
and return no output, that is to say the nest of tracepoint entry function
is not allowed. The function call flow is:
trace_sctp_probe()
-> trace_event_raw_event_sctp_probe()
-> lock buffer
-> trace_sctp_probe_path()
-> trace_event_raw_event_sctp_probe_path() --nested
-> buffer has been locked and return no output.
This patch is to remove trace_sctp_probe_path from the TP_fast_assign
part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function,
and trigger sctp_probe_path_trace in sctp_outq_sack.
After this patch, you can enable both events individually,
# cd /sys/kernel/debug/tracing
# echo 1 > events/sctp/sctp_probe/enable
# echo 1 > events/sctp/sctp_probe_path/enable
Or, you can enable all the events under sctp.
# echo 1 > events/sctp/enable
Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Dec 2019 03:51:34 +0000 (19:51 -0800)]
Merge branch 'Peer-to-Peer-One-Step-time-stamping'
Richard Cochran says:
====================
Peer to Peer One-Step time stamping
This series adds support for PTP (IEEE 1588) P2P one-step time
stamping along with a driver for a hardware device that supports this.
If the hardware supports p2p one-step, it subtracts the ingress time
stamp value from the Pdelay_Request correction field. The user space
software stack then simply copies the correction field into the
Pdelay_Response, and on transmission the hardware adds the egress time
stamp into the correction field.
This new functionality extends CONFIG_NETWORK_PHY_TIMESTAMPING to
cover MII snooping devices, but it still depends on phylib, just as
that option does. Expanding beyond phylib is not within the scope of
the this series.
User space support is available in the current linuxptp master branch.
- Patch 1 adds phy_device methods for existing time stamping fields.
- Patches 2-5 convert the stack and drivers to the new methods.
- Patch 6 moves code around the dp83640 driver.
- Patches 7-10 add support for MII time stamping in non-PHY devices.
- Patch 11 adds the new P2P 1-step option.
- Patch 12 adds a driver implementing the new option.
Thanks,
Richard
Changed in v9:
~~~~~~~~~~~~~~
- Fix two more drivers' switch/case blocks WRT the new HWTSTAMP ioctl.
- Picked up two more review tags from Andrew.
Changed in v8:
~~~~~~~~~~~~~~
- Avoided adding forward functional declarations in the dp83640 driver.
- Picked up Florian's new review tags and another one from Andrew.
Changed in v7:
~~~~~~~~~~~~~~
- Converted pr_debug|err to dev_ variants in new driver.
- Fixed device tree documentation per Rob's v6 review.
- Picked up Andrew's and Rob's review tags.
- Silenced sparse warnings in new driver.
Changed in v6:
~~~~~~~~~~~~~~
- Added methods for accessing the phy_device time stamping fields.
- Adjust the device tree documentation per Rob's v5 review.
- Fixed the build failures due to missing exports.
Changed in v5:
~~~~~~~~~~~~~~
- Fixed build failure in macvlan.
- Fixed latent bug with its gcc warning in the driver.
Changed in v4:
~~~~~~~~~~~~~~
- Correct error paths and PTR_ERR return values in the framework.
- Expanded KernelDoc comments WRT PHY locking.
- Pick up Andrew's review tag.
Changed in v3:
~~~~~~~~~~~~~~
- Simplify the device tree binding and document the time stamping
phandle by itself.
Changed in v2:
~~~~~~~~~~~~~~
- Per the v1 review, changed the modeling of MII time stamping
devices. They are no longer a kind of mdio device.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:20 +0000 (18:16 -0800)]
ptp: Add a driver for InES time stamping IP core.
The InES at the ZHAW offers a PTP time stamping IP core. The FPGA
logic recognizes and time stamps PTP frames on the MII bus. This
patch adds a driver for the core along with a device tree binding to
allow hooking the driver to MII buses.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:19 +0000 (18:16 -0800)]
net: Introduce peer to peer one step PTP time stamping.
The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages. Up until now, hardware with P2P one step has
been rare, and kernel support was lacking. This patch adds support of
the mode in anticipation of new hardware developments.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:18 +0000 (18:16 -0800)]
net: mdio: of: Register discovered MII time stampers.
When parsing a PHY node, register its time stamper, if any, and attach
the instance to the PHY device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:17 +0000 (18:16 -0800)]
dt-bindings: ptp: Introduce MII time stamping devices.
This patch add a new binding that allows non-PHY MII time stamping
devices to find their buses. The new documentation covers both the
generic binding and one upcoming user.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:16 +0000 (18:16 -0800)]
net: Add a layer for non-PHY MII time stamping drivers.
While PHY time stamping drivers can simply attach their interface
directly to the PHY instance, stand alone drivers require support in
order to manage their services. Non-PHY MII time stamping drivers
have a control interface over another bus like I2C, SPI, UART, or via
a memory mapped peripheral. The controller device will be associated
with one or more time stamping channels, each of which sits snoops in
on a MII bus.
This patch provides a glue layer that will enable time stamping
channels to find their controlling device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:15 +0000 (18:16 -0800)]
net: Introduce a new MII time stamping interface.
Currently the stack supports time stamping in PHY devices. However,
there are newer, non-PHY devices that can snoop an MII bus and provide
time stamps. In order to support such devices, this patch introduces
a new interface to be used by both PHY and non-PHY devices.
In addition, the one and only user of the old PHY time stamping API is
converted to the new interface.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:14 +0000 (18:16 -0800)]
net: phy: dp83640: Move the probe and remove methods around.
An upcoming patch will change how the PHY time stamping functions are
registered with the networking stack, and adapting this driver would
entail adding forward declarations for four time stamping methods.
However, forward declarations are considered to be stylistic defects.
This patch avoids the issue by moving the probe and remove methods
immediately above the phy_driver interface structure.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:13 +0000 (18:16 -0800)]
net: netcp_ethss: Use the PHY time stamping interface.
The netcp_ethss driver tests fields of the phy_device in order to
determine whether to defer to the PHY's time stamping functionality.
This patch replaces the open coded logic with an invocation of the
proper methods.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:12 +0000 (18:16 -0800)]
net: ethtool: Use the PHY time stamping interface.
The ethtool layer tests fields of the phy_device in order to determine
whether to invoke the PHY's tsinfo ethtool callback. This patch
replaces the open coded logic with an invocation of the proper
methods.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:11 +0000 (18:16 -0800)]
net: vlan: Use the PHY time stamping interface.
The vlan layer tests fields of the phy_device in order to determine
whether to invoke the PHY's tsinfo ethtool callback. This patch
replaces the open coded logic with an invocation of the proper
methods.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:10 +0000 (18:16 -0800)]
net: macvlan: Use the PHY time stamping interface.
The macvlan layer tests fields of the phy_device in order to determine
whether to invoke the PHY's tsinfo ethtool callback. This patch
replaces the open coded logic with an invocation of the proper
methods.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 26 Dec 2019 02:16:09 +0000 (18:16 -0800)]
net: phy: Introduce helper functions for time stamping support.
Some parts of the networking stack and at least one driver test fields
within the 'struct phy_device' in order to query time stamping
capabilities and to invoke time stamping methods. This patch adds a
functional interface around the time stamping fields. This will allow
insulating the callers from future changes to the details of the time
stamping implemenation.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Dec 2019 06:37:30 +0000 (22:37 -0800)]
Merge branch 'Simplify-IPv6-route-offload-API'
Ido Schimmel says:
====================
Simplify IPv6 route offload API
Motivation
==========
This is the IPv6 counterpart of "Simplify IPv4 route offload API" [1].
The aim of this patch set is to simplify the IPv6 route offload API by
making the stack a bit smarter about the notifications it is generating.
This allows driver authors to focus on programming the underlying device
instead of having to duplicate the IPv6 route insertion logic in their
driver, which is error-prone.
Details
=======
Today, whenever an IPv6 route is added or deleted a notification is sent
in the FIB notification chain and it is up to offload drivers to decide
if the route should be programmed to the hardware or not. This is not an
easy task as in hardware routes are keyed by {prefix, prefix length,
table id}, whereas the kernel can store multiple such routes that only
differ in metric / nexthop info.
This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv6 route
insertion logic and store multiple identical routes.
The route that is notified is the first route in the IPv6 FIB node,
which represents a single prefix and length in a given table. In case
the route is deleted and there is another route with the same key, a
replace notification is emitted. Otherwise, a delete notification is
emitted.
Unlike IPv4, in IPv6 it is possible to append individual nexthops to an
existing multipath route. Therefore, in addition to the replace and
delete notifications present in IPv4, an append notification is also
used.
Testing
=======
To ensure there is no degradation in route insertion rates, I averaged
the insertion rate of 512k routes (/64 and /128) over 50 runs. Did not
observe any degradation.
Functional tests are available here [2]. They rely on route trap
indication, which is added in a subsequent patch set.
In addition, I have been running syzkaller for the past couple of weeks
with debug options enabled. Did not observe any problems.
Patch set overview
==================
Patches #1-#7 gradually introduce the new FIB notifications
Patch #8 converts mlxsw to use the new notifications
Patch #9 remove the old notifications
[1] https://patchwork.ozlabs.org/cover/
1209738/
[2] https://github.com/idosch/linux/tree/fib-notifier
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:20 +0000 (15:28 +0200)]
ipv6: Remove old route notifications and convert listeners
Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:19 +0000 (15:28 +0200)]
mlxsw: spectrum_router: Start using new IPv6 route notifications
With the new notifications mlxsw does not need to handle identical
routes itself, as this is taken care of by the core IPv6 code.
Instead, mlxsw only needs to take care of inserting and removing routes
from the device.
Convert mlxsw to use the new IPv6 route notifications and simplify the
code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:18 +0000 (15:28 +0200)]
ipv6: Handle multipath route deletion notification
When an entire multipath route is deleted, only emit a notification if
it is the first route in the node. Emit a replace notification in case
the last sibling is followed by another route. Otherwise, emit a delete
notification.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:17 +0000 (15:28 +0200)]
ipv6: Handle route deletion notification
For the purpose of route offload, when a single route is deleted, it is
only of interest if it is the first route in the node or if it is
sibling to such a route.
In the first case, distinguish between several possibilities:
1. Route is the last route in the node. Emit a delete notification
2. Route is followed by a non-multipath route. Emit a replace
notification for the non-multipath route.
3. Route is followed by a multipath route. Emit a replace notification
for the multipath route.
In the second case, only emit a delete notification to ensure the route
is no longer used as a valid nexthop.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:16 +0000 (15:28 +0200)]
ipv6: Only Replay routes of interest to new listeners
When a new listener is registered to the FIB notification chain it
receives a dump of all the available routes in the system. Instead, make
sure to only replay the IPv6 routes that are actually used in the data
path and are of any interest to the new listener.
This is done by iterating over all the routing tables in the given
namespace, but from each traversed node only the first route ('leaf') is
notified. Multipath routes are notified in a single notification instead
of one for each nexthop.
Add fib6_rt_dump_tmp() to do that. Later on in the patch set it will be
renamed to fib6_rt_dump() instead of the existing one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:15 +0000 (15:28 +0200)]
ipv6: Notify multipath route if should be offloaded
In a similar fashion to previous patches, only notify the new multipath
route if it is the first route in the node or if it was appended to such
route.
The type of the notification (replace vs. append) is determined based on
the number of routes added ('nhn') and the number of sibling routes. If
the two do not match, then an append notification should be sent.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:14 +0000 (15:28 +0200)]
ipv6: Notify route if replacing currently offloaded one
Similar to the corresponding IPv4 patch, only notify the new route if it
is replacing the currently offloaded one. Meaning, the one pointed to by
'fn->leaf'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:13 +0000 (15:28 +0200)]
ipv6: Notify newly added route if should be offloaded
fib6_add_rt2node() takes care of adding a single route ('struct
fib6_info') to a FIB node. The route in question should only be notified
in case it is added as the first route in the node (lowest metric) or if
it is added as a sibling route to the first route in the node.
The first criterion can be tested by checking if the route is pointed to
by 'fn->leaf'. The second criterion can be tested by checking the new
'notify_sibling_rt' variable that is set when the route is added as a
sibling to the first route in the node.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 23 Dec 2019 13:28:12 +0000 (15:28 +0200)]
net: fib_notifier: Add temporary events to the FIB notification chain
Subsequent patches are going to simplify the IPv6 route offload API,
which will only use three events - replace, delete and append.
Introduce a temporary version of replace and delete in order to make the
conversion easier to review. Note that append does not need a temporary
version, as it is currently not used.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 21 Dec 2019 13:15:21 +0000 (14:15 +0100)]
r8169: move enabling EEE to rtl8169_init_phy
Simplify the code by moving the call to rtl_enable_eee() from the
individual PHY configs to rtl8169_init_phy().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 21 Dec 2019 13:11:08 +0000 (14:11 +0100)]
r8169: remove MAC workaround in rtl8168e_2_hw_phy_config
Due to recent changes we don't need the call to rtl_rar_exgmac_set()
and longer at this place. It's called from rtl_rar_set() which is
called in rtl_init_mac_address() and rtl8169_resume().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 21 Dec 2019 13:03:38 +0000 (14:03 +0100)]
r8169: factor out rtl8168h_2_get_adc_bias_ioffset
Simplify and factor out this magic from rtl8168h_2_hw_phy_config()
and name it based on the vendor driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Dec 2019 06:24:45 +0000 (22:24 -0800)]
Merge branch 'ovs-mpls-actions'
Martin Varghese says:
====================
New openvswitch MPLS actions for layer 2 tunnelling
The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.
The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.
POP_MPLS action is extended to support ethertype 0x6558
OVS userspace changes -
---------------------
Encap & Decap ovs actions are extended to support MPLS packet type. The encap & decap
adds and removes MPLS header at the start of packet as depicted below.
Actions - encap(mpls(ether_type=0x8847)),encap(ethernet)
Incoming packet -> | ETH | IP | Payload |
1 Actions - encap(mpls(ether_type=0x8847)) [Kernel action - add_mpls:0x8847]
Outgoing packet -> | MPLS | ETH | Payload|
2 Actions - encap(ethernet) [ Kernel action - push_eth ]
Outgoing packet -> | ETH | MPLS | ETH | Payload|
Decapsulation:
Incoming packet -> | ETH | MPLS | ETH | IP | Payload |
Actions - decap(),decap(packet_type(ns=0,type=0)
1 Actions - decap() [Kernel action - pop_eth)
Outgoing packet -> | MPLS | ETH | IP | Payload|
2 Actions - decap(packet_type(ns=0,type=0) [Kernel action - pop_mpls:0x6558]
Outgoing packet -> | ETH | IP | Payload
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Varghese [Sat, 21 Dec 2019 03:20:46 +0000 (08:50 +0530)]
openvswitch: New MPLS actions for layer 2 tunnelling
The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.
The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.
POP_MPLS action is extended to support ethertype 0x6558.
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Varghese [Sat, 21 Dec 2019 03:20:23 +0000 (08:50 +0530)]
net: Rephrased comments section of skb_mpls_pop()
Rephrased comments section of skb_mpls_pop() to align it with
comments section of skb_mpls_push().
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Varghese [Sat, 21 Dec 2019 03:20:01 +0000 (08:50 +0530)]
net: skb_mpls_push() modified to allow MPLS header push at start of packet.
The existing skb_mpls_push() implementation always inserts mpls header
after the mac header. L2 VPN use cases requires MPLS header to be
inserted before the ethernet header as the ethernet packet gets tunnelled
inside MPLS header in those cases.
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 22 Dec 2019 23:15:05 +0000 (15:15 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Mere overlapping changes in the conflicts here.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 22 Dec 2019 18:59:06 +0000 (10:59 -0800)]
Merge tag 'xfs-5.5-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Fix a few bugs that could lead to corrupt files, fsck complaints, and
filesystem crashes:
- Minor documentation fixes
- Fix a file corruption due to read racing with an insert range
operation.
- Fix log reservation overflows when allocating large rt extents
- Fix a buffer log item flags check
- Don't allow administrators to mount with sunit= options that will
cause later xfs_repair complaints about the root directory being
suspicious because the fs geometry appeared inconsistent
- Fix a non-static helper that should have been static"
* tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Make the symbol 'xfs_rtalloc_log_count' static
xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
xfs: split the sunit parameter update into two parts
xfs: refactor agfl length computation function
libxfs: resync with the userspace libxfs
xfs: use bitops interface for buf log item AIL flag check
xfs: fix log reservation overflows when allocating large rt extents
xfs: stabilize insert range start boundary to avoid COW writeback race
xfs: fix Sphinx documentation warning
Linus Torvalds [Sun, 22 Dec 2019 18:41:48 +0000 (10:41 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"Ext4 bug fixes, including a regression fix"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: clarify impact of 'commit' mount option
ext4: fix unused-but-set-variable warning in ext4_add_entry()
jbd2: fix kernel-doc notation warning
ext4: use RCU API in debug_print_tree
ext4: validate the debug_want_extra_isize mount option at parse time
ext4: reserve revoke credits in __ext4_new_inode
ext4: unlock on error in ext4_expand_extra_isize()
ext4: optimize __ext4_check_dir_entry()
ext4: check for directory entries too close to block end
ext4: fix ext4_empty_dir() for directories with holes
Linus Torvalds [Sun, 22 Dec 2019 18:36:55 +0000 (10:36 -0800)]
Merge tag 'block-5.5-
20191221' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Let's try this one again, this time without the compat_ioctl changes.
We've got those fixed up, but that can go out next week.
This contains:
- block queue flush lockdep annotation (Bart)
- Type fix for bsg_queue_rq() (Bart)
- Three dasd fixes (Stefan, Jan)
- nbd deadlock fix (Mike)
- Error handling bio user map fix (Yang)
- iocost fix (Tejun)
- sbitmap waitqueue addition fix that affects the kyber IO scheduler
(David)"
* tag 'block-5.5-
20191221' of git://git.kernel.dk/linux-block:
sbitmap: only queue kyber's wait callback if not already active
block: fix memleak when __blk_rq_map_user_iov() is failed
s390/dasd: fix typo in copyright statement
s390/dasd: fix memleak in path handling error case
s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
block: Fix a lockdep complaint triggered by request queue flushing
block: Fix the type of 'sts' in bsg_queue_rq()
block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
nbd: fix shutdown and recv work deadlock v2
iocost: over-budget forced IOs should schedule async delay
Linus Torvalds [Sun, 22 Dec 2019 18:26:59 +0000 (10:26 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"PPC:
- Fix a bug where we try to do an ultracall on a system without an
ultravisor
KVM:
- Fix uninitialised sysreg accessor
- Fix handling of demand-paged device mappings
- Stop spamming the console on IMPDEF sysregs
- Relax mappings of writable memslots
- Assorted cleanups
MIPS:
- Now orphan, James Hogan is stepping down
x86:
- MAINTAINERS change, so long Radim and thanks for all the fish
- supported CPUID fixes for AMD machines without SPEC_CTRL"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MAINTAINERS: remove Radim from KVM maintainers
MAINTAINERS: Orphan KVM for MIPS
kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD
kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor
KVM: arm/arm64: Properly handle faulting of device mappings
KVM: arm64: Ensure 'params' is initialised when looking up sys register
KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region
KVM: arm64: Don't log IMP DEF sysreg traps
KVM: arm64: Sanely ratelimit sysreg messages
KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()
KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()
Linus Torvalds [Sun, 22 Dec 2019 18:22:47 +0000 (10:22 -0800)]
Merge tag 'riscv/for-v5.5-rc3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Several fixes, and one cleanup, for RISC-V.
Fixes:
- Fix an error in a Kconfig file that resulted in an undefined
Kconfig option "CONFIG_CONFIG_MMU"
- Fix undefined Kconfig option "CONFIG_CONFIG_MMU"
- Fix scratch register clearing in M-mode (affects nommu users)
- Fix a mismerge on my part that broke the build for
CONFIG_SPARSEMEM_VMEMMAP users
Cleanup:
- Move SiFive L2 cache-related code to drivers/soc, per request"
* tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: move sifive_l2_cache.c to drivers/soc
riscv: define vmemmap before pfn_to_page calls
riscv: fix scratch register clearing in M-mode.
riscv: Fix use of undefined config option CONFIG_CONFIG_MMU
Linus Torvalds [Sun, 22 Dec 2019 17:54:33 +0000 (09:54 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
including adding a missing ipv6 match description.
2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
Bhat.
3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
Chaignon.
7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
8) Fix established socket lookup race when socket goes from
TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
RCU grace period. From Eric Dumazet.
9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
10) Fix active backup transition after link failure in bonding, from
Mahesh Bandewar.
11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
16) Reject invalid MTU values in stmmac, from Jose Abreu.
17) Fix refcount leak in error path of u32 classifier, from Davide
Caratti.
18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
Kaseorg.
19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
20) Disable hardware GRO when XDP is attached to qede, frm Manish
Chopra.
21) Since we encode state in the low pointer bits, dst metrics must be
at least 4 byte aligned, which is not necessarily true on m68k. Add
annotations to fix this, from Geert Uytterhoeven.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
sfc: Include XDP packet headroom in buffer step size.
sfc: fix channel allocation with brute force
net: dst: Force 4-byte alignment of dst_metrics
selftests: pmtu: fix init mtu value in description
hv_netvsc: Fix unwanted rx_table reset
net: phy: ensure that phy IDs are correctly typed
mod_devicetable: fix PHY module format
qede: Disable hardware gro when xdp prog is installed
net: ena: fix issues in setting interrupt moderation params in ethtool
net: ena: fix default tx interrupt moderation interval
net/smc: unregister ib devices in reboot_event
net: stmmac: platform: Fix MDIO init for platforms without PHY
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
net: dsa: ksz: use common define for tag len
s390/qeth: don't return -ENOTSUPP to userspace
s390/qeth: fix promiscuous mode after reset
s390/qeth: handle error due to unsupported transport mode
cxgb4: fix refcount init for TC-MQPRIO offload
tc-testing: initial tdc selftests for cls_u32
...
Jan Stancek [Sun, 22 Dec 2019 12:33:24 +0000 (13:33 +0100)]
pipe: fix empty pipe check in pipe_write()
LTP pipeio_1 test is hanging with
v5.5-rc2-385-gb8e382a185eb,
with read side observing empty pipe and sleeping and write
side running out of space and then sleeping as well. In this
scenario there are 5 writers and 1 reader.
Problem is that after pipe_write() reacquires pipe lock, it
re-checks for empty pipe with potentially stale 'head' and
doesn't wake up read side anymore. pipe->tail can advance
beyond 'head', because there are multiple writers.
Use pipe->head for empty pipe check after reacquiring lock
to observe current state.
Testing: With patch, LTP pipeio_1 ran successfully in loop for 1 hour.
Without patch it hanged within a minute.
Fixes: 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic")
Reported-by: Rachel Sibley <rasibley@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Sun, 22 Dec 2019 12:18:15 +0000 (13:18 +0100)]
Merge tag 'kvm-ppc-fixes-5.5-1' of git://git./linux/kernel/git/paulus/powerpc into kvm-master
PPC KVM fix for 5.5
- Fix a bug where we try to do an ultracall on a system without an
ultravisor.
Paolo Bonzini [Wed, 4 Dec 2019 14:33:35 +0000 (15:33 +0100)]
MAINTAINERS: remove Radim from KVM maintainers
Radim's kernel.org email is bouncing, which I take as a signal that
he is not really able to deal with KVM at this time. Make MAINTAINERS
match the effective value of KVM's bus factor.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
James Hogan [Sat, 21 Dec 2019 15:50:13 +0000 (15:50 +0000)]
MAINTAINERS: Orphan KVM for MIPS
I haven't been active for 18 months, and don't have the hardware set up
to test KVM for MIPS, so mark it as orphaned and remove myself as
maintainer. Hopefully somebody from MIPS can pick this up.
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan Kara [Wed, 18 Dec 2019 11:12:10 +0000 (12:12 +0100)]
ext4: clarify impact of 'commit' mount option
The description of 'commit' mount option dates back to ext3 times.
Update the description to match current meaning for ext4.
Reported-by: Paul Richards <paul.richards@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191218111210.14161-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Yunfeng Ye [Tue, 17 Dec 2019 14:46:49 +0000 (22:46 +0800)]
ext4: fix unused-but-set-variable warning in ext4_add_entry()
Warning is found when compile with "-Wunused-but-set-variable":
fs/ext4/namei.c: In function ‘ext4_add_entry’:
fs/ext4/namei.c:2167:23: warning: variable ‘sbi’ set but not used
[-Wunused-but-set-variable]
struct ext4_sb_info *sbi;
^~~
Fix this by moving the variable @sbi under CONFIG_UNICODE.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/cb5eb904-224a-9701-c38f-cb23514b1fff@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 21 Dec 2019 23:16:56 +0000 (15:16 -0800)]
Merge tag 'trace-v5.5-rc2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix memory leak on error path of process_system_preds()
- Lock inversion fix with updating tgid recording option
- Fix histogram compare function on big endian machines
- Fix histogram trigger function on big endian machines
- Make trace_printk() irq sync on init for kprobe selftest correctness
* tag 'trace-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix endianness bug in histogram trigger
samples/trace_printk: Wait for IRQ work to finish
tracing: Fix lock inversion in trace_event_enable_tgid_record()
tracing: Have the histogram compare functions convert to u64 first
tracing: Avoid memory leak in process_system_preds()
Linus Torvalds [Sat, 21 Dec 2019 23:12:26 +0000 (15:12 -0800)]
Merge tag 'libnvdimm-fix-5.5-rc3' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Dan Williams:
"A minor regression fix.
The libnvdimm unit tests were expecting to mock calls to
ioremap_nocache() which disappeared in v5.5-rc1. This fix has appeared
in -next and collided with some cleanups that Christoph has planned
for v5.6, but he will fix up his branch once this goes in.
Summary:
- Restore the operation of the libnvdimm unit tests after the removal
of ioremap_nocache()"
* tag 'libnvdimm-fix-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Fix mock support for ioremap
Sven Schnelle [Wed, 18 Dec 2019 07:44:27 +0000 (08:44 +0100)]
tracing: Fix endianness bug in histogram trigger
At least on PA-RISC and s390 synthetic histogram triggers are failing
selftests because trace_event_raw_event_synth() always writes a 64 bit
values, but the reader expects a field->size sized value. On little endian
machines this doesn't hurt, but on big endian this makes the reader always
read zero values.
Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-4-svens@linux.ibm.com
Cc: stable@vger.kernel.org
Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Sven Schnelle [Wed, 18 Dec 2019 07:44:26 +0000 (08:44 +0100)]
samples/trace_printk: Wait for IRQ work to finish
trace_printk schedules work via irq_work_queue(), but doesn't
wait until it was processed. The kprobe_module.tc testcase does:
:;: "Load module again, which means the event1 should be recorded";:
modprobe trace-printk
grep "event1:" trace
so the grep which checks the trace file might run before the irq work
was processed. Fix this by adding a irq_work_sync().
Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-3-svens@linux.ibm.com
Cc: stable@vger.kernel.org
Fixes: af2a0750f3749 ("selftests/ftrace: Improve kprobe on module testcase to load/unload module")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Prateek Sood [Tue, 10 Dec 2019 09:15:16 +0000 (09:15 +0000)]
tracing: Fix lock inversion in trace_event_enable_tgid_record()
Task T2 Task T3
trace_options_core_write() subsystem_open()
mutex_lock(trace_types_lock) mutex_lock(event_mutex)
set_tracer_flag()
trace_event_enable_tgid_record() mutex_lock(trace_types_lock)
mutex_lock(event_mutex)
This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.
Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com
Cc: stable@vger.kernel.org
Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Sat, 21 Dec 2019 20:17:14 +0000 (12:17 -0800)]
Merge tag 's390-5.5-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix unwinding from irq context of interrupted user process.
- Add purgatory build missing symbols check. That helped to uncover and
fix missing symbols when built with kasan support enabled.
- Couple of ftrace fixes. Avoid broken stack trace and fix recursion
loop in function_graph tracer.
* tag 's390-5.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ftrace: save traced function caller
s390/unwind: stop gracefully at user mode pt_regs in irq stack
s390/purgatory: do not build purgatory with kcov, kasan and friends
s390/purgatory: Make sure we fail the build if purgatory has missing symbols
s390/ftrace: fix endless recursion in function_graph tracer
Linus Torvalds [Sat, 21 Dec 2019 18:52:10 +0000 (10:52 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: a (rare) PSI crash fix, a CPU affinity related balancing
fix, and a toning down of active migration attempts"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cfs: fix spurious active migration
sched/fair: Fix find_idlest_group() to handle CPU affinity
psi: Fix a division error in psi poll()
sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
Linus Torvalds [Sat, 21 Dec 2019 18:51:00 +0000 (10:51 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: a BTS fix, a PT NMI handling fix, a PMU sysfs fix and an
SRCU annotation"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Add SRCU annotation for pmus list walk
perf/x86/intel: Fix PT PMI handling
perf/x86/intel/bts: Fix the use of page_private()
perf/x86: Fix potential out-of-bounds access
Linus Torvalds [Sat, 21 Dec 2019 18:49:47 +0000 (10:49 -0800)]
Merge tag 'kbuild-fixes-v5.5' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix warning in out-of-tree 'make clean'
- add READELF variable to the top Makefile
- fix broken builds when LINUX_COMPILE_BY contains a backslash
- fix build warning in kallsyms
- fix NULL pointer access in expr_eq() in Kconfig
- fix missing dependency on rsync in deb-pkg build
- remove ---help--- from documentation
- fix misleading documentation about directory descending
* tag 'kbuild-fixes-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: clarify the difference between obj-y and obj-m w.r.t. descending
kconfig: remove ---help--- from documentation
scripts: package: mkdebian: add missing rsync dependency
kconfig: don't crash on NULL expressions in expr_eq()
scripts/kallsyms: fix offset overflow of kallsyms_relative_base
mkcompile_h: use printf for LINUX_COMPILE_BY
mkcompile_h: git rid of UTS_TRUNCATE from LINUX_COMPILE_{BY,HOST}
x86/boot: kbuild: allow readelf executable to be specified
kbuild: fix 'No such file or directory' warning when cleaning
Masahiro Yamada [Thu, 19 Dec 2019 11:51:00 +0000 (20:51 +0900)]
kbuild: clarify the difference between obj-y and obj-m w.r.t. descending
Kbuild descends into a directory by either 'y' or 'm', but there is an
important difference.
Kbuild combines the built-in objects into built-in.a in each directory.
The built-in.a in the directory visited by obj-y is merged into the
built-in.a in the parent directory. This merge happens recursively
when Kbuild is ascending back towards the top directory, then built-in
objects are linked into vmlinux eventually. This works properly only
when the Makefile specifying obj-y is reachable by the chain of obj-y.
On the other hand, Kbuild does not take built-in.a from the directory
visited by obj-m. This it, all the objects in that directory are
supposed to be modular. If Kbuild descends into a directory by obj-m,
but the Makefile in the sub-directory specifies obj-y, those objects
are just left orphan.
The current statement "Kbuild only uses this information to decide that
it needs to visit the directory" is misleading. Clarify the difference.
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Linus Torvalds [Sat, 21 Dec 2019 14:49:41 +0000 (06:49 -0800)]
Merge branch 'parisc-5.5-2' of git://git./linux/kernel/git/deller/parisc-linux
Pul parisc fixes from Helge Deller:
"Two build error fixes, one for the soft_offline_page() parameter
change and one for a specific KEXEC/KEXEC_FILE configuration, as well
as a compiler and a linker warning fix"
* 'parisc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix compiler warnings in debug_core.c
parisc: soft_offline_page() now takes the pfn
parisc: add missing __init annotation
parisc: fix compilation when KEXEC=n and KEXEC_FILE=y
Linus Torvalds [Sat, 21 Dec 2019 14:24:56 +0000 (06:24 -0800)]
Merge tag 'for-linus-5.5b-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains two cleanup patches and a small series for supporting
reloading the Xen block backend driver"
* tag 'for-linus-5.5b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/grant-table: remove multiple BUG_ON on gnttab_interface
xen-blkback: support dynamic unbind/bind
xen/interface: re-define FRONT/BACK_RING_ATTACH()
xenbus: limit when state is forced to closed
xenbus: move xenbus_dev_shutdown() into frontend code...
xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
Linus Torvalds [Sat, 21 Dec 2019 14:17:05 +0000 (06:17 -0800)]
Merge tag 'powerpc-5.5-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Two weeks worth of accumulated fixes:
- A fix for a performance regression seen on PowerVM LPARs using
dedicated CPUs, caused by our vcpu_is_preempted() returning true
even for idle CPUs.
- One of the ultravisor support patches broke KVM on big endian hosts
in v5.4.
- Our KUAP (Kernel User Access Prevention) code missed allowing
access in __clear_user(), which could lead to an oops or erroneous
SEGV when triggered via PTRACE_GETREGSET.
- Two fixes for the ocxl driver, an open/remove race, and a memory
leak in an error path.
- A handful of other small fixes.
Thanks to: Andrew Donnellan, Christian Zigotzky, Christophe Leroy,
Christoph Hellwig, Daniel Axtens, David Hildenbrand, Frederic Barrat,
Gautham R. Shenoy, Greg Kurz, Ihor Pasichnyk, Juri Lelli, Marcus
Comstedt, Mike Rapoport, Parth Shah, Srikar Dronamraju, Vaidyanathan
Srinivasan"
* tag 'powerpc-5.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix regression on big endian hosts
powerpc: Fix __clear_user() with KUAP enabled
powerpc/pseries/cmm: fix managed page counts when migrating pages between zones
powerpc/8xx: fix bogus __init on mmu_mapin_ram_chunk()
ocxl: Fix potential memory leak on context creation
powerpc/irq: fix stack overflow verification
powerpc: Ensure that swiotlb buffer is allocated from low memory
powerpc/shared: Use static key to detect shared processor
powerpc/vcpu: Assume dedicated processors as non-preempt
ocxl: Fix concurrent AFU open and device removal
Linus Torvalds [Sat, 21 Dec 2019 14:04:12 +0000 (06:04 -0800)]
Merge branch 'ras-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 RAS fixes from Borislav Petkov:
"Three urgent RAS fixes for the AMD side of things:
- initialize struct mce.bank so that calculated error severity on AMD
SMCA machines is correct
- do not send IPIs early during bank initialization, when interrupts
are disabled
- a fix for when only a subset of MCA banks are enabled, which led to
boot hangs on some new AMD CPUs"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix possibly incorrect severity calculation on AMD
x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
Linus Torvalds [Sat, 21 Dec 2019 13:55:35 +0000 (05:55 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"One core framework fix to walk the orphan list and match up clks to
parents when clk providers register the DT provider after registering
all their clks (as they should).
Then a handful of driver fixes for the qcom, imx, and at91 drivers.
The driver fixes are relatively small fixes for incorrect register
settings or missing locks causing race conditions"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: Avoid SMMU/cx gdsc corner cases
clk: qcom: gcc-sc7180: Fix setting flag for votable GDSCs
clk: Move clk_core_reparent_orphans() under CONFIG_OF
clk: at91: fix possible deadlock
clk: walk orphan list on clock provider registration
clk: imx: pll14xx: fix clk_pll14xx_wait_lock
clk: imx: clk-imx7ulp: Add missing sentinel of ulp_div_table
clk: imx: clk-composite-8m: add lock to gate/mux
David S. Miller [Sat, 21 Dec 2019 05:56:48 +0000 (21:56 -0800)]
Merge branch 'sfc-fix-bugs-introduced-by-XDP-patches'
Edward Cree says:
====================
sfc: fix bugs introduced by XDP patches
Two fixes for bugs introduced by the XDP support in the sfc driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Charles McLachlan [Fri, 20 Dec 2019 16:27:10 +0000 (16:27 +0000)]
sfc: Include XDP packet headroom in buffer step size.
Correct a mismatch between rx_page_buf_step and the actual step size
used when filling buffer pages.
This patch fixes the page overrun that occured when the MTU was set to
anything bigger than 1692.
Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues")
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 20 Dec 2019 16:26:40 +0000 (16:26 +0000)]
sfc: fix channel allocation with brute force
It was possible for channel allocation logic to get confused between what
it had and what it wanted, and end up trying to use the same channel for
both PTP and regular TX. This led to a kernel panic:
BUG: unable to handle page fault for address:
0000000000047635
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0-rc3-ehc14+ #900
Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
RIP: 0010:native_queued_spin_lock_slowpath+0x188/0x1e0
Code: f3 90 48 8b 32 48 85 f6 74 f6 eb e8 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 c0 98 02 00 48 03 04 f5 a0 c6 ed 81 <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
RSP: 0018:
ffffc90000003d28 EFLAGS:
00010006
RAX:
0000000000047635 RBX:
0000000000000246 RCX:
0000000000040000
RDX:
ffff888627a298c0 RSI:
0000000000003ffe RDI:
ffff88861f6b8dd4
RBP:
ffff8886225c6e00 R08:
0000000000040000 R09:
0000000000000000
R10:
0000000616f080c6 R11:
00000000000000c0 R12:
ffff88861f6b8dd4
R13:
ffffc90000003dc8 R14:
ffff88861942bf00 R15:
ffff8886150f2000
FS:
0000000000000000(0000) GS:
ffff888627a00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000047635 CR3:
000000000200a000 CR4:
00000000000006f0
Call Trace:
<IRQ>
_raw_spin_lock_irqsave+0x22/0x30
skb_queue_tail+0x1b/0x50
sock_queue_err_skb+0x9d/0xf0
__skb_complete_tx_timestamp+0x9d/0xc0
efx_dequeue_buffer+0x126/0x180 [sfc]
efx_xmit_done+0x73/0x1c0 [sfc]
efx_ef10_ev_process+0x56a/0xfe0 [sfc]
? tick_sched_do_timer+0x60/0x60
? timerqueue_add+0x5d/0x70
? enqueue_hrtimer+0x39/0x90
efx_poll+0x111/0x380 [sfc]
? rcu_accelerate_cbs+0x50/0x160
net_rx_action+0x14a/0x400
__do_softirq+0xdd/0x2d0
irq_exit+0xa0/0xb0
do_IRQ+0x53/0xe0
common_interrupt+0xf/0xf
</IRQ>
In the long run we intend to rewrite the channel allocation code, but for
'net' fix this by allocating extra_channels, and giving them TX queues,
even if we do not in fact need them (e.g. on NICs without MAC TX
timestamping), and thereby using simpler logic to assign the channels
once they're allocated.
Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Dec 2019 05:55:00 +0000 (21:55 -0800)]
Merge tag 'wireless-drivers-next-2019-12-20' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.6
First set of patches for v5.6. The biggest thing here is of course the
new driver ath11k but also new features for other drivers as well a
myriad of bug fixes.
Major changes:
ath11k
* a new driver for Qualcomm Wi-Fi 6 (IEEE 802.11ax) devices
ath10k
* significant improvements on receive throughput and firmware download
with SDIO bus
* report signal strength for each chain also on SDIO
* set max mtu to 1500 on SDIO devices
brcmfmac
* add support for BCM4359 SDIO chipset
wil6210
* support set_multicast_to_unicast cfg80211 operation
* support set_cqm_rssi_config cfg80211 operation
wcn36xx
* disable HW_CONNECTION_MONITOR as firmware is buggy
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Fri, 20 Dec 2019 13:31:40 +0000 (14:31 +0100)]
net: dst: Force 4-byte alignment of dst_metrics
When storing a pointer to a dst_metrics structure in dst_entry._metrics,
two flags are added in the least significant bits of the pointer value.
Hence this assumes all pointers to dst_metrics structures have at least
4-byte alignment.
However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not
4 bytes. Hence in some kernel builds, dst_default_metrics may be only
2-byte aligned, leading to obscure boot warnings like:
WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W
5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261
Stack from
10835e6c:
10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea
00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000
04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c
00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001
003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84
003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a
Call Trace: [<
00023fa6>] __warn+0xb2/0xb4
[<
00023fea>] warn_slowpath_fmt+0x42/0x64
[<
001a70f8>] refcount_warn_saturate+0x44/0x9a
[<
00043aa8>] printk+0x0/0x18
[<
001a70f8>] refcount_warn_saturate+0x44/0x9a
[<
0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e
[<
0026d5a8>] ipv4_dst_destroy+0x5e/0x7e
[<
00025e84>] __local_bh_enable_ip+0x0/0x8e
[<
002416a8>] dst_destroy+0x40/0xae
Fix this by forcing 4-byte alignment of all dst_metrics structures.
Fixes: e5fd387ad5b30ca3 ("ipv6: do not overwrite inetpeer metrics prematurely")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Fri, 20 Dec 2019 07:08:06 +0000 (15:08 +0800)]
selftests: pmtu: fix init mtu value in description
There is no a_r3, a_r4 in the testing topology.
It should be b_r1, b_r2. Also b_r1 mtu is 1400 and b_r2 mtu is 1500.
Fixes: e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 20 Dec 2019 02:28:10 +0000 (18:28 -0800)]
hv_netvsc: Fix unwanted rx_table reset
In existing code, the receive indirection table, rx_table, is in
struct rndis_device, which will be reset when changing MTU, ringparam,
etc. User configured receive indirection table values will be lost.
To fix this, move rx_table to struct net_device_context, and check
netif_is_rxfh_configured(), so rx_table will be set to default only
if no user configured value.
Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 19 Dec 2019 23:24:52 +0000 (23:24 +0000)]
net: phy: ensure that phy IDs are correctly typed
PHY IDs are 32-bit unsigned quantities. Ensure that they are always
treated as such, and not passed around as "int"s.
Fixes: 13d0ab6750b2 ("net: phy: check return code when requesting PHY driver module")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 19 Dec 2019 23:24:47 +0000 (23:24 +0000)]
mod_devicetable: fix PHY module format
When a PHY is probed, if the top bit is set, we end up requesting a
module with the string "mdio:-
10101110000000100101000101010001" -
the top bit is printed to a signed -1 value. This leads to the module
not being loaded.
Fix the module format string and the macro generating the values for
it to ensure that we only print unsigned types and the top bit is
always 0/1. We correctly end up with
"mdio:
10101110000000100101000101010001".
Fixes: 8626d3b43280 ("phylib: Support phy module autoloading")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Thu, 19 Dec 2019 18:35:16 +0000 (10:35 -0800)]
qede: Disable hardware gro when xdp prog is installed
commit
18c602dee472 ("qede: Use NETIF_F_GRO_HW.") introduced
a regression in driver that when xdp program is installed on
qede device, device's aggregation feature (hardware GRO) is not
getting disabled, which is unexpected with xdp.
Fixes: 18c602dee472 ("qede: Use NETIF_F_GRO_HW.")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Dec 2019 05:43:09 +0000 (21:43 -0800)]
Merge branch 'ena-fixes-of-interrupt-moderation-bugs'
Arthur Kiyanovski says:
====================
ena: fixes of interrupt moderation bugs
Differences from V1:
1. Updated default tx interrupt moderation to 64us
2. Added "Fixes:" tags.
3. Removed cosmetic changes that are not relevant for these bug fixes
This patchset includes a couple of fixes of bugs in the implemenation of
interrupt moderation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Thu, 19 Dec 2019 15:40:56 +0000 (17:40 +0200)]
net: ena: fix issues in setting interrupt moderation params in ethtool
Issue 1:
--------
Reproduction steps:
1. sudo ethtool -C eth0 rx-usecs 128
2. sudo ethtool -C eth0 adaptive-rx on
3. sudo ethtool -C eth0 adaptive-rx off
4. ethtool -c eth0
expected output: rx-usecs 128
actual output: rx-usecs 0
Reason for issue:
In stage 3, ethtool userspace calls first the ena_get_coalesce() handler
to get the current value of all properties, and then the ena_set_coalesce()
handler. When ena_get_coalesce() is called the adaptive interrupt
moderation is still on. There is an if in the code that returns the
rx_coalesce_usecs only if the adaptive interrupt moderation is off.
And since it is still on, rx_coalesce_usecs is not set, meaning it
stays 0.
Solution to issue:
Remove this if static interrupt moderation intervals have nothing to do
with dynamic ones.
Issue 2:
--------
Reproduction steps:
1. sudo ethtool -C eth0 adaptive-rx on
2. sudo ethtool -C eth0 rx-usecs 128
3. ethtool -c eth0
expected output: rx-usecs 128
actual output: rx-usecs 0
Reason for issue:
In stage 2, when ena_set_coalesce() is called, the handler tests if
rx adaptive interrupt moderation is on, and if it is, it returns before
getting to the part in the function that sets the rx non-adaptive
interrupt moderation interval.
Solution to issue:
Remove the return from the function when rx adaptive interrupt moderation
is on.
Also cleaned up the fixed code in ena_set_coalesce by grouping together
adaptive interrupt moderation toggling, and using && instead of nested
ifs.
Fixes: b3db86dc4b82 ("net: ena: reimplement set/get_coalesce()")
Fixes: 0eda847953d8 ("net: ena: fix retrieval of nonadaptive interrupt moderation intervals")
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Thu, 19 Dec 2019 15:40:55 +0000 (17:40 +0200)]
net: ena: fix default tx interrupt moderation interval
Current default non-adaptive tx interrupt moderation interval is 196 us.
This value is too high and might cause the tx queue to fill up.
In this commit we set the default non-adaptive tx interrupt moderation
interval to 64 us in order to:
1. Reduce the probability of the queue filling-up (when compared to the
current default value of 196 us).
2. Reduce unnecessary tx interrupt overhead (which happens if we set the
default tx interval to 0).
We determined experimentally that 64 us is an optimal value that
reduces interrupt rate by more than 20% without affecting performance.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 19 Dec 2019 11:51:13 +0000 (12:51 +0100)]
net/smc: unregister ib devices in reboot_event
In the reboot_event handler, unregister the ib devices and enable
the IB layer to release the devices before the reboot.
Fixes: a33a803cfe64 ("net/smc: guarantee removal of link groups in reboot")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Padmanabhan Rajanbabu [Thu, 19 Dec 2019 10:17:01 +0000 (15:47 +0530)]
net: stmmac: platform: Fix MDIO init for platforms without PHY
The current implementation of "stmmac_dt_phy" function initializes
the MDIO platform bus data, even in the absence of PHY. This fix
will skip MDIO initialization if there is no PHY present.
Fixes: 7437127 ("net: stmmac: Convert to phylink and remove phylib logic")
Acked-by: Jayati Sahu <jayati.sahu@samsung.com>
Signed-off-by: Sriram Dash <sriram.dash@samsung.com>
Signed-off-by: Padmanabhan Rajanbabu <p.rajanbabu@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Dec 2019 05:20:39 +0000 (21:20 -0800)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
net: hns3: misc updates for -net-next
This series includes some misc updates for the HNS3 ethernet driver.
[patch 1] adds FE bit check before calling hns3_add_frag().
[patch 2] removes an unnecessary lock.
[patch 3] adds a little optimization for CMDQ uninitialization.
[patch 4] refactors the dump of FD tcams.
[patch 5] implements ndo_features_check ops.
[patch 6] adds some VF VLAN information for command "ip link show".
[patch 7] adds a debug print.
[patch 8] modifies the timing of print misc interrupt status when
handling hardware error event.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 19 Dec 2019 06:57:47 +0000 (14:57 +0800)]
net: hns3: only print misc interrupt status when handling fails
Printing misc interrupt status of hardware error event in the
IRQ handler is unnecessary, since hclge_handle_hw_msix_error()
will print out the detail information for this hardware error
when handling success. So, this patch removes the print in
IRQ handler, and prints it when hclge_handle_hw_msix_error()
fails.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 19 Dec 2019 06:57:46 +0000 (14:57 +0800)]
net: hns3: add a log for getting chain failure in hns3_nic_uninit_vector_data()
Since the mapping can be overwritten, when fail to get
the chain between vector and ring, we should go on to
deal with the remaining options. For debugging, this
patch adds log info for this failure.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 19 Dec 2019 06:57:45 +0000 (14:57 +0800)]
net: hns3: add some VF VLAN information for command "ip link show"
This patch adds some VF VLAN information for command "ip link show".
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 19 Dec 2019 06:57:44 +0000 (14:57 +0800)]
net: hns3: implement ndo_features_check ops for hns3 driver
The function netif_skb_features() will disable the TSO feature
by using dflt_features_check() if the driver does not implement
ndo_features_check ops, which may cause performance degradation
problem when hns3 hardware can do multiple tagged TSO.
Also, the HNS3 hardware only supports checksum on the SKB with
a max header len of 480 bytes, so remove the checksum and TSO
related features when the header len is over 480 bytes.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 19 Dec 2019 06:57:43 +0000 (14:57 +0800)]
net: hns3: get FD rules location before dump in debugfs
Currently, the dump FD tcam mode in debugfs is to query all FD tcams,
including empty rules, which is unnecessary. This patch modify to
find the position of useful rules before dump FD tcam, so that it does
not need to query empty rules.
This patch also modifies some help information in debugfs.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 19 Dec 2019 06:57:42 +0000 (14:57 +0800)]
net: hns3: optimization for CMDQ uninitialization
When uninitializing CMDQ, HCLGE_STATE_CMD_DISABLE will
be set up firstly, then the driver does not send command
anymore. So, hclge_free_cmd_desc can be called without
holding ring->lock. hclge_destroy_cmd_queue() and
hclge_destroy_queue() are unnecessary now, so removes them,
the VF driver has implemented currently.
BTW, the VF driver should set up HCLGEVF_STATE_CMD_DISABLE
as well in the hclgevf_cmd_uninit(), just likes what the PF
driver does.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 19 Dec 2019 06:57:41 +0000 (14:57 +0800)]
net: hns3: remove useless mutex vport_cfg_mutex in the struct hclge_dev
Mutex vport_cfg_mutex has been used to protect uc_mac_list,
mc_mac_list and vlan_list from being modified by unloading
or reset task at the same time. But now unloading will
set up HCLGE_STATE_REMOVING flag and call cancel_work_sync to
break down this race condition, so this mutex is unnecessary.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 19 Dec 2019 06:57:40 +0000 (14:57 +0800)]
net: hns3: check FE bit before calling hns3_add_frag()
A BD with FE bit means that it is the last BD of a packet,
currently the FE bit is checked before calling hns3_add_frag(),
which is unnecessary because the FE bit may have been checked
in some case.
This patch checks the FE bit before calling hns3_add_frag()
after processing the first BD of a SKB and adjust the location
of memcpy() to reduce duplication.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chan Shu Tak, Alex [Thu, 19 Dec 2019 06:16:18 +0000 (14:16 +0800)]
llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
When a frame with NULL DSAP is received, llc_station_rcv is called.
In turn, llc_stat_ev_rx_null_dsap_xid_c is called to check if it is a NULL
XID frame. The return statement of llc_stat_ev_rx_null_dsap_xid_c returns 1
when the incoming frame is not a NULL XID frame and 0 otherwise. Hence, a
NULL XID response is returned unexpectedly, e.g. when the incoming frame is
a NULL TEST command.
To fix the error, simply remove the conditional operator.
A similar error in llc_stat_ev_rx_null_dsap_test_c is also fixed.
Signed-off-by: Chan Shu Tak, Alex <alexchan@task.com.hk>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Rutherford [Thu, 19 Dec 2019 05:03:57 +0000 (16:03 +1100)]
tipc: make legacy address flag readable over netlink
To enable iproute2/tipc to generate backwards compatible
printouts and validate command parameters for nodes using a
<z.c.n> node address, it needs to be able to read the legacy
address flag from the kernel. The legacy address flag records
the way in which the node identity was originally specified.
The legacy address flag is requested by the netlink message
TIPC_NL_ADDR_LEGACY_GET. If the flag is set the attribute
TIPC_NLA_NET_ADDR_LEGACY is set in the return message.
Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiangfeng Xiao [Thu, 19 Dec 2019 02:08:07 +0000 (10:08 +0800)]
net: hisilicon: Fix a BUG trigered by wrong bytes_compl
When doing stress test, we get the following trace:
kernel BUG at lib/dynamic_queue_limits.c:26!
Internal error: Oops - BUG: 0 [#1] SMP ARM
Modules linked in: hip04_eth
CPU: 0 PID: 2003 Comm: tDblStackPcap0 Tainted: G O L 4.4.197 #1
Hardware name: Hisilicon A15
task:
c3637668 task.stack:
de3bc000
PC is at dql_completed+0x18/0x154
LR is at hip04_tx_reclaim+0x110/0x174 [hip04_eth]
pc : [<
c041abfc>] lr : [<
bf0003a8>] psr:
800f0313
sp :
de3bdc2c ip :
00000000 fp :
c020fb10
r10:
00000000 r9 :
c39b4224 r8 :
00000001
r7 :
00000046 r6 :
c39b4000 r5 :
0078f392 r4 :
0078f392
r3 :
00000047 r2 :
00000000 r1 :
00000046 r0 :
df5d5c80
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control:
32c5387d Table:
1e189b80 DAC:
55555555
Process tDblStackPcap0 (pid: 2003, stack limit = 0xde3bc190)
Stack: (0xde3bdc2c to 0xde3be000)
[<
c041abfc>] (dql_completed) from [<
bf0003a8>] (hip04_tx_reclaim+0x110/0x174 [hip04_eth])
[<
bf0003a8>] (hip04_tx_reclaim [hip04_eth]) from [<
bf0012c0>] (hip04_rx_poll+0x20/0x388 [hip04_eth])
[<
bf0012c0>] (hip04_rx_poll [hip04_eth]) from [<
c04c8d9c>] (net_rx_action+0x120/0x374)
[<
c04c8d9c>] (net_rx_action) from [<
c021eaf4>] (__do_softirq+0x218/0x318)
[<
c021eaf4>] (__do_softirq) from [<
c021eea0>] (irq_exit+0x88/0xac)
[<
c021eea0>] (irq_exit) from [<
c0240130>] (msa_irq_exit+0x11c/0x1d4)
[<
c0240130>] (msa_irq_exit) from [<
c0267ba8>] (__handle_domain_irq+0x110/0x148)
[<
c0267ba8>] (__handle_domain_irq) from [<
c0201588>] (gic_handle_irq+0xd4/0x118)
[<
c0201588>] (gic_handle_irq) from [<
c0558360>] (__irq_svc+0x40/0x58)
Exception stack(0xde3bdde0 to 0xde3bde28)
dde0:
00000000 00008001 c3637668 00000000 00000000 a00f0213 dd3627a0 c0af6380
de00:
c086d380 a00f0213 c0a22a50 de3bde6c 00000002 de3bde30 c0558138 c055813c
de20:
600f0213 ffffffff
[<
c0558360>] (__irq_svc) from [<
c055813c>] (_raw_spin_unlock_irqrestore+0x44/0x54)
Kernel panic - not syncing: Fatal exception in interrupt
Pre-modification code:
int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
[...]
[1] priv->tx_head = TX_NEXT(tx_head);
[2] count++;
[3] netdev_sent_queue(ndev, skb->len);
[...]
}
An rx interrupt occurs if hip04_mac_start_xmit just executes to the line 2,
tx_head has been updated, but corresponding 'skb->len' has not been
added to dql_queue.
And then
hip04_mac_interrupt->__napi_schedule->hip04_rx_poll->hip04_tx_reclaim
In hip04_tx_reclaim, because tx_head has been updated,
bytes_compl will plus an additional "skb-> len"
which has not been added to dql_queue. And then
trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed).
To solve the problem described above, we put
"netdev_sent_queue(ndev, skb->len);"
before
"priv->tx_head = TX_NEXT(tx_head);"
Fixes: a41ea46a9a12 ("net: hisilicon: new hip04 ethernet driver")
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Dec 2019 05:09:21 +0000 (21:09 -0800)]
Merge branch 'VSOCK-add-vsock_test-test-suite'
Stefano Garzarella says:
====================
VSOCK: add vsock_test test suite
The vsock_diag.ko module already has a test suite but the core AF_VSOCK
functionality has no tests. This patch series adds several test cases that
exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept,
half-closed connections, simultaneous connections).
The v1 of this series was originally sent by Stefan.
v3:
- Patch 6:
* check the byte received in the recv_byte()
* use send(2)/recv(2) instead of write(2)/read(2) to test also flags
(e.g. MSG_PEEK)
- Patch 8:
* removed unnecessary control_expectln("CLOSED") [Stefan].
- removed patches 9,10,11 added in the v2
- new Patch 9 add parameters to list and skip tests (e.g. useful for vmci
that doesn't support half-closed socket in the host)
- new Patch 10 prints a list of options in the help
- new Patch 11 tests MSG_PEEK flags of recv(2)
v2: https://patchwork.ozlabs.org/cover/
1140538/
v1: https://patchwork.ozlabs.org/cover/847998/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 18 Dec 2019 18:07:08 +0000 (19:07 +0100)]
vsock_test: add SOCK_STREAM MSG_PEEK test
Test if the MSG_PEEK flags of recv(2) works as expected.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 18 Dec 2019 18:07:07 +0000 (19:07 +0100)]
testing/vsock: print list of options and description
Since we now have several options, in the help we print the list
of all supported options and a brief description of them.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 18 Dec 2019 18:07:06 +0000 (19:07 +0100)]
testing/vsock: add parameters to list and skip tests
Some tests can fail with transports that have a slightly
different behavior, so let's add the possibility to specify
which tests to skip.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 18 Dec 2019 18:07:05 +0000 (19:07 +0100)]
vsock_test: wait for the remote to close the connection
Before check if a send returns -EPIPE, we need to make sure the
connection is closed.
To do that, we use epoll API to wait EPOLLRDHUP or EPOLLHUP events
on the socket.
Reported-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:07:04 +0000 (19:07 +0100)]
VSOCK: add AF_VSOCK test cases
The vsock_test.c program runs a test suite of AF_VSOCK test cases.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:07:03 +0000 (19:07 +0100)]
VSOCK: add send_byte()/recv_byte() test utilities
Test cases will want to transfer data. This patch adds utility
functions to do this.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:07:02 +0000 (19:07 +0100)]
VSOCK: add full barrier between test cases
See code comment for details.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:07:01 +0000 (19:07 +0100)]
VSOCK: extract connect/accept functions from vsock_diag_test.c
Many test cases will need to connect to the server or accept incoming
connections. This patch extracts these operations into utility
functions that can be reused.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:07:00 +0000 (19:07 +0100)]
VSOCK: extract utility functions from vsock_diag_test.c
Move useful functions into a separate file in preparation for more
vsock test programs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:06:59 +0000 (19:06 +0100)]
VSOCK: add SPDX identifiers to vsock tests
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Wed, 18 Dec 2019 18:06:58 +0000 (19:06 +0100)]
VSOCK: fix header include in vsock_diag_test
The vsock_diag_test program directly included ../../../include/uapi/
headers from the source tree. Tests are supposed to use the
usr/include/linux/ headers that have been prepared with make
headers_install instead.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Grzeschik [Wed, 18 Dec 2019 16:01:39 +0000 (17:01 +0100)]
net: dsa: ksz: use common define for tag len
Remove special taglen define KSZ8795_INGRESS_TAG_LEN
and use generic KSZ_INGRESS_TAG_LEN instead.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>