YueHaibing [Tue, 18 Sep 2018 06:19:05 +0000 (14:19 +0800)]
net: cavium: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 18 Sep 2018 06:09:43 +0000 (14:09 +0800)]
net: hns3: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, also the implementation in this
driver has returns 'netdev_tx_t' value, so just change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 18 Sep 2018 03:09:14 +0000 (11:09 +0800)]
net: ethernet: slicoss: remove duplicated include from slic.h
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Mon, 17 Sep 2018 10:46:55 +0000 (18:46 +0800)]
veth: rename pcpu_vstats as pcpu_lstats
struct pcpu_vstats and pcpu_lstats have same members and
usage, and pcpu_lstats is used in many files, so rename
pcpu_vstats as pcpu_lstats to reduce duplicate definition
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 17 Sep 2018 09:57:29 +0000 (11:57 +0200)]
netlink: add ethernet address policy types
Commonly, ethernet addresses are just using a policy of
{ .len = ETH_ALEN }
which leaves userspace free to send more data than it should,
which may hide bugs.
Introduce NLA_EXACT_LEN which checks for exact size, rejecting
the attribute if it's not exactly that length. Also add
NLA_EXACT_LEN_WARN which requires the minimum length and will
warn on longer attributes, for backward compatibility.
Use these to define NLA_POLICY_ETH_ADDR (new strict policy) and
NLA_POLICY_ETH_ADDR_COMPAT (compatible policy with warning);
these are used like this:
static const struct nla_policy <name>[...] = {
[NL_ATTR_NAME] = NLA_POLICY_ETH_ADDR,
...
};
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 17 Sep 2018 09:57:28 +0000 (11:57 +0200)]
netlink: add NLA_REJECT policy type
In some situations some netlink attributes may be used for output
only (kernel->userspace) or may be reserved for future use. It's
then helpful to be able to prevent userspace from using them in
messages sent to the kernel, since they'd otherwise be ignored and
any future will become impossible if this happens.
Add NLA_REJECT to the policy which does nothing but reject (with
EINVAL) validation of any messages containing this attribute.
Allow for returning a specific extended ACK error message in the
validation_data pointer.
While at it clear up the documentation a bit - the NLA_BITFIELD32
documentation was added to the list of len field descriptions.
Also, use NL_SET_BAD_ATTR() in one place where it's open-coded.
The specific case I have in mind now is a shared nested attribute
containing request/response data, and it would be pointless and
potentially confusing to have userspace include response data in
the messages that actually contain a request.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Sep 2018 02:27:40 +0000 (19:27 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-09-18
This series contains changes to i40evf so that it becomes a more
generic virtual function driver for current and future silicon.
While doing the rename of i40evf to a more generic name of iavf,
we also put the driver on a severe diet due to how much of the
code was unneeded or was unused. The outcome is a lean and mean
virtual function driver that continues to work on existing 40GbE
(i40e) virtual devices and prepped for future supported devices,
like the 100GbE (ice) virtual devices.
This solves 2 issues we saw coming or were already present, the
first was constant code duplication happening with i40e/i40evf,
when much of the duplicate code in the i40evf was not used or was
not needed. The second was to remove the future confusion of why
future VF devices that were not considered "40GbE" only devices
were supported by i40evf.
The thought is that iavf will be the virtual function driver for
all future devices, so it should have a "generic" name to properly
represent that it is the VF driver for multiple generations of
devices.
The last patch in this series is unreleated to the iavf conversion
and just has to do with a MODULE_LICENSE correction.
Known Caveats:
Existing user space configurations may have to change, but the module
alias in patch 1 helps a bit here.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:57 +0000 (17:37 -0700)]
intel-ethernet: use correct module license
We recently updated all our SPDX identifiers to correctly
indicate our net/ethernet/intel/* drivers were always released
and intended to be released under GPL v2, but the MODULE_LICENSE
declaration was never updated.
Fix the MODULE_LICENSE to be GPL v2, for all our drivers.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:56 +0000 (17:37 -0700)]
iavf: finish renaming files to iavf
This finishes the process of renaming the files that
make sense to rename (skipping adminq related files that
talk to i40e), and fixes up the build and the #includes
so that everything builds nicely.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:55 +0000 (17:37 -0700)]
iavf: rename most of i40e strings
This is the big rename patch, it takes most of the i40e_
and I40E_ strings and renames them to iavf_ and IAVF_.
Some of the adminq code, as well as most of the client
interface code used by RDMA is left unchanged in order
to indicate that the driver is talking to non-internal to
iavf code.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:54 +0000 (17:37 -0700)]
iavf: tracing infrastructure rename
Rename the i40e_trace file and fix up all the callers
to the new names inside the iavf_trace.h file.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:53 +0000 (17:37 -0700)]
iavf: replace i40e_debug with iavf version
Change another string (i40e_debug)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:52 +0000 (17:37 -0700)]
iavf: rename i40e_hw to iavf_hw
Fix up the i40e_hw names to new name, including versions
inside other strings.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:51 +0000 (17:37 -0700)]
iavf: rename I40E_ADMINQ_DESC
Take care of some renames containing I40E_ADMINQ_DESC.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:50 +0000 (17:37 -0700)]
iavf: rename device ID defines
Rename the device ID defines to have IAVF in them
and remove all the unused defines.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:49 +0000 (17:37 -0700)]
iavf: remove references to old names
Remove the register name references to I40E_VF* and change to
IAVF_VF. Update the descriptor names and defines to the IAVF
name.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:48 +0000 (17:37 -0700)]
iavf: move i40evf files to new name
Simply move the i40evf files to the new name, updating the #includes
to track the new names, and updating the Makefile as well.
A future patch will remove the i40e references (after the code
removal patches later in this series).
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:47 +0000 (17:37 -0700)]
iavf: rename i40e_status to iavf_status
This is just a rename of an internal variable i40e_status, but
it was a pretty big change and so deserved it's own patch.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:46 +0000 (17:37 -0700)]
iavf: rename functions and structs to new name
This basically begins the internal portion of the rename of i40evf to iavf,
by renaming many of the functions, structs, variables and defines.
Most of the changes were made mechanically, which introduces some
alignment issues.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:45 +0000 (17:37 -0700)]
iavf: diet and reformat
Remove a bunch of unused code and reformat a few lines. Also
remove some now un-necessary files.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 18 Sep 2018 16:33:27 +0000 (09:33 -0700)]
Merge ra./pub/scm/linux/kernel/git/davem/net
Two new tls tests added in parallel in both net and net-next.
Used Stephen Rothwell's linux-next resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Sat, 15 Sep 2018 00:37:44 +0000 (17:37 -0700)]
intel-ethernet: rename i40evf to iavf
Rename the Intel Ethernet Adaptive Virtual Function driver
(i40evf) to a new name (iavf) that is more consistent with
the ongoing maintenance of the driver as the universal VF driver
for multiple product lines.
This first patch fixes up the directory names and the .ko name,
intentionally ignoring the function names inside the driver
for now. Basically this is the simplest patch that gets
the rename done and will be followed by other patches that
rename the internal functions.
This patch also addresses a couple of string/name issues
and updates the Copyright year.
Also, made sure to add a MODULE_ALIAS to the old name.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Kroah-Hartman [Tue, 18 Sep 2018 07:31:53 +0000 (09:31 +0200)]
Merge gitolite./pub/scm/linux/kernel/git/davem/net
Dave writes:
"Various fixes, all over the place:
1) OOB data generation fix in bluetooth, from Matias Karhumaa.
2) BPF BTF boundary calculation fix, from Martin KaFai Lau.
3) Don't bug on excessive frags, to be compatible in situations mixing
older and newer kernels on each end. From Juergen Gross.
4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger.
5) Zero keying information in TLS layer before freeing copies
of them, from Sabrina Dubroca.
6) Fix NULL deref in act_sample, from Davide Caratti.
7) Orphan SKB before GRO in veth to prevent crashes with XDP,
from Toshiaki Makita.
8) Fix use after free in ip6_xmit, from Eric Dumazet.
9) Fix VF mac address regression in bnxt_en, from Micahel Chan.
10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann.
11) Programming adjustments to r8169 which fix not being to enter deep
sleep states on some machines, from Kai-Heng Feng and Hans de
Goede.
12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter
Oskolkov."
* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits)
net/ipv6: do not copy dst flags on rt init
qmi_wwan: set DTR for modems in forced USB2 mode
clk: x86: Stop marking clocks as CLK_IS_CRITICAL
r8169: Get and enable optional ether_clk clock
clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
r8169: enable ASPM on RTL8106E
r8169: Align ASPM/CLKREQ setting function with vendor driver
Revert "kcm: remove any offset before parsing messages"
kcm: remove any offset before parsing messages
net: ethernet: Fix a unused function warning.
net: dsa: mv88e6xxx: Fix ATU Miss Violation
tls: fix currently broken MSG_PEEK behavior
hv_netvsc: pair VF based on serial number
PCI: hv: support reporting serial number as slot information
bnxt_en: Fix VF mac address regression.
ipv6: fix possible use-after-free in ip6_xmit()
net: hp100: fix always-true check for link up state
ARM: dts: at91: add new compatibility string for macb on sama5d3
net: macb: disable scatter-gather for macb on sama5d3
net: mvpp2: let phylink manage the carrier state
...
YueHaibing [Tue, 18 Sep 2018 02:48:41 +0000 (10:48 +0800)]
net: mdio: remove duplicated include from mdio_bus.c
Remove duplicated include linux/gpio/consumer.h
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 18 Sep 2018 02:46:27 +0000 (10:46 +0800)]
qed: remove duplicated include from qed_cxt.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 18 Sep 2018 02:43:43 +0000 (10:43 +0800)]
liquidio: remove duplicated include from lio_vf_rep.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 18 Sep 2018 02:41:28 +0000 (10:41 +0800)]
cxgb4: remove duplicated include from cxgb4_main.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Oskolkov [Mon, 17 Sep 2018 17:20:53 +0000 (10:20 -0700)]
net/ipv6: do not copy dst flags on rt init
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).
If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
in dist_init() and decremented in dst_destroy().
This flag is tied to allocation/deallocation of dst_entry and
should not be copied from another dst/route. Otherwise it can happen
that dst_ops::pcpuc_entries counter grows until no new routes can
be allocated because the counter reached ip6_rt_max_size due to
DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.
Fixes: 3b6761d18bc1 ("net/ipv6: Move dst flags to booleans in fib entries")
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 18 Sep 2018 02:17:18 +0000 (10:17 +0800)]
gianfar: remove duplicated include from gianfar.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Nuernberger [Mon, 17 Sep 2018 17:46:53 +0000 (19:46 +0200)]
net/ipv4: defensive cipso option parsing
commit
40413955ee26 ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction of
new one-byte options can still confuse this code without the additional
checks.
Signed-off-by: Stefan Nuernberger <snu@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Simon Veith <sveith@amazon.de>
Cc: stable@vger.kernel.org
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Mon, 17 Sep 2018 20:00:24 +0000 (22:00 +0200)]
qmi_wwan: set DTR for modems in forced USB2 mode
Recent firmware revisions have added the ability to force
these modems to USB2 mode, hiding their SuperSpeed
capabilities from the host. The driver has been using the
SuperSpeed capability, as shown by the bcdUSB field of the
device descriptor, to detect the need to enable the DTR
quirk. This method fails when the modems are forced to
USB2 mode by the modem firmware.
Fix by unconditionally enabling the DTR quirk for the
affected device IDs.
Reported-by: Fred Veldini <fred.veldini@gmail.com>
Reported-by: Deshu Wen <dwen@sierrawireless.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reported-by: Fred Veldini <fred.veldini@gmail.com>
Reported-by: Deshu Wen <dwen@sierrawireless.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Sat, 15 Sep 2018 04:02:46 +0000 (12:02 +0800)]
net: socionext: Fix two sleep-in-atomic-context bugs in ave_rxfifo_reset()
The driver may sleep with holding a spinlock.
The function call paths (from bottom to top) in Linux-4.17 are:
[FUNC] usleep_range
drivers/net/ethernet/socionext/sni_ave.c, 892:
usleep_range in ave_rxfifo_reset
drivers/net/ethernet/socionext/sni_ave.c, 932:
ave_rxfifo_reset in ave_irq_handler
[FUNC] usleep_range
drivers/net/ethernet/socionext/sni_ave.c, 888:
usleep_range in ave_rxfifo_reset
drivers/net/ethernet/socionext/sni_ave.c, 932:
ave_rxfifo_reset in ave_irq_handler
To fix these bugs, usleep_range() is replaced with udelay().
These bugs are found by my static analysis tool DSAC.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 14 Sep 2018 17:19:16 +0000 (18:19 +0100)]
net: caif: remove redundant null check on frontpkt
It is impossible for frontpkt to be null at the point of the null
check because it has been assigned from rearpkt and there is no
way rearpkt can be null at the point of the assignment because
of the sanity checking and exit paths taken previously. Remove
the redundant null check.
Detected by CoverityScan, CID#114434 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Sep 2018 01:47:58 +0000 (18:47 -0700)]
Merge branch 'r8169-clk-fixes'
Hans de Goede says:
====================
r8169 (x86) clk fixes to fix S0ix not being reached
This series adds code to the r8169 ethernet driver to get and enable an
external clock if present, avoiding the need for a hack in the
clk-pmc-atom driver where that clock was left on continuesly causing x86
some devices to not reach deep power saving states (S0ix) when suspended
causing to them to quickly drain their battery while suspended.
The 3 commits in this series need to be merged in order to avoid
regressions while bisecting. The clk-pmc-atom driver does not see much
changes (it was last touched over a year ago). So the clk maintainers
have agreed with merging all 3 patches through the net tree.
All 3 patches have Stephen Boyd's Acked-by for this purpose.
This v2 of the series only had some minor tweaks done to the commit
messages and is ready for merging through the net tree now.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Wed, 12 Sep 2018 09:34:56 +0000 (11:34 +0200)]
clk: x86: Stop marking clocks as CLK_IS_CRITICAL
Commit
d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the
firmware"), which added the code to mark clocks as CLK_IS_CRITICAL, causes
all unclaimed PMC clocks on Cherry Trail devices to be on all the time,
resulting on the device not being able to reach S0i3 when suspended.
The reason for this commit is that on some Bay Trail / Cherry Trail devices
the r8169 ethernet controller uses pmc_plt_clk_4. Now that the clk-pmc-atom
driver exports an "ether_clk" alias for pmc_plt_clk_4 and the r8169 driver
has been modified to get and enable this clock (if present) the marking of
the clocks as CLK_IS_CRITICAL is no longer necessary.
This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail
devices not being able to reach S0i3 greatly decreasing their battery
drain when suspended.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
Cc: Johannes Stezenbach <js@sig21.net>
Cc: Carlo Caione <carlo@endlessm.com>
Reported-by: Johannes Stezenbach <js@sig21.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Wed, 12 Sep 2018 09:34:55 +0000 (11:34 +0200)]
r8169: Get and enable optional ether_clk clock
On some boards a platform clock is used as clock for the r8169 chip,
this commit adds support for getting and enabling this clock (assuming
it has an "ether_clk" alias set on it).
This is related to commit
d31fd43c0f9a ("clk: x86: Do not gate clocks
enabled by the firmware") which is a previous attempt to fix this for some
x86 boards, but this causes all Cherry Trail SoC using boards to not reach
there lowest power states when suspending.
This commit (together with an atom-pmc-clk driver commit adding the alias)
fixes things properly by making the r8169 get the clock and enable it when
it needs it.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
Cc: Johannes Stezenbach <js@sig21.net>
Cc: Carlo Caione <carlo@endlessm.com>
Reported-by: Johannes Stezenbach <js@sig21.net>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Wed, 12 Sep 2018 09:34:54 +0000 (11:34 +0200)]
clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
Commit
d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the
firmware") causes all unclaimed PMC clocks on Cherry Trail devices to be on
all the time, resulting on the device not being able to reach S0i2 or S0i3
when suspended.
The reason for this commit is that on some Bay Trail / Cherry Trail devices
the ethernet controller uses pmc_plt_clk_4. This commit adds an "ether_clk"
alias, so that the relevant ethernet drivers can try to (optionally) use
this, without needing X86 specific code / hacks, thus fixing ethernet on
these devices without breaking S0i3 support.
This commit uses clkdev_hw_create() to create the alias, mirroring the code
for the already existing "mclk" alias for pmc_plt_clk_3.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
Cc: Johannes Stezenbach <js@sig21.net>
Cc: Carlo Caione <carlo@endlessm.com>
Reported-by: Johannes Stezenbach <js@sig21.net>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kai-Heng Feng [Wed, 12 Sep 2018 06:58:21 +0000 (14:58 +0800)]
r8169: enable ASPM on RTL8106E
The Intel SoC was prevented from entering lower idle state because
of RTL8106E's ASPM was not enabled.
So enable ASPM on RTL8106E (chip version 39).
Now the Intel SoC can enter lower idle state, power consumption and
temperature are much lower.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kai-Heng Feng [Wed, 12 Sep 2018 06:58:20 +0000 (14:58 +0800)]
r8169: Align ASPM/CLKREQ setting function with vendor driver
There's a small delay after setting ASPM in vendor drivers, r8101 and
r8168.
In addition, those drivers enable ASPM before ClkReq, also change that
to align with vendor driver.
I haven't seen anything bad becasue of this, but I think it's better to
keep in sync with vendor driver.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Sep 2018 01:43:42 +0000 (18:43 -0700)]
Revert "kcm: remove any offset before parsing messages"
This reverts commit
072222b488bc55cce92ff246bdc10115fd57d3ab.
I just read that this causes regressions.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dominique Martinet [Tue, 11 Sep 2018 09:21:43 +0000 (11:21 +0200)]
kcm: remove any offset before parsing messages
The current code assumes kcm users know they need to look for the
strparser offset within their bpf program, which is not documented
anywhere and examples laying around do not do.
The actual recv function does handle the offset well, so we can create a
temporary clone of the skb and pull that one up as required for parsing.
The pull itself has a cost if we are pulling beyond the head data,
measured to 2-3% latency in a noisy VM with a local client stressing
that path. The clone's impact seemed too small to measure.
This bug can be exhibited easily by implementing a "trivial" kcm parser
taking the first bytes as size, and on the client sending at least two
such packets in a single write().
Note that bpf sockmap has the same problem, both for parse and for recv,
so it would pulling twice or a real pull within the strparser logic if
anyone cares about that.
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Mon, 17 Sep 2018 20:34:25 +0000 (22:34 +0200)]
Merge tag 'spi-fix-v4.19-rc4' of https://git./linux/kernel/git/broonie/spi
Mark writes:
"spi: Fixes for v4.19
As well as one driver fix there's a couple of fixes here which address
issues with the use of IDRs for allocation of dynamic bus numbers,
ensuring that dynamic bus numbers interact well with static bus numbers
assigned via DT and otherwise."
* tag 'spi-fix-v4.19-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-fsl-dspi: fix broken DSPI_EOQ_MODE
spi: Fix double IDR allocation with DT aliases
spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
David S. Miller [Mon, 17 Sep 2018 16:10:26 +0000 (09:10 -0700)]
Merge branch 's390-qeth-next'
Julian Wiedmann says:
====================
s390/qeth: updates 2018-09-17
please apply the following patchset to net-next. This brings more restructuring
of qeth's transmit code (eliminating its last usage of skb_realloc_headroom()),
and the usual mix of minor improvements & cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:09 +0000 (17:36 +0200)]
s390/qeth: reduce 0-initializing when building IPA cmds
qeth_get_ipacmd_buffer() obtains its buffers for building IPA cmds from
__qeth_get_buffer(), where they are fully cleared. So get rid of all the
additional zero-ing in various other places.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:08 +0000 (17:36 +0200)]
s390/qeth: fine-tune spinlocks
For quite a lot of code paths it's obvious that they will never run in
IRQ context. So replace their spin_lock_irqsave() calls with
spin_lock_irq().
While at it, get rid of the redundant card pointer in struct qeth_reply
that was used by qeth_send_control_data() to access the card's lock.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:07 +0000 (17:36 +0200)]
s390/qeth: fix typo in return value
Assuming this was just a typo, as returning an actual negative value
from a cmd callback would make no sense either.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:06 +0000 (17:36 +0200)]
s390/qeth: invoke softirqs after napi_schedule()
Calling napi_schedule() from process context does not ensure that the
NET_RX softirq is run in a timely fashion. So trigger it manually.
This is no big issue with current code. A call to ndo_open() is usually
followed by a ndo_set_rx_mode() call, and for qeth this contains a
spin_unlock_bh(). Except for OSN, where qeth_l2_set_rx_mode() bails out
early.
Nevertheless it's best to not depend on this behaviour, and just fix
the issue at its source like all other drivers do. For instance see
commit
83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule").
Fixes: a1c3ed4c9ca0 ("qeth: NAPI support for l2 and l3 discipline")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:05 +0000 (17:36 +0200)]
s390/qeth: uninstall IRQ handler on device removal
When setting up, qeth installs its IRQ handler on the ccw devices. But
the IRQ handler is not cleared on removal - so even after qeth yields
control of the ccw devices, spurious interrupts would still be presented
to us.
Make (de-)installation of the IRQ handler part of the ccw channel
setup/removal helpers, and while at it also add the appropriate locking.
Shift around qeth_setup_channel() to avoid a forward declaration for
qeth_irq().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:04 +0000 (17:36 +0200)]
s390/qeth: remove qeth_hdr_chk_and_bounce()
Restructure the OSN xmit path to handle misaligned HW headers properly,
without shifting the packet data around.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:03 +0000 (17:36 +0200)]
s390/qeth: speed up TSO transmission
Switch TSO over to the faster transmit path, and remove all the unused
old TSO code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:02 +0000 (17:36 +0200)]
s390/qeth: prepare for copy-free TSO transmission
Add all the necessary TSO plumbing to the copy-less transmit path.
This includes calculating the right length of required protocol headers,
and always building a separate buffer element for the TSO headers.
A follow-up patch will then switch TSO traffic over to this path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:01 +0000 (17:36 +0200)]
s390/qeth: check size of required HW header cache object
When qeth_add_hw_header() falls back to the header cache, ensure that
the requested length doesn't exceed the object size.
For current usage this is a no-brainer, but TSO transmission will
introduce protocol headers of varying length.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:36:00 +0000 (17:36 +0200)]
s390/qeth: fix up protocol headers early
When qeth_add_hw_header() falls back to the HW header cache, it also
copies over the necessary protocol headers. Thus any manipulation to
the protocol headers needs to happen before adding the HW header.
For current usage this doesn't matter, but it becomes relevant when
moving TSO transmission over to the faster code path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:35:59 +0000 (17:35 +0200)]
s390/qeth: limit csum offload erratum to L3 devices
Combined L3+L4 csum offload is only required for some L3 HW. So for
L2 devices, don't offload the IP header csum calculation.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reference-ID: JUP 394553
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:35:58 +0000 (17:35 +0200)]
s390/qeth: remove qeth_get_elements_no()
Convert the last remaining user of qeth_get_elements_no() to
qeth_count_elements(), so this helper can be removed.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:35:57 +0000 (17:35 +0200)]
s390/qeth: remove unused L3 xmit code
qeth_l3_xmit() is now only used for TSOv4 traffic, shrink it down.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:35:56 +0000 (17:35 +0200)]
s390/qeth: run non-offload L3 traffic over common xmit path
L3 OSAs can only offload IPv4 traffic, use the common L2 transmit path
for all other traffic.
In particular there's no support for TX VLAN offload, so any such packet
needs to be manually de-accelerated via ndo_features_check().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Sep 2018 15:35:55 +0000 (17:35 +0200)]
s390/qeth: move L2 xmit code to core module
We need the exact same transmit path for non-offload-eligible traffic on
L3 OSAs. So make it accessible from both sub-drivers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhong jiang [Mon, 17 Sep 2018 10:44:19 +0000 (18:44 +0800)]
net: ethernet: Fix a unused function warning.
Fix the following compile warning:
drivers/net/ethernet/microchip/lan743x_main.c:2964:12: warning: \91lan743x_pm_suspend\92 defined but not used [-Wunused-function]
static int lan743x_pm_suspend(struct device *dev)
drivers/net/ethernet/microchip/lan743x_main.c:2987:12: warning: \91lan743x_pm_resume\92 defined but not used [-Wunused-function]
static int lan743x_pm_resume(struct device *dev)
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weilin Chang [Mon, 17 Sep 2018 05:43:32 +0000 (22:43 -0700)]
liquidio: Add the features to show FEC settings and set FEC settings
1. Add functions for get_fecparam and set_fecparam.
2. Modify lio_get_link_ksettings to display FEC setting.
Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhong jiang [Sun, 16 Sep 2018 13:13:42 +0000 (21:13 +0800)]
net: ethernet: remove redundant null pointer check before of_node_put
of_node_put has taken the null pointer check into account. So it is
safe to remove the duplicated check before of_node_put.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhong jiang [Sun, 16 Sep 2018 13:45:02 +0000 (21:45 +0800)]
net: dsa: remove redundant null pointer check before put_device
put_device has taken the null pinter check into account. So it is
safe to remove the duplicated check before put_device.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhong jiang [Sun, 16 Sep 2018 13:22:31 +0000 (21:22 +0800)]
net: dsa: remove redundant null pointer check before of_node_put
of_node_put has taken the null pointer check into account. So it is
safe to remove the duplicated check before of_node_put.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhong jiang [Sun, 16 Sep 2018 13:20:17 +0000 (21:20 +0800)]
net: usb: remove redundant null pointer check before of_node_put
of_node_put has taken the null pointer check into account. So it is
safe to remove the duplicated check before of_node_put.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Mon, 17 Sep 2018 02:49:30 +0000 (22:49 -0400)]
net: rds: use memset to optimize the recv
The function rds_inc_init is in recv process. To use memset can optimize
the function rds_inc_init.
The test result:
Before:
1) + 24.950 us | rds_inc_init [rds]();
After:
1) + 10.990 us | rds_inc_init [rds]();
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vakul Garg [Sun, 16 Sep 2018 04:34:28 +0000 (10:04 +0530)]
selftests/tls: Add MSG_WAITALL in recv() syscall
A number of tls selftests rely upon recv() to return an exact number of
data bytes. When tls record crypto is done using an async accelerator,
it is possible that recv() returns lesser than expected number bytes.
This leads to failure of many test cases. To fix it, MSG_WAITALL has
been used in flags passed to recv() syscall.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Friedemann Gerold [Sat, 15 Sep 2018 15:03:39 +0000 (18:03 +0300)]
net: aquantia: memory corruption on jumbo frames
This patch fixes skb_shared area, which will be corrupted
upon reception of 4K jumbo packets.
Originally build_skb usage purpose was to reuse page for skb to eliminate
needs of extra fragments. But that logic does not take into account that
skb_shared_info should be reserved at the end of skb data area.
In case packet data consumes all the page (4K), skb_shinfo location
overflows the page. As a consequence, __build_skb zeroed shinfo data above
the allocated page, corrupting next page.
The issue is rarely seen in real life because jumbo are normally larger
than 4K and that causes another code path to trigger.
But it 100% reproducible with simple scapy packet, like:
sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
/ Raw(RandString(size=(4096-40))), iface="enp1s0")
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Reported-by: Friedemann Gerold <f.gerold@b-c-s.de>
Reported-by: Michael Rauch <michael@rauch.be>
Signed-off-by: Friedemann Gerold <f.gerold@b-c-s.de>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Sep 2018 15:12:12 +0000 (08:12 -0700)]
Merge branch 'lantiq-Minor-fixes-for-vrx200-and-gswip'
Hauke Mehrtens says:
====================
net: lantiq: Minor fixes for vrx200 and gswip
These are mostly minor fixes to problems addresses in the latests round
of the review of the original series adding these driver, which were not
applied before the patches got merged into net-next.
In addition it fixes a data bus error on poweroff.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sat, 15 Sep 2018 12:08:49 +0000 (14:08 +0200)]
net: dsa: tag_gswip: Add gswip to dsa_tag_protocol_to_str()
The gswip tag was missing in the dsa_tag_protocol_to_str() function, add it.
Fixes: 7969119293f5 ("net: dsa: Add Lantiq / Intel GSWIP tag support")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sat, 15 Sep 2018 12:08:48 +0000 (14:08 +0200)]
net: dsa: lantiq_gswip: Minor code style improvements
Use one code block when returning because the interface type is
unsupported and also check if some unsupported port gets configured.
In addition fix a double the and use dsa_is_cpu_port() instated of
manually getting the CPU port.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sat, 15 Sep 2018 12:08:47 +0000 (14:08 +0200)]
net: lantiq: lantiq_xrx200: Move clock prepare to probe function
The switch and the MAC are in one IP core and they use the same clock
signal from the clock generation unit.
Currently the clock architecture in the lantiq SoC code does not allow
to easily share the same clocks, this has to be fixed by switching to
the common clock framework.
As a workaround the clock of the switch and MAC should be activated when
the MAC gets probed and only disabled when the MAC gets removed. This
way it is ensured that the clock is always enabled when the switch or
MAC is used. The switch can not be used without the MAC.
This fixes a data bus error when rebooting the system and deactivating
the switch and mac and later accessing some registers in the cleanup
while the clocks are disabled.
Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sat, 15 Sep 2018 12:08:46 +0000 (14:08 +0200)]
dt-bindings: net: dsa: lantiq, xrx200-gswip: Fix minor style fixes
* Use one compatible line per line in the documentation
* Remove SoC revision depended compatible lines, we can detect that in
the driver
* Use lower case letters in hex addresses
* Fix the size of the address ranges in the example, this now matches
the sizes used by the SoC. The old ones will also work, this just adds
some empty address space.
* Change the reg size of the gphy-fw node
Fixes: 86ce2bc73c7a ("dt-bindings: net: dsa: Add lantiq, xrx200-gswip DT bindings")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: devicetree@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sat, 15 Sep 2018 12:08:45 +0000 (14:08 +0200)]
dt-bindings: net: lantiq, xrx200-net: Use lower case in hex
Use lower case letters in the addresses of the device tree binding.
In addition replace eth with ethernet and fix the size of the reg
element in the example. The additional range does not contain any
registers but is used for the IP block on the this SoC.
Fixes: 839790e88a3c ("dt-bindings: net: Add lantiq, xrx200-net DT bindings")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: devicetree@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 15 Sep 2018 01:42:09 +0000 (01:42 +0000)]
net: hns: make function hns_gmac_wait_fifo_clean() static
Fixes the following sparse warning:
drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c:322:5: warning:
symbol 'hns_gmac_wait_fifo_clean' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 15 Sep 2018 01:33:50 +0000 (01:33 +0000)]
net: lantiq: Fix return value check in xrx200_probe()
In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 15 Sep 2018 01:33:38 +0000 (01:33 +0000)]
net: dsa: gswip: Fix copy-paste error in gswip_gphy_fw_probe()
The return value from of_reset_control_array_get_exclusive() is not
checked correctly. The test is done against a wrong variable. This
patch fix it.
Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 15 Sep 2018 01:33:21 +0000 (01:33 +0000)]
net: dsa: gswip: Fix return value check in gswip_probe()
In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 14 Sep 2018 21:46:12 +0000 (23:46 +0200)]
net: dsa: mv88e6xxx: Fix ATU Miss Violation
Fix a cut/paste error and a typo which results in ATU miss violations
not being reported.
Fixes: 0977644c5005 ("net: dsa: mv88e6xxx: Decode ATU problem interrupt")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 14 Sep 2018 21:00:55 +0000 (23:00 +0200)]
tls: fix currently broken MSG_PEEK behavior
In kTLS MSG_PEEK behavior is currently failing, strace example:
[pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
[pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
[pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid 2430] listen(4, 10) = 0
[pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
[pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [
7564404], 4) = 0
[pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
[pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
[pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [
7564404], 4) = 0
[pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
[pid 2430] close(4) = 0
[pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
[pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
[pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64
As can be seen from strace, there are two TLS records sent,
i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
peeking 'test_read_peektest_read_peektest'. This is clearly
wrong, and what happens is that given peek cannot call into
tls_sw_advance_skb() to unpause strparser and proceed with
the next skb, we end up looping over the current one, copying
the 'test_read_peek' over and over into the user provided
buffer.
Here, we can only peek into the currently held skb (current,
full TLS record) as otherwise we would end up having to hold
all the original skb(s) (depending on the peek depth) in a
separate queue when unpausing strparser to process next
records, minimally intrusive is to return only up to the
current record's size (which likely was what
c46234ebb4d1
("tls: RX path for ktls") originally intended as well). Thus,
after patch we properly peek the first record:
[pid 2046] wait4(2075, <unfinished ...>
[pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
[pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
[pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid 2075] listen(4, 10) = 0
[pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
[pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [
7564404], 4) = 0
[pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
[pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
[pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [
7564404], 4) = 0
[pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
[pid 2075] close(4) = 0
[pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
[pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
[pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14
Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Fri, 14 Sep 2018 20:01:46 +0000 (13:01 -0700)]
tls: async support causes out-of-bounds access in crypto APIs
When async support was added it needed to access the sk from the async
callback to report errors up the stack. The patch tried to use space
after the aead request struct by directly setting the reqsize field in
aead_request. This is an internal field that should not be used
outside the crypto APIs. It is used by the crypto code to define extra
space for private structures used in the crypto context. Users of the
API then use crypto_aead_reqsize() and add the returned amount of
bytes to the end of the request memory allocation before posting the
request to encrypt/decrypt APIs.
So this breaks (with general protection fault and KASAN error, if
enabled) because the request sent to decrypt is shorter than required
causing the crypto API out-of-bounds errors. Also it seems unlikely the
sk is even valid by the time it gets to the callback because of memset
in crypto layer.
Anyways, fix this by holding the sk in the skb->sk field when the
callback is set up and because the skb is already passed through to
the callback handler via void* we can access it in the handler. Then
in the handler we need to be careful to NULL the pointer again before
kfree_skb. I added comments on both the setup (in tls_do_decryption)
and when we clear it from the crypto callback handler
tls_decrypt_done(). After this selftests pass again and fixes KASAN
errors/warnings.
Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Sep 2018 14:59:41 +0000 (07:59 -0700)]
Merge branch 'hv_netvsc-associate-VF-and-PV-device-by-serial-number'
Stephen Hemminger says:
====================
hv_netvsc: associate VF and PV device by serial number
The Hyper-V implementation of PCI controller has concept of 32 bit serial number
(not to be confused with PCI-E serial number). This value is sent in the protocol
from the host to indicate SR-IOV VF device is attached to a synthetic NIC.
Using the serial number (instead of MAC address) to associate the two devices
avoids lots of potential problems when there are duplicate MAC addresses from
tunnels or layered devices.
The patch set is broken into two parts, one is for the PCI controller
and the other is for the netvsc device. Normally, these go through different
trees but sending them together here for better review. The PCI changes
were submitted previously, but the main review comment was "why do you
need this?". This is why.
v2 - slot name can be shorter.
remove locking when creating pci_slots; see comment for explaination
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Fri, 14 Sep 2018 19:54:57 +0000 (12:54 -0700)]
hv_netvsc: pair VF based on serial number
Matching network device based on MAC address is problematic
since a non VF network device can be creted with a duplicate MAC
address causing confusion and problems. The VMBus API does provide
a serial number that is a better matching method.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Fri, 14 Sep 2018 19:54:56 +0000 (12:54 -0700)]
PCI: hv: support reporting serial number as slot information
The Hyper-V host API for PCI provides a unique "serial number" which
can be used as basis for sysfs PCI slot table. This can be useful
for cases where userspace wants to find the PCI device based on
serial number.
When an SR-IOV NIC is added, the host sends an attach message
with serial number. The kernel doesn't use the serial number, but
it is useful when doing the same thing in a userspace driver such
as the DPDK. By having /sys/bus/pci/slots/N it provides a direct
way to find the matching PCI device.
There maybe some cases where serial number is not unique such
as when using GPU's. But the PCI slot infrastructure will handle
that.
This has a side effect which may also be useful. The common udev
network device naming policy uses the slot information (rather
than PCI address).
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Sep 2018 19:41:29 +0000 (15:41 -0400)]
bnxt_en: Fix VF mac address regression.
The recent commit to always forward the VF MAC address to the PF for
approval may not work if the PF driver or the firmware is older. This
will cause the VF driver to fail during probe:
bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
bnxt_en 0000:00:03.0: Unable to initialize mac address.
bnxt_en: probe of 0000:00:03.0 failed with error -99
We fix it by treating the error as fatal only if the VF MAC address is
locally generated by the VF.
Fixes: 707e7e966026 ("bnxt_en: Always forward VF MAC address to the PF.")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Reported-by: Siwei Liu <loseweigh@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 14 Sep 2018 19:02:31 +0000 (12:02 -0700)]
ipv6: fix possible use-after-free in ip6_xmit()
In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.
Bring IPv6 in line with what we do in IPv4 to fix this.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 14 Sep 2018 16:39:53 +0000 (17:39 +0100)]
net: hp100: fix always-true check for link up state
The operation ~(p100_inb(VG_LAN_CFG_1) & HP100_LINK_UP) returns a value
that is always non-zero and hence the wait for the link to drop always
terminates prematurely. Fix this by using a logical not operator instead
of a bitwise complement. This issue has been in the driver since
pre-2.6.12-rc2.
Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Ferre [Fri, 14 Sep 2018 15:48:11 +0000 (17:48 +0200)]
ARM: dts: at91: add new compatibility string for macb on sama5d3
We need this new compatibility string as we experienced different behavior
for this 10/100Mbits/s macb interface on this particular SoC.
Backward compatibility is preserved as we keep the alternative strings.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Ferre [Fri, 14 Sep 2018 15:48:10 +0000 (17:48 +0200)]
net: macb: disable scatter-gather for macb on sama5d3
Create a new configuration for the sama5d3-macb new compatibility string.
This configuration disables scatter-gather because we experienced lock down
of the macb interface of this particular SoC under very high load.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Fri, 14 Sep 2018 14:56:35 +0000 (16:56 +0200)]
net: mvpp2: let phylink manage the carrier state
Net drivers using phylink shouldn't mess with the link carrier
themselves and should let phylink manage it. The mvpp2 driver wasn't
following this best practice as the mac_config() function made calls to
change the link carrier state. This led to wrongly reported carrier link
state which then triggered other issues. This patch fixes this
behaviour.
But the PPv2 driver relied on this misbehaviour in two cases: for fixed
links and when not using phylink (ACPI mode). The later was fixed by
adding an explicit call to link_up(), which when the ACPI mode will use
phylink should be removed.
The fixed link case was relying on the mac_config() function to set the
link up, as we found an issue in phylink_start() which assumes the
carrier is off. If not, the link_up() function is never called. To fix
this, a call to netif_carrier_off() is added just before phylink_start()
so that we do not introduce a regression in the driver.
Fixes: 4bb043262878 ("net: mvpp2: phylink support")
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 14 Sep 2018 14:28:05 +0000 (16:28 +0200)]
pppoe: fix reception of frames with no mac header
pppoe_rcv() needs to look back at the Ethernet header in order to
lookup the PPPoE session. Therefore we need to ensure that the mac
header is big enough to contain an Ethernet header. Otherwise
eth_hdr(skb)->h_source might access invalid data.
==================================================================
BUG: KMSAN: uninit-value in __get_item drivers/net/ppp/pppoe.c:172 [inline]
BUG: KMSAN: uninit-value in get_item drivers/net/ppp/pppoe.c:236 [inline]
BUG: KMSAN: uninit-value in pppoe_rcv+0xcef/0x10e0 drivers/net/ppp/pppoe.c:450
CPU: 0 PID: 4543 Comm: syz-executor355 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:53
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
__get_item drivers/net/ppp/pppoe.c:172 [inline]
get_item drivers/net/ppp/pppoe.c:236 [inline]
pppoe_rcv+0xcef/0x10e0 drivers/net/ppp/pppoe.c:450
__netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562
__netif_receive_skb net/core/dev.c:4627 [inline]
netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
netif_receive_skb+0x230/0x240 net/core/dev.c:4725
tun_rx_batched drivers/net/tun.c:1555 [inline]
tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962
tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
call_write_iter include/linux/fs.h:1782 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x7fb/0x9f0 fs/read_write.c:482
vfs_write+0x463/0x8d0 fs/read_write.c:544
SYSC_write+0x172/0x360 fs/read_write.c:589
SyS_write+0x55/0x80 fs/read_write.c:581
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x4447c9
RSP: 002b:
00007fff64c8fc28 EFLAGS:
00000297 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00000000004447c9
RDX:
000000000000fd87 RSI:
0000000020000600 RDI:
0000000000000004
RBP:
00000000006cf018 R08:
00007fff64c8fda8 R09:
00007fff00006bda
R10:
0000000000005fe7 R11:
0000000000000297 R12:
00000000004020d0
R13:
0000000000402160 R14:
0000000000000000 R15:
0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
slab_post_alloc_hook mm/slab.h:445 [inline]
slab_alloc_node mm/slub.c:2737 [inline]
__kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:984 [inline]
alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
tun_alloc_skb drivers/net/tun.c:1532 [inline]
tun_get_user+0x2242/0x7c60 drivers/net/tun.c:1829
tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
call_write_iter include/linux/fs.h:1782 [inline]
new_sync_write fs/read_write.c:469 [inline]
__vfs_write+0x7fb/0x9f0 fs/read_write.c:482
vfs_write+0x463/0x8d0 fs/read_write.c:544
SYSC_write+0x172/0x360 fs/read_write.c:589
SyS_write+0x55/0x80 fs/read_write.c:581
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
==================================================================
Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers")
Reported-by: syzbot+f5f6080811c849739212@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corentin Labbe [Fri, 14 Sep 2018 11:20:07 +0000 (11:20 +0000)]
net: ethernet: ti: add missing GENERIC_ALLOCATOR dependency
This patch mades TI_DAVINCI_CPDMA select GENERIC_ALLOCATOR.
without that, the following sparc64 build failure happen
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_check_free_tx_desc':
(.text+0x278): undefined reference to `gen_pool_avail'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_chan_submit':
(.text+0x340): undefined reference to `gen_pool_alloc'
(.text+0x5c4): undefined reference to `gen_pool_free'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `__cpdma_chan_free':
davinci_cpdma.c:(.text+0x64c): undefined reference to `gen_pool_free'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_desc_pool_destroy.isra.6':
davinci_cpdma.c:(.text+0x17ac): undefined reference to `gen_pool_size'
davinci_cpdma.c:(.text+0x17b8): undefined reference to `gen_pool_avail'
davinci_cpdma.c:(.text+0x1824): undefined reference to `gen_pool_size'
davinci_cpdma.c:(.text+0x1830): undefined reference to `gen_pool_avail'
drivers/net/ethernet/ti/davinci_cpdma.o: In function `cpdma_ctlr_create':
(.text+0x19f8): undefined reference to `devm_gen_pool_create'
(.text+0x1a90): undefined reference to `gen_pool_add_virt'
Makefile:1011: recipe for target 'vmlinux' failed
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Mon, 17 Sep 2018 07:13:47 +0000 (09:13 +0200)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Ted writes:
Various ext4 bug fixes; primarily making ext4 more robust against
maliciously crafted file systems, and some DAX fixes.
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4, dax: set ext4_dax_aops for dax files
ext4, dax: add ext4_bmap to ext4_dax_aops
ext4: don't mark mmp buffer head dirty
ext4: show test_dummy_encryption mount option in /proc/mounts
ext4: close race between direct IO and ext4_break_layouts()
ext4: fix online resizing for bigalloc file systems with a 1k block size
ext4: fix online resize's handling of a too-small final block group
ext4: recalucate superblock checksum after updating free blocks/inodes
ext4: avoid arithemetic overflow that can trigger a BUG
ext4: avoid divide by zero fault when deleting corrupted inline directories
ext4: check to make sure the rename(2)'s destination is not freed
ext4: add nonstring annotations to ext4.h
Greg Kroah-Hartman [Mon, 17 Sep 2018 05:24:28 +0000 (07:24 +0200)]
Merge tag 'linux-kselftest-4.19-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pulled kselftest fixes from Shuah:
"This Kselftest fixes update for 4.9-rc5 consists of:
-- fixes to build failures
-- fixes to add missing config files to increase test coverage
-- fixes to cgroup test and a new cgroup test for memory.oom.group"
David S. Miller [Mon, 17 Sep 2018 00:47:03 +0000 (17:47 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2018-09-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix end boundary calculation in BTF for the type section, from Martin.
2) Fix and revert subtraction of pointers that was accidentally allowed
for unprivileged programs, from Alexei.
3) Fix bpf_msg_pull_data() helper by using __GFP_COMP in order to avoid
a warning in linearizing sg pages into a single one for large allocs,
from Tushar.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Fri, 14 Sep 2018 04:33:44 +0000 (13:33 +0900)]
veth: Orphan skb before GRO
GRO expects skbs not to be owned by sockets, but when XDP is enabled veth
passed skbs owned by sockets. It caused corrupted sk_wmem_alloc.
Paolo Abeni reported the following splat:
[ 362.098904] refcount_t overflow at skb_set_owner_w+0x5e/0xa0 in iperf3[1644], uid/euid: 0/0
[ 362.108239] WARNING: CPU: 0 PID: 1644 at kernel/panic.c:648 refcount_error_report+0xa0/0xa4
[ 362.117547] Modules linked in: tcp_diag inet_diag veth intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif iTCO_wdt sg ipmi_si iTCO_vendor_support ipmi_devintf mxm_wmi ipmi_msghandler pcspkr dcdbas mei_me wmi mei lpc_ich acpi_power_meter pcc_cpufreq xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe igb ttm ahci mdio libahci ptp crc32c_intel drm pps_core libata i2c_algo_bit dca dm_mirror dm_region_hash dm_log dm_mod
[ 362.176622] CPU: 0 PID: 1644 Comm: iperf3 Not tainted 4.19.0-rc2.vanilla+ #2025
[ 362.184777] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
[ 362.193124] RIP: 0010:refcount_error_report+0xa0/0xa4
[ 362.198758] Code: 08 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 80 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 18 4d e7 9d 31 c0 e8 30 fa ff ff <0f> 0b eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc
[ 362.219711] RSP: 0018:
ffff9ee6ff603c20 EFLAGS:
00010282
[ 362.225538] RAX:
0000000000000000 RBX:
ffffffff9de83e10 RCX:
0000000000000000
[ 362.233497] RDX:
0000000000000001 RSI:
ffff9ee6ff6167d8 RDI:
ffff9ee6ff6167d8
[ 362.241457] RBP:
ffff9ee6ff603d78 R08:
0000000000000490 R09:
0000000000000004
[ 362.249416] R10:
0000000000000000 R11:
ffff9ee6ff603990 R12:
ffff9ee664b94500
[ 362.257377] R13:
0000000000000000 R14:
0000000000000004 R15:
ffffffff9de615f9
[ 362.265337] FS:
00007f1d22d28740(0000) GS:
ffff9ee6ff600000(0000) knlGS:
0000000000000000
[ 362.274363] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 362.280773] CR2:
00007f1d222f35d0 CR3:
0000001fddfec003 CR4:
00000000001606f0
[ 362.288733] Call Trace:
[ 362.291459] <IRQ>
[ 362.293702] ex_handler_refcount+0x4e/0x80
[ 362.298269] fixup_exception+0x35/0x40
[ 362.302451] do_trap+0x109/0x150
[ 362.306048] do_error_trap+0xd5/0x130
[ 362.315766] invalid_op+0x14/0x20
[ 362.319460] RIP: 0010:skb_set_owner_w+0x5e/0xa0
[ 362.324512] Code: ef ff ff 74 49 48 c7 43 60 20 7b 4a 9d 8b 85 f4 01 00 00 85 c0 75 16 8b 83 e0 00 00 00 f0 01 85 44 01 00 00 0f 88 d8 23 16 00 <5b> 5d c3 80 8b 91 00 00 00 01 8b 85 f4 01 00 00 89 83 a4 00 00 00
[ 362.345465] RSP: 0018:
ffff9ee6ff603e20 EFLAGS:
00010a86
[ 362.351291] RAX:
0000000000001100 RBX:
ffff9ee65deec700 RCX:
ffff9ee65e829244
[ 362.359250] RDX:
0000000000000100 RSI:
ffff9ee65e829100 RDI:
ffff9ee65deec700
[ 362.367210] RBP:
ffff9ee65e829100 R08:
000000000002a380 R09:
0000000000000000
[ 362.375169] R10:
0000000000000002 R11:
fffff1a4bf77bb00 R12:
ffffc0754661d000
[ 362.383130] R13:
ffff9ee65deec200 R14:
ffff9ee65f597000 R15:
00000000000000aa
[ 362.391092] veth_xdp_rcv+0x4e4/0x890 [veth]
[ 362.399357] veth_poll+0x4d/0x17a [veth]
[ 362.403731] net_rx_action+0x2af/0x3f0
[ 362.407912] __do_softirq+0xdd/0x29e
[ 362.411897] do_softirq_own_stack+0x2a/0x40
[ 362.416561] </IRQ>
[ 362.418899] do_softirq+0x4b/0x70
[ 362.422594] __local_bh_enable_ip+0x50/0x60
[ 362.427258] ip_finish_output2+0x16a/0x390
[ 362.431824] ip_output+0x71/0xe0
[ 362.440670] __tcp_transmit_skb+0x583/0xab0
[ 362.445333] tcp_write_xmit+0x247/0xfb0
[ 362.449609] __tcp_push_pending_frames+0x2d/0xd0
[ 362.454760] tcp_sendmsg_locked+0x857/0xd30
[ 362.459424] tcp_sendmsg+0x27/0x40
[ 362.463216] sock_sendmsg+0x36/0x50
[ 362.467104] sock_write_iter+0x87/0x100
[ 362.471382] __vfs_write+0x112/0x1a0
[ 362.475369] vfs_write+0xad/0x1a0
[ 362.479062] ksys_write+0x52/0xc0
[ 362.482759] do_syscall_64+0x5b/0x180
[ 362.486841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 362.492473] RIP: 0033:0x7f1d22293238
[ 362.496458] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 c5 54 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 362.517409] RSP: 002b:
00007ffebaef8008 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
[ 362.525855] RAX:
ffffffffffffffda RBX:
0000000000002800 RCX:
00007f1d22293238
[ 362.533816] RDX:
0000000000002800 RSI:
00007f1d22d36000 RDI:
0000000000000005
[ 362.541775] RBP:
00007f1d22d36000 R08:
00000002db777a30 R09:
0000562b70712b20
[ 362.549734] R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000005
[ 362.557693] R13:
0000000000002800 R14:
00007ffebaef8060 R15:
0000562b70712260
In order to avoid this, orphan the skb before entering GRO.
Fixes: 948d4f214fde ("veth: Add driver XDP")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Fri, 14 Sep 2018 04:26:48 +0000 (12:26 +0800)]
ip6_gre: simplify gre header parsing in ip6gre_err
Same as ip_gre, use gre_parse_header to parse gre header in gre error
handler code.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Fri, 14 Sep 2018 04:26:47 +0000 (12:26 +0800)]
ip_gre: fix parsing gre header in ipgre_err
gre_parse_header stops parsing when csum_err is encountered, which means
tpi->key is undefined and ip_tunnel_lookup will return NULL improperly.
This patch introduce a NULL pointer as csum_err parameter. Even when
csum_err is encountered, it won't return error and continue parsing gre
header as expected.
Fixes: 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook.")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 13 Sep 2018 18:36:30 +0000 (11:36 -0700)]
net: phy: et011c: Remove incorrect PHY_POLL flags
PHY_POLL is defined as -1 which means that we would be setting all flags of the
PHY driver, this is also not a valid flag to tell PHYLIB about, just remove it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 16 Sep 2018 22:30:23 +0000 (15:30 -0700)]
Merge branch 'act_police-lockless-data-path'
Davide Caratti says:
====================
net/sched: act_police: lockless data path
the data path of 'police' action can be faster if we avoid using spinlocks:
- patch 1 converts act_police to use per-cpu counters
- patch 2 lets act_police use RCU to access its configuration data.
test procedure (using pktgen from https://github.com/netoptimizer):
# ip link add name eth1 type dummy
# ip link set dev eth1 up
# tc qdisc add dev eth1 clsact
# tc filter add dev eth1 egress matchall action police \
> rate 2gbit burst 100k conform-exceed pass/pass index 100
# for c in 1 2 4; do
> ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n
5000000 -i eth1
> done
test results (avg. pps/thread):
$c | before patch | after patch | improvement
----+--------------+--------------+-------------
1 |
3518448 |
3591240 | irrelevant
2 |
3070065 |
3383393 | 10%
4 |
1540969 |
3238385 | 110%
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 13 Sep 2018 17:29:13 +0000 (19:29 +0200)]
net/sched: act_police: don't use spinlock in the data path
use RCU instead of spinlocks, to protect concurrent read/write on
act_police configuration. This reduces the effects of contention in the
data path, in case multiple readers are present.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>