openwrt/staging/blogic.git
6 years agoMerge branch 'tcp-ECN-quickack'
David S. Miller [Tue, 22 May 2018 19:43:16 +0000 (15:43 -0400)]
Merge branch 'tcp-ECN-quickack'

Eric Dumazet says:

====================
tcp: reduce quickack pressure for ECN

Small patch series changing TCP behavior vs quickack and ECN

First patch is a refactoring, adding parameter to tcp_incr_quickack()
and tcp_enter_quickack_mode() helpers.

Second patch implements the change, lowering number of ACK packets
sent after an ECN event.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: do not aggressively quick ack after ECN events
Eric Dumazet [Mon, 21 May 2018 22:08:57 +0000 (15:08 -0700)]
tcp: do not aggressively quick ack after ECN events

ECN signals currently forces TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
Eric Dumazet [Mon, 21 May 2018 22:08:56 +0000 (15:08 -0700)]
tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode

We want to add finer control of the number of ACK packets sent after
ECN events.

This patch is not changing current behavior, it only enables following
change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: don't disable bh when accessing action idr
Vlad Buslov [Mon, 21 May 2018 20:03:04 +0000 (23:03 +0300)]
net: sched: don't disable bh when accessing action idr

Initial net_device implementation used ingress_lock spinlock to synchronize
ingress path of device. This lock was used in both process and bh context.
In some code paths action map lock was obtained while holding ingress_lock.
Commit e1e992e52faa ("[NET_SCHED] protect action config/dump from irqs")
modified actions to always disable bh, while using action map lock, in
order to prevent deadlock on ingress_lock in softirq. This lock was removed
from net_device, so disabling bh, while accessing action map, is no longer
necessary.

Replace all action idr spinlock usage with regular calls that do not
disable bh.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-ipv6-Fix-route-append-and-replace-use-cases'
David S. Miller [Tue, 22 May 2018 18:44:20 +0000 (14:44 -0400)]
Merge branch 'net-ipv6-Fix-route-append-and-replace-use-cases'

David Ahern says:

====================
net/ipv6: Fix route append and replace use cases

This patch set fixes a few append and replace uses cases for IPv6 and
adds test cases that codifies the expectations of how append and replace
are expected to work. In paricular it allows a multipath route to have
a dev-only nexthop, something Thomas tried to accomplish with commit
edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6") which had to be
reverted because of breakage, and to replace an existing FIB entry
with a reject route.

There are a number of inconsistent and surprising aspects to the Linux
API for adding, deleting, replacing and changing FIB entries. For example,
with IPv4 NLM_F_APPEND means insert the route after any existing entries
with the same key (prefix + priority + TOS for IPv4) and NLM_F_CREATE
without the append flag inserts the new route before any existing entries.

IPv6 on the other hand attempts to guess whether a new route should be
appended to an existing one, possibly creating a multipath route, or to
add a new entry after any existing ones. This applies to both the 'append'
(NLM_F_CREATE + NLM_F_APPEND) and 'prepend' (NLM_F_CREATE only) cases
meaning for IPv6 the NLM_F_APPEND is basically ignored. This guessing
whether the route should be added to a multipath route (gateway routes)
or inserted after existing entries (non-gateway based routes) means a
multipath route can not have a dev only nexthop (potentially required in
some cases - tunnels or VRF route leaking for example) and route 'replace'
is a bit adhoc treating gateway based routes and dev-only / reject routes
differently.

This has led to frustration with developers working on routing suites
such as FRR where workarounds such as delete and add are used instead of
replace.

After this patch set there are 2 differences between IPv4 and IPv6:
1. 'ip ro prepend' = NLM_F_CREATE only
    IPv4 adds the new route before any existing ones
    IPv6 adds new route after any existing ones

2. 'ip ro append' = NLM_F_CREATE|NLM_F_APPEND
   IPv4 adds the new route after any existing ones
   IPv6 adds the nexthop to existing routes converting to multipath

For the former, there are cases where we want same prefix routes added
after existing ones (e.g., multicast, prefix routes for macvlan when used
for virtual router redundancy). Requiring the APPEND flag to add a new
route to an existing one helps here but is a slight change in behavior
since prepend with gateway routes now create a separate entry.

For the latter IPv6 behavior is preferred - appending a route for the same
prefix and metric to make a multipath route, so really IPv4 not allowing an
existing route to be updated is the limiter. This will be fixed when
nexthops become separate objects - a future patch set.

Thank you to Thomas and Ido for testing earlier versions of this set, and
to Ido for providing an update to the mlxsw driver.

Changes since RFC
- cleanup wording in test script; add comments about expected failures
  and why
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Add ipv4 route add append replace tests
David Ahern [Mon, 21 May 2018 17:26:58 +0000 (10:26 -0700)]
selftests: fib_tests: Add ipv4 route add append replace tests

Add IPv4 route tests covering add, append and replace permutations.
Assumes the ability to add a basic single path route works; this is
required for example when adding an address to an interface.

$ fib_tests.sh -t ipv4_rt

IPv4 route add / append tests
    TEST: Attempt to add duplicate route - gw                           [ OK ]
    TEST: Attempt to add duplicate route - dev only                     [ OK ]
    TEST: Attempt to add duplicate route - reject route                 [ OK ]
    TEST: Add new nexthop for existing prefix                           [ OK ]
    TEST: Append nexthop to existing route - gw                         [ OK ]
    TEST: Append nexthop to existing route - dev only                   [ OK ]
    TEST: Append nexthop to existing route - reject route               [ OK ]
    TEST: Append nexthop to existing reject route - gw                  [ OK ]
    TEST: Append nexthop to existing reject route - dev only            [ OK ]
    TEST: add multipath route                                           [ OK ]
    TEST: Attempt to add duplicate multipath route                      [ OK ]
    TEST: Route add with different metrics                              [ OK ]
    TEST: Route delete with metric                                      [ OK ]

IPv4 route replace tests
    TEST: Single path with single path                                  [ OK ]
    TEST: Single path with multipath                                    [ OK ]
    TEST: Single path with reject route                                 [ OK ]
    TEST: Single path with single path via multipath attribute          [ OK ]
    TEST: Invalid nexthop                                               [ OK ]
    TEST: Single path - replace of non-existent route                   [ OK ]
    TEST: Multipath with multipath                                      [ OK ]
    TEST: Multipath with single path                                    [ OK ]
    TEST: Multipath with single path via multipath attribute            [ OK ]
    TEST: Multipath with reject route                                   [ OK ]
    TEST: Multipath - invalid first nexthop                             [ OK ]
    TEST: Multipath - invalid second nexthop                            [ OK ]
    TEST: Multipath - replace of non-existent route                     [ OK ]

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Add ipv6 route add append replace tests
David Ahern [Mon, 21 May 2018 17:26:57 +0000 (10:26 -0700)]
selftests: fib_tests: Add ipv6 route add append replace tests

Add IPv6 route tests covering add, append and replace permutations.
Assumes the ability to add a basic single path route works; this is
required for example when adding an address to an interface.

$ fib_tests.sh -t ipv6_rt

IPv6 route add / append tests
    TEST: Attempt to add duplicate route - gw                           [ OK ]
    TEST: Attempt to add duplicate route - dev only                     [ OK ]
    TEST: Attempt to add duplicate route - reject route                 [ OK ]
    TEST: Add new route for existing prefix (w/o NLM_F_EXCL)            [ OK ]
    TEST: Append nexthop to existing route - gw                         [ OK ]
    TEST: Append nexthop to existing route - dev only                   [ OK ]
    TEST: Append nexthop to existing route - reject route               [ OK ]
    TEST: Append nexthop to existing reject route - gw                  [ OK ]
    TEST: Append nexthop to existing reject route - dev only            [ OK ]
    TEST: Add multipath route                                           [ OK ]
    TEST: Attempt to add duplicate multipath route                      [ OK ]
    TEST: Route add with different metrics                              [ OK ]
    TEST: Route delete with metric                                      [ OK ]

IPv6 route replace tests
    TEST: Single path with single path                                  [ OK ]
    TEST: Single path with multipath                                    [ OK ]
    TEST: Single path with reject route                                 [ OK ]
    TEST: Single path with single path via multipath attribute          [ OK ]
    TEST: Invalid nexthop                                               [ OK ]
    TEST: Single path - replace of non-existent route                   [ OK ]
    TEST: Multipath with multipath                                      [ OK ]
    TEST: Multipath with single path                                    [ OK ]
    TEST: Multipath with single path via multipath attribute            [ OK ]
    TEST: Multipath with reject route                                   [ OK ]
    TEST: Multipath - invalid first nexthop                             [ OK ]
    TEST: Multipath - invalid second nexthop                            [ OK ]
    TEST: Multipath - replace of non-existent route                     [ OK ]

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Add option to pause after each test
David Ahern [Mon, 21 May 2018 17:26:56 +0000 (10:26 -0700)]
selftests: fib_tests: Add option to pause after each test

Add option to pause after each test before cleanup is done. Allows
user to do manual inspection or more ad-hoc testing after each test
with the setup in tact.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Add command line options
David Ahern [Mon, 21 May 2018 17:26:55 +0000 (10:26 -0700)]
selftests: fib_tests: Add command line options

Add command line options for controlling pause on fail, controlling
specific tests to run and verbose mode rather than relying on environment
variables.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: fib_tests: Add success-fail counts
David Ahern [Mon, 21 May 2018 17:26:54 +0000 (10:26 -0700)]
selftests: fib_tests: Add success-fail counts

As more tests are added, it is convenient to have a tally at the end.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Simplify route replace and appending into multipath route
David Ahern [Mon, 21 May 2018 17:26:53 +0000 (10:26 -0700)]
net/ipv6: Simplify route replace and appending into multipath route

Bring consistency to ipv6 route replace and append semantics.

Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
1. can not replace a route with a reject route. Existing code appends
   a new route instead of replacing the existing one.

2. can not have a multipath route where a leg uses a dev only nexthop

Existing use cases affected by this change:
1. adding a route with existing prefix and metric using NLM_F_CREATE
   without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
   'prepend'). Existing code auto-determines that the new nexthop can
   be appended to an existing route to create a multipath route. This
   change breaks that by requiring the APPEND flag for the new route
   to be added to an existing one. Instead the prepend just adds another
   route entry.

2. route replace. Existing code replaces first matching multipath route
   if new route is multipath capable and fallback to first matching
   non-ECMP route (reject or dev only route) in case one isn't available.
   New behavior replaces first matching route. (Thanks to Ido for spotting
   this one)

Note: Newer iproute2 is needed to display multipath routes with a dev-only
      nexthop. This is due to a bug in iproute2 and parsing nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_router: Add support for route append
David Ahern [Mon, 21 May 2018 17:26:52 +0000 (10:26 -0700)]
mlxsw: spectrum_router: Add support for route append

Handle append for gateway based routes. Dev-only multipath routes will
be handled by a follow on patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'TI-Ethernet-driver-warnings-fixes'
David S. Miller [Mon, 21 May 2018 20:17:11 +0000 (16:17 -0400)]
Merge branch 'TI-Ethernet-driver-warnings-fixes'

Florian Fainelli says:

====================
TI Ethernet driver warnings fixes

This patch series attempts to fix properly the warnings observed with turning
on COMPILE_TEST and TI Ethernet drivers on 64-bit hosts.

Since I don't have any of this hardware, please review carefully for possible
breakage!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoti: ethernet: davinci: Fix cast to int warnings
Florian Fainelli [Mon, 21 May 2018 18:45:55 +0000 (11:45 -0700)]
ti: ethernet: davinci: Fix cast to int warnings

Now that we can compile test this driver on 64-bit hosts, we get some
warnings about how a pointer/address is written/read to/from a register
(sw_token). Fix this by doing the appropriate conversions, we cannot
possibly have the driver work on 64-bit hosts the way the tokens are
managed though, since the registers being written to a 32-bit only.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: davinci_emac: Fix printing of base address
Florian Fainelli [Mon, 21 May 2018 18:45:54 +0000 (11:45 -0700)]
net: ethernet: davinci_emac: Fix printing of base address

Use %pa which is the correct formatter to print a physical address,
instead of %p which is just a pointer.

Fixes: a6286ee630f6 ("net: Add TI DaVinci EMAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: ti: cpsw: Fix cpsw_add_ch_strings() printk format
Florian Fainelli [Mon, 21 May 2018 18:45:53 +0000 (11:45 -0700)]
net: ethernet: ti: cpsw: Fix cpsw_add_ch_strings() printk format

When building on a 64-bit host we will get the following warning:

drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_add_ch_strings':
drivers/net/ethernet/ti/cpsw.c:1284:19: warning: format '%d' expects
argument of type 'int', but argument 5 has type 'long unsigned int'
[-Wformat=]
     "%s DMA chan %d: %s", rx_dir ? "Rx" : "Tx",
                  ~^
                  %ld

Fix this by using an %ld format and casting to long.

Fixes: e05107e6b747 ("net: ethernet: ti: cpsw: add multi queue support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: ti: cpts: Fix timestamp print
Florian Fainelli [Mon, 21 May 2018 18:45:52 +0000 (11:45 -0700)]
net: ethernet: ti: cpts: Fix timestamp print

On 64-bit hosts we will get the following warning:

drivers/net/ethernet/ti/cpts.c: In function 'cpts_overflow_check':
drivers/net/ethernet/ti/cpts.c:297:11: warning: format '%lld' expects
argument of type 'long long int', but argument 3 has type
'__kernel_time_t {aka long int}' [-Wformat=]
  pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec,
ts.tv_nsec);

Fix this by using an appropriate casting that works on all bit sizes.

Fixes: a5c79c26e168 ("ptp: cpts: convert to the 64 bit get/set time methods.")
Fixes: 87c0e764d43a ("cpts: introduce time stamping code and a PTP hardware clock.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoti: ethernet: cpdma: Use correct format for genpool_*
Florian Fainelli [Mon, 21 May 2018 18:45:51 +0000 (11:45 -0700)]
ti: ethernet: cpdma: Use correct format for genpool_*

Now that we can compile davinci_cpdma.c on 64-bit hosts, we can see that
the format used for printing a size_t type is incorrect, use %zd
accordingly.

Fixes: aeec3021043b ("net: ethernet: ti: cpdma: remove used_desc counter")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Mon, 21 May 2018 20:01:54 +0000 (16:01 -0400)]
Merge git://git./linux/kernel/git/davem/net

S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.

TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.

The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.

Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovmcore: move get_vmcore_size out of __init
Rahul Lakkireddy [Mon, 21 May 2018 13:37:50 +0000 (19:07 +0530)]
vmcore: move get_vmcore_size out of __init

Fix below build warning:

WARNING: vmlinux.o(.text+0x422bb8): Section mismatch in reference from
the function vmcore_add_device_dump() to the function
.init.text:get_vmcore_size.constprop.5()

The function vmcore_add_device_dump() references
the function __init get_vmcore_size.constprop.5().
This is often because vmcore_add_device_dump lacks a __init
annotation or the annotation of get_vmcore_size.constprop.5 is wrong.

Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: copy the length of cpl_tx_pkt_core to fw_wr
Ganesh Goudar [Mon, 21 May 2018 06:56:36 +0000 (12:26 +0530)]
cxgb4: copy the length of cpl_tx_pkt_core to fw_wr

immdlen field of FW_ETH_TX_PKT_WR is filled in a wrong way,
we must copy the length of all the cpls encapsulated in fw
work request. In the xmit path we missed adding the length
of CPL_TX_PKT_CORE but we added the length of WR_HDR and it
worked because WR_HDR and CPL_TX_PKT_CORE are of same length.
Add the length of cpl_tx_pkt_core not WR_HDR's. This also
fixes the lso cpl errors for udp tunnels

Fixes: d0a1299c6bf7 ("cxgb4: add support for vxlan segmentation offload")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: Sort Kconfig sourcing alphabetically
Florian Fainelli [Mon, 21 May 2018 03:58:28 +0000 (20:58 -0700)]
net: ethernet: Sort Kconfig sourcing alphabetically

A number of entries were not alphabetically sorted, remedy that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: phylink: Don't release NULL GPIO
Florian Fainelli [Mon, 21 May 2018 03:49:47 +0000 (20:49 -0700)]
net: phy: phylink: Don't release NULL GPIO

If CONFIG_GPIOLIB is disabled, gpiod_put() becomes a stub that produces a
warning, this helped identify that we could be attempting to release a NULL
pl->link_gpio GPIO descriptor, so guard against that.

Fixes: daab3349ad1a ("net: phy: phylink: Release link GPIO")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mips_fixes_4.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan...
Linus Torvalds [Mon, 21 May 2018 15:58:00 +0000 (08:58 -0700)]
Merge tag 'mips_fixes_4.17_2' of git://git./linux/kernel/git/jhogan/mips

Pull MIPS fixes from James Hogan:

 - fix build with DEBUG_ZBOOT and MACH_JZ4770 (4.16)

 - include xilfpga FDT in fitImage and stop generating dtb.o (4.15)

 - fix software IO coherence on CM SMP systems (4.8)

 - ptrace: Fix PEEKUSR/POKEUSR to o32 FGRs (3.14)

 - ptrace: Expose FIR register through FP regset (3.13)

 - fix typo in KVM debugfs file name (3.10)

* tag 'mips_fixes_4.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
  MIPS: xilfpga: Actually include FDT in fitImage
  MIPS: xilfpga: Stop generating useless dtb.o
  KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
  MIPS: ptrace: Expose FIR register through FP regset
  MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4770
  MIPS: c-r4k: Fix data corruption related to cache coherence

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Mon, 21 May 2018 15:37:48 +0000 (08:37 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix refcounting bug for connections in on-packet scheduling mode of
    IPVS, from Julian Anastasov.

 2) Set network header properly in AF_PACKET's packet_snd, from Willem
    de Bruijn.

 3) Fix regressions in 3c59x by converting to generic DMA API. It was
    relying upon the hack that the PCI DMA interfaces would accept NULL
    for EISA devices. From Christoph Hellwig.

 4) Remove RDMA devices before unregistering netdev in QEDE driver, from
    Michal Kalderon.

 5) Use after free in TUN driver ptr_ring usage, from Jason Wang.

 6) Properly check for missing netlink attributes in SMC_PNETID
    requests, from Eric Biggers.

 7) Set DMA mask before performaing any DMA operations in vmxnet3
    driver, from Regis Duchesne.

 8) Fix mlx5 build with SMP=n, from Saeed Mahameed.

 9) Classifier fixes in bcm_sf2 driver from Florian Fainelli.

10) Tuntap use after free during release, from Jason Wang.

11) Don't use stack memory in scatterlists in tls code, from Matt
    Mullins.

12) Not fully initialized flow key object in ipv4 routing code, from
    David Ahern.

13) Various packet headroom bug fixes in ip6_gre driver, from Petr
    Machata.

14) Remove queues from XPS maps using correct index, from Amritha
    Nambiar.

15) Fix use after free in sock_diag, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: ip6_gre: fix tunnel metadata device sharing.
  cxgb4: fix offset in collecting TX rate limit info
  net: sched: red: avoid hashing NULL child
  sock_diag: fix use-after-free read in __sk_free
  sh_eth: Change platform check to CONFIG_ARCH_RENESAS
  net: dsa: Do not register devlink for unused ports
  net: Fix a bug in removing queues from XPS map
  bpf: fix truncated jump targets on heavy expansions
  bpf: parse and verdict prog attach may race with bpf map update
  bpf: sockmap update rollback on error can incorrectly dec prog refcnt
  net: test tailroom before appending to linear skb
  net: ip6_gre: Fix ip6erspan hlen calculation
  net: ip6_gre: Split up ip6gre_changelink()
  net: ip6_gre: Split up ip6gre_newlink()
  net: ip6_gre: Split up ip6gre_tnl_change()
  net: ip6_gre: Split up ip6gre_tnl_link_config()
  net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
  net: ip6_gre: Request headroom in __gre6_xmit()
  selftests/bpf: check return value of fopen in test_verifier.c
  erspan: fix invalid erspan version.
  ...

6 years agomv88e6xxx: Fix uninitialized variable warning.
David S. Miller [Sun, 20 May 2018 23:04:24 +0000 (19:04 -0400)]
mv88e6xxx: Fix uninitialized variable warning.

In mv88e6xxx_probe(), ("np" or "pdata") might be an invariant
but GCC can't see that, therefore:

drivers/net/dsa/mv88e6xxx/chip.c: In function ‘mv88e6xxx_probe’:
drivers/net/dsa/mv88e6xxx/chip.c:4420:13: warning: ‘compat_info’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  chip->info = compat_info;

Actually, it should have warned on the "if (!compat_info)" test, but
whatever.

Explicitly initialize to NULL in the variable declaration to
deal with this.

Fixes: 877b7cb0b6f2 ("net: dsa: mv88e6xxx: Add minimal platform_data support")
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: b53: Extend platform data to include DSA ports
Florian Fainelli [Sun, 20 May 2018 15:56:30 +0000 (08:56 -0700)]
net: dsa: b53: Extend platform data to include DSA ports

The b53 driver already defines and internally uses platform data to let the
glue drivers specify parameters such as the chip id.  What we were missing was
a way to tell the core DSA layer about the ports and their type.

Place a dsa_chip_data structure at the beginning of b53_platform_data for
dsa_register_switch() to access it. This does not require modifications to
b53_common.c which will pass platform_data trough.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mv88exxx-pdata'
David S. Miller [Sun, 20 May 2018 22:58:28 +0000 (18:58 -0400)]
Merge branch 'mv88exxx-pdata'

Andrew Lunn says:

====================
Platform data support for mv88exxx

There are a few Intel based platforms making use of the mv88exxx.
These don't easily have access to device tree in order to instantiate
the switch driver. These patches allow the use of platform data to
hold the configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Add support for EEPROM via platform data
Andrew Lunn [Sat, 19 May 2018 20:31:35 +0000 (22:31 +0200)]
net: dsa: mv88e6xxx: Add support for EEPROM via platform data

Add the size of the EEPROM to the platform data, so it can also be
instantiated by a platform device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Add minimal platform_data support
Andrew Lunn [Sat, 19 May 2018 20:31:34 +0000 (22:31 +0200)]
net: dsa: mv88e6xxx: Add minimal platform_data support

Not all the world uses device tree. Some parts of the world still use
platform devices and platform data. Add basic support for probing a
Marvell switch via platform data.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Remove OF check for IRQ domain
Andrew Lunn [Sat, 19 May 2018 20:31:33 +0000 (22:31 +0200)]
net: dsa: mv88e6xxx: Remove OF check for IRQ domain

An IRQ domain will work without an OF node. It is not possible to
reference interrupts via a phandle, but C code can still use
irq_find_mapping() to get an interrupt from the domain.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sh_eth-typos'
David S. Miller [Sun, 20 May 2018 22:56:43 +0000 (18:56 -0400)]
Merge branch 'sh_eth-typos'

Sergei Shtylyov says:

====================
sh_eth: fix typos/grammar

Here's a set of 3 patches against DaveM's 'net-next.git' repo plus the R8A77980
support patches posted earlier. They fix the comments typos/grammar and another
typo in the EESR bit...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix typo in comment to BCULR write
Sergei Shtylyov [Sat, 19 May 2018 21:05:02 +0000 (00:05 +0300)]
sh_eth: fix typo in comment to BCULR write

Simon has noticed a typo in the comment accompaining the BCULR write --
fix it and move the comment before the write (following the style of
the other comments), while at it...

Reported-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix comment grammar in 'struct sh_eth_cpu_data'
Sergei Shtylyov [Sat, 19 May 2018 21:03:42 +0000 (00:03 +0300)]
sh_eth: fix comment grammar in 'struct sh_eth_cpu_data'

All the verbs in the comments to the 'struct sh_eth_cpu_data' declaration
should  be in a  3rd person singular, to match the nouns.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix typo in EESR.TRO bit name
Sergei Shtylyov [Sat, 19 May 2018 21:02:36 +0000 (00:02 +0300)]
sh_eth: fix typo in EESR.TRO bit name

The  correct name of the EESR bit 8 is TRO (transmit retry over), not RTO.
Note that EESIPR bit 8, TROIP remained correct...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'hns3-next'
David S. Miller [Sun, 20 May 2018 22:53:59 +0000 (18:53 -0400)]
Merge branch 'hns3-next'

Salil Mehta says:

====================
Misc. bug fixes and cleanup for HNS3 driver

This patch-set presents miscellaneous bug fixes and cleanups found
during internal review, system testing and cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for CMDQ and Misc. interrupt init order problem
Yunsheng Lin [Sat, 19 May 2018 15:53:23 +0000 (16:53 +0100)]
net: hns3: Fix for CMDQ and Misc. interrupt init order problem

When vf module is loading, the cmd queue initialization should
happen before misc interrupt initialization, otherwise the misc
interrupt handle will cause using uninitialized cmd queue problem.
There is also the same issue when vf module is unloading.

This patch fixes it by adjusting the location of some function.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fixes kernel panic issue during rmmod hns3 driver
Xi Wang [Sat, 19 May 2018 15:53:22 +0000 (16:53 +0100)]
net: hns3: Fixes kernel panic issue during rmmod hns3 driver

If CONFIG_ARM_SMMU_V3 is enabled, arm64's dma_ops will replace
arm64_swiotlb_dma_ops with iommu_dma_ops. When releasing contiguous
dma memory, the new ops will call the vunmap function which cannot
be run in interrupt context.

Currently, spin_lock_bh is called before vunmap is executed. This
disables BH and causes the interrupt context to be detected to
generate a kernel panic like below:

[ 2831.573400] kernel BUG at mm/vmalloc.c:1621!
[ 2831.577659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
...
[ 2831.699907] Process rmmod (pid: 1893, stack limit = 0x0000000055103ee2)
[ 2831.706507] Call trace:
[ 2831.708941]  vunmap+0x48/0x50
[ 2831.711897]  dma_common_free_remap+0x78/0x88
[ 2831.716155]  __iommu_free_attrs+0xa8/0x1c0
[ 2831.720255]  hclge_free_cmd_desc+0xc8/0x118 [hclge]
[ 2831.725128]  hclge_destroy_cmd_queue+0x34/0x68 [hclge]
[ 2831.730261]  hclge_uninit_ae_dev+0x90/0x100 [hclge]
[ 2831.735127]  hnae3_unregister_ae_dev+0xb0/0x868 [hnae3]
[ 2831.740345]  hns3_remove+0x3c/0x90 [hns3]
[ 2831.744344]  pci_device_remove+0x48/0x108
[ 2831.748342]  device_release_driver_internal+0x164/0x200
[ 2831.753553]  driver_detach+0x4c/0x88
[ 2831.757116]  bus_remove_driver+0x60/0xc0
[ 2831.761026]  driver_unregister+0x34/0x60
[ 2831.764935]  pci_unregister_driver+0x30/0xb0
[ 2831.769197]  hns3_exit_module+0x10/0x978 [hns3]
[ 2831.773715]  SyS_delete_module+0x1f8/0x248
[ 2831.777799]  el0_svc_naked+0x30/0x34

This patch fixes it by using spin_lock instead of spin_lock_bh.

Fixes: 68c0a5c70614 ("net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for netdev not running problem after calling net_stop and net_open
Fuyun Liang [Sat, 19 May 2018 15:53:21 +0000 (16:53 +0100)]
net: hns3: Fix for netdev not running problem after calling net_stop and net_open

The link status update function is called by timer every second. But
net_stop and net_open may be called with very short intervals. The link
status update function can not detect the link state has changed. It
causes the netdev not running problem.

This patch fixes it by updating the link state in ae_stop function.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Use enums instead of magic number in hclge_is_special_opcode
Huazhong Tan [Sat, 19 May 2018 15:53:20 +0000 (16:53 +0100)]
net: hns3: Use enums instead of magic number in hclge_is_special_opcode

This patch does bit of a clean-up by using already defined enums for
certain values in function hclge_is_special_opcode(). Below enums from
have been used as replacements for magic values:

enum hclge_opcode_type{
<snip>
HCLGE_OPC_STATS_64_BIT = 0x0030,
HCLGE_OPC_STATS_32_BIT = 0x0031,
HCLGE_OPC_STATS_MAC = 0x0032,
<snip>
};

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for hns3 module is loaded multiple times problem
Xi Wang [Sat, 19 May 2018 15:53:19 +0000 (16:53 +0100)]
net: hns3: Fix for hns3 module is loaded multiple times problem

If the hns3 driver has been built into kernel and then loaded with
the same driver which built as KLM, it may trigger an error like
below:

[   20.009555] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[   20.016789] hns3: Copyright (c) 2017 Huawei Corporation.
[   20.022100] Error: Driver 'hns3' is already registered, aborting...
[   23.517397] Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[   23.691583] Process insmod (pid: 1982, stack limit = 0x00000000cd5f21cb)
[   23.698270] Call trace:
[   23.700705]  __list_del_entry_valid+0x2c/0xd8
[   23.705049]  hnae3_unregister_client+0x68/0xa8
[   23.709487]  hns3_init_module+0x98/0x1000 [hns3]
[   23.714093]  do_one_initcall+0x5c/0x170
[   23.717918]  do_init_module+0x64/0x1f4
[   23.721654]  load_module+0x1d14/0x24b0
[   23.725390]  SyS_init_module+0x158/0x208
[   23.729300]  el0_svc_naked+0x30/0x34

This patch fixes it by adding module version info.

Fixes: 38caee9d3ee8 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix the missing client list node initialization
Xi Wang [Sat, 19 May 2018 15:53:18 +0000 (16:53 +0100)]
net: hns3: Fix the missing client list node initialization

This patch fixes the missing initialization of the client list node
in the hnae3_register_client() function.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: cleanup of return values in hclge_init_client_instance()
Jian Shen [Sat, 19 May 2018 15:53:17 +0000 (16:53 +0100)]
net: hns3: cleanup of return values in hclge_init_client_instance()

Removes the goto and directly returns in case of errors as part of the
cleanup.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fixes API to fetch ethernet header length with kernel default
Peng Li [Sat, 19 May 2018 15:53:16 +0000 (16:53 +0100)]
net: hns3: Fixes API to fetch ethernet header length with kernel default

During the RX leg driver needs to fetch the ethernet header
length from the RX'ed Buffer Descriptor. Currently, proprietary
version hns3_nic_get_headlen is being used to fetch the header
length which uses l234info present in the Buffer Descriptor
which might not be valid for the first Buffer Descriptor if the
packet is spanning across multiple descriptors.
Kernel default eth_get_headlen API does the job correctly.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fixes error reported by Kbuild and internal review
Salil Mehta [Sat, 19 May 2018 15:53:15 +0000 (16:53 +0100)]
net: hns3: Fixes error reported by Kbuild and internal review

This patch fixes the error reported by Intel's kbuild and fixes a
return value in one of the legs, caught during review of the original
patch sent by kbuild.

Fixes: fdb793670a00 ("net: hns3: Add support of .sriov_configure in HNS3 driver")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: fix network error on resume from suspend
Heiner Kallweit [Sat, 19 May 2018 08:29:33 +0000 (10:29 +0200)]
r8169: fix network error on resume from suspend

This commit removed calls to rtl_set_rx_mode(). This is ok for the
standard path if the link is brought up, however it breaks system
resume from suspend. Link comes up but no network traffic.

Meanwhile common code from rtl_hw_start_8169/8101/8168() was moved
to rtl_hw_start(), therefore re-add the call to rtl_set_rx_mode()
there.

Due to adding this call we have to move definition of rtl_hw_start()
after definition of rtl_set_rx_mode().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: 82d3ff6dd199 ("r8169: remove calls to rtl_set_rx_mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoerspan: set bso bit based on mirrored packet's len
William Tu [Sat, 19 May 2018 02:41:01 +0000 (19:41 -0700)]
erspan: set bso bit based on mirrored packet's len

Before the patch, the erspan BSO bit (Bad/Short/Oversized) is not
handled.  BSO has 4 possible values:
  00 --> Good frame with no error, or unknown integrity
  11 --> Payload is a Bad Frame with CRC or Alignment Error
  01 --> Payload is a Short Frame
  10 --> Payload is an Oversized Frame

Based the short/oversized definitions in RFC1757, the patch sets
the bso bit based on the mirrored packet's size.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoLinux 4.17-rc6
Linus Torvalds [Sun, 20 May 2018 22:31:38 +0000 (15:31 -0700)]
Linux 4.17-rc6

6 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Sun, 20 May 2018 22:24:22 +0000 (18:24 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2018-05-18

Here's the first bluetooth-next pull request for the 4.18 kernel:

 - Refactoring of the btbcm driver
 - New USB IDs for QCA_ROME and LiteOn controllers
 - Buffer overflow fix if the controller sends invalid advertising data length
 - Various cleanups & fixes for Qualcomm controllers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoRevert "ixgbe: release lock for the duration of ixgbe_suspend_close()"
Jeff Kirsher [Fri, 18 May 2018 18:58:30 +0000 (11:58 -0700)]
Revert "ixgbe: release lock for the duration of ixgbe_suspend_close()"

This reverts commit 6710f970d9979d8f03f6e292bb729b2ee1526d0e.

Gotta love when developers have offline discussions, thinking everyone
is reading their responses/dialog.

The change had the potential for a number of race conditions on
shutdown, which is why we are reverting the change.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qcom/emac: Allocate buffers from local node
Hemanth Puranik [Fri, 18 May 2018 03:29:29 +0000 (08:59 +0530)]
net: qcom/emac: Allocate buffers from local node

Currently we use non-NUMA aware allocation for TPD and RRD buffers,
this patch modifies to use NUMA friendly allocation.

Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'parisc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sun, 20 May 2018 19:44:07 +0000 (12:44 -0700)]
Merge branch 'parisc-4.17-5' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixlets from Helge Deller:
 "Three small section mismatch fixes, one of them was found by 0-day
  test infrastructure"

* 'parisc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Move ccio_cujo20_fixup() into init section
  parisc: Move setup_profiling_timer() out of init section
  parisc: Move find_pa_parent_type() out of init section

6 years agoMerge tag 'for-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Sun, 20 May 2018 19:04:27 +0000 (12:04 -0700)]
Merge tag 'for-4.17-rc5-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We've accumulated some fixes during the last week, some of them were
  in the works for a longer time but there are some newer ones too.

  Most of the fixes have a reproducer and fix user visible problems,
  also candidates for stable kernels. They IMHO qualify for a late rc,
  though I did not expect that many"

* tag 'for-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix crash when trying to resume balance without the resume flag
  btrfs: Fix delalloc inodes invalidation during transaction abort
  btrfs: Split btrfs_del_delalloc_inode into 2 functions
  btrfs: fix reading stale metadata blocks after degraded raid1 mounts
  btrfs: property: Set incompat flag if lzo/zstd compression is set
  Btrfs: fix duplicate extents after fsync of file with prealloc extents
  Btrfs: fix xattr loss after power failure
  Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting

6 years agoMerge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sun, 20 May 2018 18:50:27 +0000 (11:50 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - Łukasz Stelmach spotted a couple of issues with the decompressor.

 - a couple of kdump fixes found while testing kdump

 - replace some perl with shell code

 - resolve SIGFPE breakage

 - kprobes fixes

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: fix kill( ,SIGFPE) breakage
  ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
  ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
  ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
  ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
  ARM: replace unnecessary perl with sed and the shell $(( )) operator
  ARM: kexec: record parent context registers for non-crash CPUs
  ARM: kexec: fix kdump register saving on panic()
  ARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel
  ARM: 8753/1: decompressor: add a missing parameter to the addruart macro

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 18:28:32 +0000 (11:28 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "An unfortunately larger set of fixes, but a large portion is
  selftests:

   - Fix the missing clusterid initializaiton for x2apic cluster
     management which caused boot failures due to IPIs being sent to the
     wrong cluster

   - Drop TX_COMPAT when a 64bit executable is exec()'ed from a compat
     task

   - Wrap access to __supported_pte_mask in __startup_64() where clang
     compile fails due to a non PC relative access being generated.

   - Two fixes for 5 level paging fallout in the decompressor:

      - Handle GOT correctly for paging_prepare() and
        cleanup_trampoline()

      - Fix the page table handling in cleanup_trampoline() to avoid
        page table corruption.

   - Stop special casing protection key 0 as this is inconsistent with
     the manpage and also inconsistent with the allocation map handling.

   - Override the protection key wen moving away from PROT_EXEC to
     prevent inaccessible memory.

   - Fix and update the protection key selftests to address breakage and
     to cover the above issue

   - Add a MOV SS self test"

[ Part of the x86 fixes were in the earlier core pull due to dependencies ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
  x86/apic/x2apic: Initialize cluster ID properly
  x86/boot/compressed/64: Fix moving page table out of trampoline memory
  x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
  x86/pkeys: Do not special case protection key 0
  x86/pkeys/selftests: Add a test for pkey 0
  x86/pkeys/selftests: Save off 'prot' for allocations
  x86/pkeys/selftests: Fix pointer math
  x86/pkeys: Override pkey when moving away from PROT_EXEC
  x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
  x86/pkeys/selftests: Add PROT_EXEC test
  x86/pkeys/selftests: Factor out "instruction page"
  x86/pkeys/selftests: Allow faults on unknown keys
  x86/pkeys/selftests: Avoid printf-in-signal deadlocks
  x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
  x86/pkeys/selftests: Stop using assert()
  x86/pkeys/selftests: Give better unexpected fault error messages
  x86/selftests: Add mov_to_ss test
  x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
  x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
  ...

6 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 18:25:54 +0000 (11:25 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull UP timer fix from Thomas Gleixner:
 "Work around the for_each_cpu() oddity on UP kernels in the tick
  broadcast code which causes boot failures because the CPU0 bit is
  always reported as set independent of the cpumask content"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/broadcast: Use for_each_cpu() specially on UP kernels

6 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 18:23:34 +0000 (11:23 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixlets from Thomas Gleixner:
 "Three trivial fixlets for the scheduler:

   - move print_rt_rq() and print_dl_rq() declarations to the right
     place

   - make grub_reclaim() static

   - fix the bogus documentation reference in Kconfig"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix documentation file path
  sched/deadline: Make the grub_reclaim() function static
  sched/debug: Move the print_rt_rq() and print_dl_rq() declarations to kernel/sched/sched.h

6 years agoMerge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 18:20:40 +0000 (11:20 -0700)]
Merge branch 'ras-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull RAS fix from Thomas Gleixner:
 "Fix a regression in the new AMD SMCA code which issues an SMP function
  call from the early interrupt disabled region of CPU hotplug. To avoid
  that, use cached block addresses which can be used directly"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Cache SMCA MISC block addresses

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 18:18:42 +0000 (11:18 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf tooling fixes from Thomas Gleixner:

 - fix segfault when processing unknown threads in cs-etm

 - fix "perf test inet_pton" on s390 failing due to missing inline

 - display all available events on 'perf annotate --stdio'

 - add missing newline when parsing an empty BPF program

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Add missing newline when parsing empty BPF proggie
  perf cs-etm: Remove redundant space
  perf cs-etm: Support unknown_thread in cs_etm_auxtrace
  perf annotate: Display all available events on --stdio
  perf test: "probe libc's inet_pton" fails on s390 due to missing inline

6 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 17:43:27 +0000 (10:43 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two fixes to address shortcomings of the rwsem/percpu-rwsem lock
  debugging code which emits false positive warnings when the rwsem is
  anonymously locked and unlocked"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
  locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag

6 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 17:36:52 +0000 (10:36 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:

 - Use explicitely sized type for the romimage pointer in the 32bit EFI
   protocol struct so a 64bit kernel does not expand it to 64bit. Ditto
   for the 64bit struct to avoid the reverse issue on 32bit kernels.

 - Handle randomized tex offset correctly in the ARM64 EFI stub to avoid
   unaligned data resulting in stack corruption and other hard to
   diagnose wreckage.

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub/arm64: Handle randomized TEXT_OFFSET
  efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode

6 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 20 May 2018 17:01:38 +0000 (10:01 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull core fixes from Thomas Gleixner:

 - Unbreak the BPF compilation which got broken by the unconditional
   requirement of asm-goto, which is not supported by clang.

 - Prevent probing on exception masking instructions in uprobes and
   kprobes to avoid the issues of the delayed exceptions instead of
   having an ugly workaround.

 - Prevent a double free_page() in the error path of do_kexec_load()

 - A set of objtool updates addressing various issues mostly related to
   switch tables and the noreturn detection for recursive sibling calls

 - Header sync for tools.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Detect RIP-relative switch table references, part 2
  objtool: Detect RIP-relative switch table references
  objtool: Support GCC 8 switch tables
  objtool: Support GCC 8's cold subfunctions
  objtool: Fix "noreturn" detection for recursive sibling calls
  objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h
  x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
  uprobes/x86: Prohibit probing on MOV SS instruction
  kprobes/x86: Prohibit probing on exception masking instructions
  x86/kexec: Avoid double free_page() upon do_kexec_load() failure

6 years agonet: ip6_gre: fix tunnel metadata device sharing.
William Tu [Sat, 19 May 2018 02:22:28 +0000 (19:22 -0700)]
net: ip6_gre: fix tunnel metadata device sharing.

Currently ip6gre and ip6erspan share single metadata mode device,
using 'collect_md_tun'.  Thus, when doing:
  ip link add dev ip6gre11 type ip6gretap external
  ip link add dev ip6erspan12 type ip6erspan external
  RTNETLINK answers: File exists
simply fails due to the 2nd tries to create the same collect_md_tun.

The patch fixes it by adding a separate collect md tunnel device
for the ip6erspan, 'collect_md_tun_erspan'.  As a result, a couple
of places need to refactor/split up in order to distinguish ip6gre
and ip6erspan.

First, move the collect_md check at ip6gre_tunnel_{unlink,link} and
create separate function {ip6gre,ip6ersapn}_tunnel_{link_md,unlink_md}.
Then before link/unlink, make sure the link_md/unlink_md is called.
Finally, a separate ndo_uninit is created for ip6erspan.  Tested it
using the samples/bpf/test_tunnel_bpf.sh.

Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sh_eth-R8A77980-GEther-support'
David S. Miller [Sun, 20 May 2018 03:24:47 +0000 (23:24 -0400)]
Merge branch 'sh_eth-R8A77980-GEther-support'

Sergei Shtylyov says:

====================
Add Renesas R8A77980 GEther support

Here's a set of 3 patches against DaveM's 'net-next.git' repo. They (gradually)
add R8A77980 GEther support to the 'sh_eth' driver, starting with couple new
register bits/values introduced with this chip, and ending with adding a new
'struct sh_eth_cpu_data' instance connected to the new DT "compatible" prop
value...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: add R8A77980 support
Sergei Shtylyov [Fri, 18 May 2018 18:32:46 +0000 (21:32 +0300)]
sh_eth: add R8A77980 support

Finally, add support for the DT probing of the R-Car V3H (AKA R8A77980) --
it's the only R-Car gen3 SoC having the GEther controller -- others have
only EtherAVB...

Based on the original (and large) patch by Vladimir Barinov.

Signed-off-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: add EDMR.NBST support
Sergei Shtylyov [Fri, 18 May 2018 18:31:28 +0000 (21:31 +0300)]
sh_eth: add EDMR.NBST support

The R-Car V3H (AKA R8A77980) GEther controller adds the DMA burst mode bit
(NBST) in EDMR and the manual tells to always set it before doing any DMA.

Based on the original (and large) patch by Vladimir Barinov.

Signed-off-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: add RGMII support
Sergei Shtylyov [Fri, 18 May 2018 18:30:18 +0000 (21:30 +0300)]
sh_eth: add RGMII support

The R-Car V3H (AKA R8A77980) GEther controller  adds support for the RGMII
PHY interface mode as a new  value  for the RMII_MII register.

Based on the original (and large) patch by Vladimir Barinov.

Signed-off-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Sun, 20 May 2018 02:56:15 +0000 (19:56 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A handful of fixes. I've been queuing them up a bit too long so the
  list is longer than it otherwise would have been spread out across a
  few -rcs.

  In general, it's a scattering of fixes across several platforms,
  nothing truly serious enough to point out.

  There's a slightly larger batch of them for the Davinci platforms due
  to work to bring them back to life after some time, so there's a
  handful of regressions, some of them going back very far, others more
  recent.

  There's also a few patches fixing DT on Renesas platforms since they
  changed some bindings without remaining backwards compatible,
  splitting up describing LVDS as a proper bridge instead of having it
  as part of the display unit.

  We could push for them to be backwards compatible with old device
  trees, but it's likely to regress eventually if nobody's actually
  using said compatibility"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (36 commits)
  ARM: davinci: board-dm646x-evm: set VPIF capture card name
  ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
  ARM: davinci: dm646x: fix timer interrupt generation
  ARM: keystone: fix platform_domain_notifier array overrun
  arm64: dts: exynos: Fix interrupt type for I2S1 device on Exynos5433
  ARM: dts: imx51-zii-rdu1: fix touchscreen bindings
  firmware: arm_scmi: Use after free in scmi_create_protocol_device()
  ARM: dts: cygnus: fix irq type for arm global timer
  Revert "ARM: dts: logicpd-som-lv: Fix pinmux controller references"
  tee: check shm references are consistent in offset/size
  tee: shm: fix use-after-free via temporarily dropped reference
  ARM: dts: imx7s: Pass the 'fsl,sec-era' property
  ARM: dts: tegra20: Revert "Fix ULPI regression on Tegra20"
  ARM: dts: correct missing "compatible" entry for ti81xx SoCs
  ARM: OMAP1: ams-delta: fix deferred_fiq handler
  arm64: tegra: Make BCM89610 PHY interrupt as active low
  ARM: davinci: fix GPIO lookup for I2C
  ARM: dts: logicpd-som-lv: Fix pinmux controller references
  ARM: dts: logicpd-som-lv: Fix Audio Mute
  ARM: dts: logicpd-som-lv: Fix WL127x Startup Issues
  ...

6 years agonet: mvpp2: Add missing VLAN tag detection
Maxime Chevallier [Fri, 18 May 2018 07:33:39 +0000 (09:33 +0200)]
net: mvpp2: Add missing VLAN tag detection

Marvell PPv2 Header Parser sets some bits in the 'result_info' field in
each lookup iteration, to identify different packet attributes such as
DSA / VLAN tag, protocol infos, etc. This is used in further
classification stages in the controller.

It's the DSA tag detection entry that is in charge of detecting when there
is a single VLAN tag.

This commits adds the missing update of the result_info in this case.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'tegra-for-4.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Olof Johansson [Sun, 20 May 2018 00:58:32 +0000 (17:58 -0700)]
Merge tag 'tegra-for-4.17-fixes-2' of git://git./linux/kernel/git/tegra/linux into fixes

arm64: tegra: Device tree fixes for v4.17

This contains a one-line update to the device tree of the Tegra186 P3310
processor module, fixing the polarity of the PHY interrupt. Originally,
this was queued to go into v4.18, but the PHY ID matching patch has now
found its way into v4.17-rc5, which means that the PHY driver will know
how to identify the PHY on this board and try to use the interrupt. This
will unfortunately cause networking to break on P3310, hence why I think
this should go into v4.17.

* tag 'tegra-for-4.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  arm64: tegra: Make BCM89610 PHY interrupt as active low

Signed-off-by: Olof Johansson <olof@lixom.net>
6 years agoMerge branch 'devlink-port-flavours-and-phys_port_name'
David S. Miller [Sat, 19 May 2018 20:30:39 +0000 (16:30 -0400)]
Merge branch 'devlink-port-flavours-and-phys_port_name'

Jiri Pirko says:

====================
devlink: introduce port flavours and common phys_port_name generation

This patchset resolves 2 issues we have right now:
1) There are many netdevices / ports in the system, for port, pf, vf
   represenatation but the user has no way to see which is which
2) The ndo_get_phys_port_name is implemented in each driver separatelly,
   which may lead to inconsistent names between drivers.

This patchset introduces port flavours which should address the first
problem. In this initial patchset, I focus on DSA and their port
flavours. As a follow-up, I plan to add PF and VF representor flavours.
However, that needs additional dependencies in drivers (nfp, mlx5).

The common phys_port_name generation is used by mlxsw. An example output
for mlxsw looks like this:

...
pci/0000:03:00.0/59: type eth netdev enp3s0np4 flavour physical number 4
pci/0000:03:00.0/61: type eth netdev enp3s0np1 flavour physical number 1
pci/0000:03:00.0/63: type eth netdev enp3s0np2 flavour physical number 2
pci/0000:03:00.0/49: type eth netdev enp3s0np8s0 flavour physical number 8 split_group 8 subport 0
pci/0000:03:00.0/50: type eth netdev enp3s0np8s1 flavour physical number 8 split_group 8 subport 1
pci/0000:03:00.0/51: type eth netdev enp3s0np8s2 flavour physical number 8 split_group 8 subport 2
pci/0000:03:00.0/52: type eth netdev enp3s0np8s3 flavour physical number 8 split_group 8 subport 3

As you can see, the netdev names are generated according to the flavour
and port number. In case the port is split, the split subnumber is also
included.

An example output for dsa_loop testing module looks like this:
mdio_bus/fixed-0:1f/0: type eth netdev lan1 flavour physical number 0
mdio_bus/fixed-0:1f/1: type eth netdev lan2 flavour physical number 1
mdio_bus/fixed-0:1f/2: type eth netdev lan3 flavour physical number 2
mdio_bus/fixed-0:1f/3: type eth netdev lan4 flavour physical number 3
mdio_bus/fixed-0:1f/4: type notset
mdio_bus/fixed-0:1f/5: type notset flavour cpu number 5
mdio_bus/fixed-0:1f/6: type notset
mdio_bus/fixed-0:1f/7: type notset
mdio_bus/fixed-0:1f/8: type notset
mdio_bus/fixed-0:1f/9: type notset
mdio_bus/fixed-0:1f/10: type notset
mdio_bus/fixed-0:1f/11: type notset

---
RFC->v1:
-removed nfp patches, removed DSA patch that used name generation helper
-patch 1:
 - Reduced the nfp change just to simply use newly created attr_set func
-patch 2:
 - rebased
 - removed pf/vf reps flavours
-patch 3:
 - rebased
-patch 4:
 - added missing break pointed out by Andrew
====================

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: use devlink helper to generate physical port name
Jiri Pirko [Fri, 18 May 2018 07:29:04 +0000 (09:29 +0200)]
mlxsw: use devlink helper to generate physical port name

Since devlink knows the info needed to generate the physical port name
in a generic way for all devlink users, use the helper to do the job.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodsa: set devlink port attrs for dsa ports
Jiri Pirko [Fri, 18 May 2018 07:29:03 +0000 (09:29 +0200)]
dsa: set devlink port attrs for dsa ports

Set the attrs and allow to expose port flavour to user via devlink.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: introduce a helper to generate physical port names
Jiri Pirko [Fri, 18 May 2018 07:29:02 +0000 (09:29 +0200)]
devlink: introduce a helper to generate physical port names

Each driver implements physical port name generation by itself. However
as devlink has all needed info, it can easily do the job for all its
users. So implement this helper in devlink.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: extend attrs_set for setting port flavours
Jiri Pirko [Fri, 18 May 2018 07:29:01 +0000 (09:29 +0200)]
devlink: extend attrs_set for setting port flavours

Devlink ports can have specific flavour according to the purpose of use.
This patch extend attrs_set so the driver can say which flavour port
has. Initial flavours are:
physical, cpu, dsa
User can query this to see right away what is the purpose of each port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: introduce devlink_port_attrs_set
Jiri Pirko [Fri, 18 May 2018 07:29:00 +0000 (09:29 +0200)]
devlink: introduce devlink_port_attrs_set

Change existing setter for split port information into more generic
attrs setter. Alongside with that, allow to set port number and subport
number for split ports.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoARM: fix kill( ,SIGFPE) breakage
Russell King [Thu, 12 Apr 2018 23:22:47 +0000 (00:22 +0100)]
ARM: fix kill( ,SIGFPE) breakage

Commit 7771c6645700 ("signal/arm: Document conflicts with SI_USER and
SIGFPE") broke the siginfo structure for userspace triggered signals,
causing the strace testsuite to regress.  Fix this by eliminating
the FPE_FIXME definition (which is at the root of the breakage) and
use FPE_FLTINV instead for the case where the hardware appears to be
reporting nonsense.

Fixes: 7771c6645700 ("signal/arm: Document conflicts with SI_USER and SIGFPE")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoMerge tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Sat, 19 May 2018 16:54:02 +0000 (09:54 -0700)]
Merge tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fix from Vinod Koul:

 - qcom bam runtime_pm fix

 - email update for Vinod

* tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: qcom: bam_dma: check if the runtime pm enabled
  dmaengine: Update email address for Vinod

6 years agommap: relax file size limit for regular files
Linus Torvalds [Sat, 19 May 2018 16:29:11 +0000 (09:29 -0700)]
mmap: relax file size limit for regular files

Commit be83bbf80682 ("mmap: introduce sane default mmap limits") was
introduced to catch problems in various ad-hoc character device drivers
doing mmap and getting the size limits wrong.  In the process, it used
"known good" limits for the normal cases of mapping regular files and
block device drivers.

It turns out that the "s_maxbytes" limit was less "known good" than I
thought.  In particular, /proc doesn't set it, but exposes one regular
file to mmap: /proc/vmcore.  As a result, that file got limited to the
default MAX_INT s_maxbytes value.

This went unnoticed for a while, because apparently the only thing that
needs it is the s390 kernel zfcpdump, but there might be other tools
that use this too.

Vasily suggested just changing s_maxbytes for all of /proc, which isn't
wrong, but makes me nervous at this stage.  So instead, just make the
new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't
affect anything else.  It wasn't the regular file case I was worried
about.

I'd really prefer for maxsize to have been per-inode, but that is not
how things are today.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agox86/MCE/AMD: Cache SMCA MISC block addresses
Borislav Petkov [Thu, 17 May 2018 08:46:26 +0000 (10:46 +0200)]
x86/MCE/AMD: Cache SMCA MISC block addresses

... into a global, two-dimensional array and service subsequent reads from
that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
disabled).

In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
of the bank->blocks pointer.

Fixes: 27bd59502702 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
6 years agoARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
Masami Hiramatsu [Sun, 13 May 2018 04:04:29 +0000 (05:04 +0100)]
ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions

Since do_undefinstr() uses get_user to get the undefined
instruction, it can be called before kprobes processes
recursive check. This can cause an infinit recursive
exception.
Prohibit probing on get_user functions.

Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
Masami Hiramatsu [Sun, 13 May 2018 04:04:16 +0000 (05:04 +0100)]
ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr

Prohibit kprobes on do_undefinstr because kprobes on
arm is implemented by undefined instruction. This means
if we probe do_undefinstr(), it can cause infinit
recursive exception.

Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: 8770/1: kprobes: Prohibit probing on optimized_callback
Masami Hiramatsu [Sun, 13 May 2018 04:04:10 +0000 (05:04 +0100)]
ARM: 8770/1: kprobes: Prohibit probing on optimized_callback

Prohibit probing on optimized_callback() because
it is called from kprobes itself. If we put a kprobes
on it, that will cause a recursive call loop.
Mark it NOKPROBE_SYMBOL.

Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
Masami Hiramatsu [Sun, 13 May 2018 04:03:54 +0000 (05:03 +0100)]
ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed

Since get_kprobe_ctlblk() uses smp_processor_id() to access
per-cpu variable, it hits smp_processor_id sanity check as below.

[    7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[    7.007859] caller is debug_smp_processor_id+0x20/0x24
[    7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1
[    7.008890] Hardware name: Generic DT based system
[    7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24)
[    7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98)
[    7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c)
[    7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24)
[    7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4)
[    7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)

To fix this issue, call get_kprobe_ctlblk() right after
irq-disabled since that disables preemption.

Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: replace unnecessary perl with sed and the shell $(( )) operator
Russell King [Mon, 16 Apr 2018 12:21:54 +0000 (13:21 +0100)]
ARM: replace unnecessary perl with sed and the shell $(( )) operator

You can build a kernel in a cross compiling environment that doesn't
have perl in the $PATH. Commit 429f7a062e3b broke that for 32 bit
ARM. Fix it.

As reported by Stephen Rothwell, it appears that the symbols can be
either part of the BSS section or absolute symbols depending on the
binutils version.  When they're an absolute symbol, the $(( ))
operator errors out and the build fails.  Fix this as well.

Fixes: 429f7a062e3b ("ARM: decompressor: fix BSS size calculation")
Reported-by: Rob Landley <rob@landley.net>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: kexec: record parent context registers for non-crash CPUs
Russell King [Wed, 11 Apr 2018 18:35:19 +0000 (19:35 +0100)]
ARM: kexec: record parent context registers for non-crash CPUs

How we got to machine_crash_nonpanic_core() (iow, from an IPI, etc) is
not interesting for debugging a crash.  The more interesting context
is the parent context prior to the IPI being received.

Record the parent context register state rather than the register state
in machine_crash_nonpanic_core(), which is more relevant to the failing
condition.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: kexec: fix kdump register saving on panic()
Russell King [Wed, 11 Apr 2018 17:24:01 +0000 (18:24 +0100)]
ARM: kexec: fix kdump register saving on panic()

When a panic() occurs, the kexec code uses smp_send_stop() to stop
the other CPUs, but this results in the CPU register state not being
saved, and gdb is unable to inspect the state of other CPUs.

Commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path") addressed the issue on x86, but
ignored other architectures.  Address the issue on ARM by splitting
out the crash stop implementation to crash_smp_send_stop() and
adding the necessary protection.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel
Łukasz Stelmach [Wed, 4 Apr 2018 07:46:58 +0000 (08:46 +0100)]
ARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel

The hypervisor setup before __enter_kernel destroys the value
sotred in r1. The value needs to be restored just before the jump.

Fixes: 6b52f7bdb888 ("ARM: hyp-stub: Use r1 for the soft-restart address")
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agoARM: 8753/1: decompressor: add a missing parameter to the addruart macro
Łukasz Stelmach [Tue, 3 Apr 2018 08:04:57 +0000 (09:04 +0100)]
ARM: 8753/1: decompressor: add a missing parameter to the addruart macro

In commit 639da5ee374b ("ARM: add an extra temp register to the low
level debugging addruart macro") an additional temporary register was
added to the addruart macro, but the decompressor code wasn't updated.

Fixes: 639da5ee374b ("ARM: add an extra temp register to the low level debugging addruart macro")
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
6 years agox86/mm: Drop TS_COMPAT on 64-bit exec() syscall
Dmitry Safonov [Thu, 17 May 2018 23:35:10 +0000 (00:35 +0100)]
x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and
for 32bit mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() was dropping TS_COMPAT flag, but
set_personality_64bit() has kept compat syscall flag making
in_compat_syscall() return true during the first exec() syscall.

Which in result has user-visible effects, mentioned by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
        0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
        0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
        0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
        0x0000f7fe4000-0x0000f7fe5000
        0x7fed9abff000-0x7fed9af54000
        0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c

int main(int argc, char *argv[]) {
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE:         0x7f63b8309000
AT_BASE:         0x7faec143c000
AT_BASE:         0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE:         0xf7eff000
AT_BASE:         0xf7cee000
AT_BASE:         0x7f8b9774e000

Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Bisected-by: Alexander Monakov <amonakov@ispras.ru>
Investigated-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
6 years agoobjtool: Detect RIP-relative switch table references, part 2
Josh Poimboeuf [Fri, 18 May 2018 20:10:34 +0000 (15:10 -0500)]
objtool: Detect RIP-relative switch table references, part 2

With the following commit:

  fd35c88b7417 ("objtool: Support GCC 8 switch tables")

I added a "can't find switch jump table" warning, to stop covering up
silent failures if add_switch_table() can't find anything.

That warning found yet another bug in the objtool switch table detection
logic.  For cases 1 and 2 (as described in the comments of
find_switch_table()), the find_symbol_containing() check doesn't adjust
the offset for RIP-relative switch jumps.

Incidentally, this bug was already fixed for case 3 with:

  6f5ec2993b1f ("objtool: Detect RIP-relative switch table references")

However, that commit missed the fix for cases 1 and 2.

The different cases are now starting to look more and more alike.  So
fix the bug by consolidating them into a single case, by checking the
original dynamic jump instruction in the case 3 loop.

This also simplifies the code and makes it more robust against future
switch table detection issues -- of which I'm sure there will be many...

Switch table detection has been the most fragile area of objtool, by
far.  I long for the day when we'll have a GCC plugin for annotating
switch tables.  Linus asked me to delay such a plugin due to the
flakiness of the plugin infrastructure in older versions of GCC, so this
rickety code is what we're stuck with for now.  At least the code is now
a little simpler than it was.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f400541613d45689086329432f3095119ffbc328.1526674218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoefi/libstub/arm64: Handle randomized TEXT_OFFSET
Mark Rutland [Fri, 18 May 2018 14:08:41 +0000 (16:08 +0200)]
efi/libstub/arm64: Handle randomized TEXT_OFFSET

When CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET is an arbitrary
multiple of PAGE_SIZE in the interval [0, 2MB).

The EFI stub does not account for the potential misalignment of
TEXT_OFFSET relative to EFI_KIMG_ALIGN, and produces a randomized
physical offset which is always a round multiple of EFI_KIMG_ALIGN.
This may result in statically allocated objects whose alignment exceeds
PAGE_SIZE to appear misaligned in memory. This has been observed to
result in spurious stack overflow reports and failure to make use of
the IRQ stacks, and theoretically could result in a number of other
issues.

We can OR in the low bits of TEXT_OFFSET to ensure that we have the
necessary offset (and hence preserve the misalignment of TEXT_OFFSET
relative to EFI_KIMG_ALIGN), so let's do that.

Reported-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Kim Phillips <kim.phillips@arm.com>
[ardb: clarify comment and commit log, drop unneeded parens]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 6f26b3671184c36d ("arm64: kaslr: increase randomization granularity")
Link: http://lkml.kernel.org/r/20180518140841.9731-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 19 May 2018 04:24:26 +0000 (21:24 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hfsplus: stop workqueue when fill_super() failed
  mm: don't allow deferred pages with NEED_PER_CPU_KM
  MAINTAINERS: add Q: entry to kselftest for patchwork project
  radix tree: fix multi-order iteration race
  radix tree test suite: multi-order iteration race
  radix tree test suite: add item_delete_rcu()
  radix tree test suite: fix compilation issue
  radix tree test suite: fix mapshift build target
  include/linux/mm.h: add new inline function vmf_error()
  lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly

6 years agoMerge tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform...
Linus Torvalds [Sat, 19 May 2018 04:22:16 +0000 (21:22 -0700)]
Merge tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
 "Remove the last of the "select DELL_SMBIOS" references in the Kconfig"

* tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: DELL_WMI use depends on instead of select for DELL_SMBIOS

6 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 19 May 2018 04:19:02 +0000 (21:19 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:

 - a modified revert of a patch that made new choices come out for a
   couple stm32 clk drivers that really always need to be there when
   that particular machine is compiled in

 - boot fix on i.MX for Stefan who noticed odd behavior from the
   critical flag patch that came in during the merge window

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: stm32: fix: stm32 clock drivers are not compiled by default
  clk: imx6ull: use OSC clock during AXI rate change

6 years agoMerge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 19 May 2018 01:02:01 +0000 (18:02 -0700)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "A bunch of driver bugfixes and a MAINTAINERS addition"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: add entry for STM32 I2C driver
  i2c: viperboard: return message count on master_xfer success
  i2c: pmcmsp: fix error return from master_xfer
  i2c: pmcmsp: return message count on master_xfer success
  i2c: designware: fix poll-after-enable regression
  eeprom: at24: fix retrieving the at24_chip_data structure
  i2c: core: ACPI: Log device not acking errors at dbg loglevel
  i2c: core: ACPI: Improve OpRegion read errors

6 years agohfsplus: stop workqueue when fill_super() failed
Tetsuo Handa [Fri, 18 May 2018 23:09:16 +0000 (16:09 -0700)]
hfsplus: stop workqueue when fill_super() failed

syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].  This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().

As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work().  Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().

[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9

Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: don't allow deferred pages with NEED_PER_CPU_KM
Pavel Tatashin [Fri, 18 May 2018 23:09:13 +0000 (16:09 -0700)]
mm: don't allow deferred pages with NEED_PER_CPU_KM

It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
we initialize struct pages for all the allocated memory when deferred
struct pages are used.

My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.

Below is a more detailed explanation of the problem.

We initialize struct pages in four places:

1. Early in boot a small set of struct pages is initialized to fill the
   first section, and lower zones.

2. During mm_init() we initialize "struct pages" for all the memory that
   is allocated, i.e reserved in memblock.

3. Using on-demand logic when pages are allocated after mm_init call
   (when memblock is finished)

4. After smp_init() when the rest free deferred pages are initialized.

The problem occurs if we try to do va to phys translation of a memory
between steps 1 and 2.  Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access of "struct page" as in case of
this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP

The following path exposes the problem:

  start_kernel()
   trap_init()
    setup_cpu_entry_areas()
     setup_cpu_entry_area(cpu)
      get_cpu_gdt_paddr(cpu)
       per_cpu_ptr_to_phys(addr)
        pcpu_addr_to_page(addr)
         virt_to_page(addr)
          pfn_to_page(__pa(addr) >> PAGE_SHIFT)

We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.

The problems are discussed in these threads:
  http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com

Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMAINTAINERS: add Q: entry to kselftest for patchwork project
Shuah Khan (Samsung OSG) [Fri, 18 May 2018 23:09:09 +0000 (16:09 -0700)]
MAINTAINERS: add Q: entry to kselftest for patchwork project

A new patchwork project is created to track kselftest patches.  Update
the kselftest entry in the MAINTAINERS file adding 'Q:' entry:

  https://patchwork.kernel.org/project/linux-kselftest/list/

Link: http://lkml.kernel.org/r/20180515164427.12201-1-shuah@kernel.org
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoradix tree: fix multi-order iteration race
Ross Zwisler [Fri, 18 May 2018 23:09:06 +0000 (16:09 -0700)]
radix tree: fix multi-order iteration race

Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault.  This was first seen with a production v4.15 based
kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
order 9 PMD DAX entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember for example that
an order 2 entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

We fix this race by fixing the way that skip_siblings() detects sibling
nodes.  Instead of testing against the preceding slot we instead look
for siblings via is_sibling_entry() which compares against the position
of the struct radix_tree_node.slots[] array.  This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.

Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators")
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>