git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. Basically, a new extension for ip6tables, simplification work of
nf_tables that saves us 500 LoC, allow raw table registration before
defragmentation, conversion of the SNMP helper to use the ASN.1 code
generator, unique 64-bit handle for all nf_tables objects and fixes to
address fallout from previous nf-next batch.  More specifically, they
are:

1) Seven patches to remove family abstraction layer (struct nft_af_info)
   in nf_tables, this simplifies our codebase and it saves us 64 bytes per
   net namespace.

2) Add IPv6 segment routing header matching for ip6tables, from Ahmed
   Abdelsalam.

3) Allow to register iptable_raw table before defragmentation, some
   people do not want to waste cycles on defragmenting traffic that is
   going to be dropped, hence add a new module parameter to enable this
   behaviour in iptables and ip6tables. From Subash Abhinov
   Kasiviswanathan. This patch needed a couple of follow up patches to
   get things tidy from Arnd Bergmann.

4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several
   patches for this helper to prepare this change are also part of this
   patch series.

5) Add 64-bit handles to uniquely objects in nf_tables, from Harsha
   Sharma.

6) Remove log message that several netfilter subsystems print at
   boot/load time.

7) Restore x_tables module autoloading, that got broken in a previous
   patch to allow singleton NAT hook callback registration per hook
   spot, from Florian Westphal. Moreover, return EBUSY to report that
   the singleton NAT hook slot is already in instead.

8) Several fixes for the new nf_tables flowtable representation,
   including incorrect error check after nf_tables_flowtable_lookup(),
   missing Kconfig dependencies that lead to build breakage and missing
   initialization of priority and hooknum in flowtable object.

9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip
   target. This is due to recent updates in the core to shrink the hook
   array size and compile it out if no specific family is enabled via
   .config file. Patch from Florian Westphal.

10) Remove duplicated include header files, from Wei Yongjun.

11) Sparse warning fix for the NFPROTO_INET handling from the core
    due to missing static function definition, also from Wei Yongjun.

12) Restore ICMPv6 Parameter Problem error reporting when
    defragmentation fails, from Subash Abhinov Kasiviswanathan.

13) Remove obsolete owner field initialization from struct
    file_operations, patch from Alexey Dobriyan.

14) Use boolean datatype where needed in the Netfilter codebase, from
    Gustavo A. R. Silva.

15) Remove double semicolon in dynset nf_tables expression, from
    Luis de Bethencourt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-01-19

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) bpf array map HW offload, from Jakub.

2) support for bpf_get_next_key() for LPM map, from Yonghong.

3) test_verifier now runs loaded programs, from Alexei.

4) xdp cpumap monitoring, from Jesper.

5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.

6) user space can now retrieve HW JITed program, from Jiong.

Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

The BPF verifier conflict was some minor contextual issue.

The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-misc-improvements'

Daniel Borkmann says:

====================
This series adds various misc improvements to BPF: detection
of BPF helper definition misconfiguration for mem/size argument
pairs, csum_diff helper also for XDP, various test cases,
removal of the recently added pure_initcall(), restriction
of the jit sysctls to cap_sys_admin for initns, a minor size
improvement for x86 jit in alu ops, output of complexity limit
to verifier log and last but not least having the event output
more flexible with moving to const_size_or_zero type.

Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: move event_output to const_size_or_zero for xdp/skb as well

Similar rationale as in a60dd35d2e39 ("bpf: change bpf_perf_event_output
arg5 type to ARG_CONST_SIZE_OR_ZERO"), change the type to CONST_SIZE_OR_ZERO
such that we can better deal with optimized code. No changes needed in
bpf_event_output() as it can also deal with 0 size entirely (e.g. as only
wake-up signal with empty frame in perf RB, or packet dumps w/o meta data
as another such possibility).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add upper complexity limit to verifier log

Given the limit could potentially get further adjustments in the
future, add it to the log so it becomes obvious what the current
limit is w/o having to check the source first. This may also be
helpful for debugging complexity related issues on kernels that
backport from upstream.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, x86: small optimization in alu ops with imm

For the BPF_REG_0 (BPF_REG_A in cBPF, respectively), we can use
the short form of the opcode as dst mapping is on eax/rax and
thus save a byte per such operation. Added to add/sub/and/or/xor
for 32/64 bit when K immediate is used. There may be more such
low-hanging fruit to add in future as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: restrict access to core bpf sysctls

Given BPF reaches far beyond just networking these days, it was
never intended to allow setting and in some cases reading those
knobs out of a user namespace root running without CAP_SYS_ADMIN,
thus tighten such access.

Also the bpf_jit_enable = 2 debugging mode should only be allowed
if kptr_restrict is not set since it otherwise can leak addresses
to the kernel log. Dump a note to the kernel log that this is for
debugging JITs only when enabled.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: get rid of pure_initcall dependency to enable jits

Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add couple of test cases for div/mod by zero

Add couple of missing test cases for eBPF div/mod by zero to the
new test_verifier prog runtime feature. Also one for an empty prog
and only exit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add couple of test cases for signed extended imms

Add a couple of test cases for interpreter and JIT that are
related to an issue we faced some time ago in Cilium [1],
which is fixed in LLVM with commit e53750e1e086 ("bpf: fix
bug on silently truncating 64-bit immediate").

Test cases were run-time checking kernel to behave as intended
which should also provide some guidance for current or new
JITs in case they should trip over this. Added for cBPF and
eBPF.

[1] https://github.com/cilium/cilium/pull/2162

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: add csum_diff helper to xdp as well

Useful for porting cls_bpf programs w/o increasing program
complexity limits much at the same time, so add the helper
to XDP as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, verifier: detect misconfigured mem, size argument pair

I've seen two patch proposals now for helper additions that used
ARG_PTR_TO_MEM or similar in reg_X but no corresponding ARG_CONST_SIZE
in reg_X+1. Verifier won't complain in such case, but it will omit
verifying the memory passed to the helper thus ending up badly.
Detect such buggy helper function signature and bail out during
verification rather than finding them through review.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

samples/bpf: xdp_monitor include cpumap tracepoints in monitoring

The xdp_redirect_cpu sample have some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these transpoint as an easy way to troubleshoot
an application using XDP + cpumap.

Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints.  Model the output format
like xdp_redirect_cpu.

Given I needed to handle per CPU decoding for cpumap, this patch also
add per CPU info on the existing monitor events.  This resembles part
of the builtin monitoring output from sample xdp_rxq_info.  Thus, also
covering part of that sample in an external monitoring tool.

Performance wise, the cpumap tracepoints uses bulking, which cause
them to have very little overhead.  Thus, they are enabled by default.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-lpm-get-next-key'

Yonghong Song says:

====================
This patch set implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
This command is really useful for key enumeration, and for key deletion
if what keys in the trie are unknown.

Patch #1 implements the functionality in the kernel and patch #2
adds a test case in tools/testing/selftests/bpf.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: add a testcase for MAP_GET_NEXT_KEY command of LPM_TRIE map

A test case is added in tools/testing/selftests/bpf/test_lpm_map.c
for MAP_GET_NEXT_KEY command. A four node trie, which
is described in kernel/bpf/lpm_trie.c, is built and the
MAP_GET_NEXT_KEY results are checked.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map

Current LPM_TRIE map type does not implement MAP_GET_NEXT_KEY
command. This command is handy when users want to enumerate
keys. Otherwise, a different map which supports key
enumeration may be required to store the keys. If the
map data is sparse and all map data are to be deleted without
closing file descriptor, using MAP_GET_NEXT_KEY to find
all keys is much faster than enumerating all key space.

This patch implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
If user provided key pointer is NULL or the key does not have
an exact match in the trie, the first key will be returned.
Otherwise, the next key will be returned.

In this implemenation, key enumeration follows a postorder
traversal of internal trie. More specific keys
will be returned first than less specific ones, given
a sequence of MAP_GET_NEXT_KEY syscalls.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests: bpf: update .gitignore with missing generated files

Update .gitignore with missing generated files.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: recognize BPF_MAP_TYPE_CPUMAP maps

Add BPF_MAP_TYPE_CPUMAP map type to the list
of map type recognized by bpftool and define
corresponding text representation.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'dsa-mv88e6xxx-ATU-VTU-irq-fixes'

Andrew Lunn says:

====================
ATU and VTU irq fixes

Further testing and code review found two sets of bugs.

Core review found a cut/paste error in the irq setup code.

A board which does not have an interrupt line from the switch to the
SoC, and experiancing an EPROBE_DEFER throw a splat when the ATU irq
was freed but never registered.

v2: Fix typ0 chip->chip->vtu_prob_irq to chip->vtu_prob_irq
0-day compile testing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Free ATU/VTU irq only when there is chip irq

We only register the ATU and VTU irq when we have a chip level IRQ.
In the error path, we should only attempt to remove the ATU and VTU
irq if we also have a chip level IRQ.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Return error from irq_find_mapping()

Fix a cut/paste error. When irq_find_mapping() returns an error for
the ATU or VTU interrupt, return that error, not the value of
chip->device_irq.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-cls-add-extack-support'

Alexander Aring says:

====================
net: sched: cls: add extack support

this patch adds extack support for TC classifier subsystem. The first
patch fixes some code style issues for this patch series pointed out
by checkpatch. The other patches until the last one prepares extack
handling for the TC classifier subsystem and handle generic extack
errors.

The last patch is an example for u32 classifier to add extack support
inside the callbacks delete and change. There exists a init callback as
well, but most classifier implementation run a kalloc() once to allocate
something. Not necessary _yet_ to add extack support now.

- Alex

Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- fix accidentally move of config option mismatch message in PATCH 2/8
   correct one is 4/8, detected by kbuildbot (Thank you)
- Removed patch "net: sched: cls: add extack support for tc_setup_cb_call"
   PATCH 7/8 in version v2 as suggested by Jakub Kicinski (Thank you)
- changed NL_SET_ERR_MSG to NL_SET_ERR_MSG_MOD as suggested by Jakub Kicinski
   in u32 cls (Thank You)
- Removed text from cover letter that I was waiting for Jiri's Patches as
   detected by Jamal Hadi Salim (Thank you).

changes since v2:
- rebased on Jiri's patches (Thank you)
- several spelling fixes pointed out by Cong Wang (Thank you)
- several spelling fixes pointed out by David Ahern (Thank you)
- use David Ahern recommendation if config option is mismatch, but
   combine it with Cong Wang recommendation to put config name into it
   (Thank you)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: add extack support

This patch adds extack support for the u32 classifier as example for
delete and init callback.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for tcf_change_indev

This patch adds extack handling for the tcf_change_indev function which
is common used by TC classifier implementations.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for delete callback

This patch adds extack support for classifier delete callback api. This
prepares to handle extack support inside each specific classifier
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for tcf_exts_validate

The tcf_exts_validate function calls the act api change callback. For
preparing extack support for act api, this patch adds the extack as
parameter for this function which is common used in cls implementations.

Furthermore the tcf_exts_validate will call action init callback which
prepares the TC action subsystem for extack support.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: add extack support for change callback

This patch adds extack support for classifier change callback api. This
prepares to handle extack support inside each specific classifier
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_api: handle generic cls errors

This patch adds extack support for generic cls handling. The extack
will be set deeper to each called function which is not part of netdev
core api.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls: fix code style issues

This patch changes some code style issues pointed out by checkpatch
inside the TC cls subsystem.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Upper-bound supported FW version

During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-devlink-capabilities-extensions-and-updates'

Jakub Kicinski says:

====================
nfp: devlink, capabilities extensions and updates

This series starts with an improvement to the usability of the device
memory accessors (CPP transactions).  Next few patches are devoted to
fixing the devlink locking.  After recent patches for mlxsw the locking
scheme of devlink ops has to be reworked.  Following patches improve
NFP code dealing with "representors", and expands the error message
printed when driver has no support for loaded FW.

Second part of the series is focused on vNIC capabilities read from
vNIC control memory (often referred to as "BAR0" for historical reasons).
TLV capability format is established and immediately made use of.  The
next patches rework parsing of features for control vNIC which allows
apps to mask out features they don't want enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: disable all ctrl vNIC capabilities

BPF firmware currently exposes IRQ moderation capability.
The driver will make use of it by default, inserting 50 usec
delay to every control message exchange. This cuts the number
of messages per second we can exchange by almost half.

None of the other capabilities make much sense for BPF control
vNIC, either. Disable them all.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: allow apps to disable ctrl vNIC capabilities

Most vNIC capabilities are netdev related.  It makes no sense
to initialize them and waste FW resources.  Some are even
counter-productive, like IRQ moderation, which will slow
down exchange of control messages.

Add to nfp_app a mask of enabled control vNIC capabilities
for apps to use.  Make flower and BPF enable all capabilities
for now.  No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: split reading capabilities out of nfp_net_init()

nfp_net_init() is a little long and we are about to add more
code to reading capabilties. Move the capability reading,
parsing and validating out. Only actual initialization
will stay in nfp_net_init().

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: read mailbox address from TLV caps

Allow specifying alternative vNIC mailbox location in TLV caps.
This way we can size the mailbox to the needs and not necessarily
waste 512B of ctrl memory space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: read ME frequency from vNIC ctrl memory

PCIe island clock frequency is used when converting coalescing
parameters from usecs to NFP timestamps. Most chips don't run
at 1200MHz, allow FW to provide us with the real frequency.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add TLV capabilities to the BAR

NFP is entirely programmable, including the PCI data interface.
Using a fixed control BAR layout certainly makes implementations
easier, but require careful considerations when space is allocated.
Once BAR area is allocated to one feature nothing else can use it.
Allocating space statically also requires it to be sized upfront,
which leads to either unnecessary limitation or wastage.

We currently have a 32bit capability word defined which tells drivers
which application FW features are supported.   Most of the bits
are exhausted.  The same bits are also reused for enabling specific
features.  Bulk of capabilities don't have a need for an enable bit,
however, leading to confusion and wastage.

TLVs seems like a better fit for expressing capabilities of applications
running on programmable hardware.

This patch leaves the front of the BAR as is, and declares a TLV
capability start at offset 0x58.  Most of the space up to 0x0d90
is already allocated, but the used space can be wrapped with RESERVED
TLVs.  E.g.:

Address    Type         Length
0x0058    RESERVED      0xe00  /* Wrap basic structures */
0x0e5c    FEATURE_A     0x004
0x0e64    FEATURE_B     0x004
0x0e6c    RESERVED      0x990  /* Wrap qeueue stats */
0x1800    FEATURE_C     0x100

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: improve app not found message

When driver app matching loaded FW is not found users are faced with:

nfp: failed to find app with ID 0x%02x

This message does not properly explain that matching driver code is
either not built into the driver or the driver is too old.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: protect each repr pointer individually with RCU

Representors are grouped in sets by type.  Currently the whole
sets are under RCU protection, but individual representor pointers
are not.  This causes some inconveniences when representors have
to be destroyed, because we have to allocate new sets to remove
any representors.  Protect the individual pointers with RCU.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add nfp_reprs_get_locked() helper

The write side of repr tables is always done under pf->lock.
Add a helper to dereference repr table pointers under protection
of that lock.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: register devlink after app is created

Devlink used to have two global locks: devlink lock and port lock,
our lock ordering looked like this:

  devlink lock -> driver's pf->lock -> devlink port lock

After recent changes port lock was replaced with per-instance
lock.  Unfortunately, new per-instance lock is taken on most
operations now.  This means we can only grab the pf->lock from
the port split/unsplit ops.  Lock ordering looks like this:

  devlink lock -> driver's pf->lock -> devlink instance lock

Since we can't take pf->lock from most devlink ops, make sure
nfp_apps are prepared to service them as soon as devlink is
registered.  Locking the pf must be pushed down after
nfp_app_init() callback.

The init order looks like this:
nfp_app_init
devlink_register
nfp_app_start
netdev/port_register

As soon as app_init is done nfp_apps must be ready to service
devlink-related callbacks.  apps can only register their own
devlink objects from nfp_app_start.

Fixes: 2406e7e546b2 ("devlink: Add per devlink instance lock")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: release global resources only on the remove path

NFP app is currently shut down as soon as all the vNICs are gone.
This means we can't depend on the app existing throughout the
lifetime of the device. Free the app only from PCI remove path.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: core: make scalar CPP helpers fail on short accesses

Currently the helpers for accessing 4 or 8 byte values over
the CPP bus return the length of IO on success.  If the IO
was short caller has to deal with error handling.  The short
IO for 4/8B values is completely impractical.  Make the
helpers return an error if full access was not possible.
Fix the few places which are actually dealing with errors
correctly, most call sites already only deal with negative
return codes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-min-rtt'

Yuchung Cheng says:

====================
tcp: do not use RTT from delayed ACKs for min-RTT

This patch set prevents TCP sender from using RTT samples from
(suspected) delayed ACKs as the minimum RTT, to avoid unbounded
over-estimation of the network path delay. This issue is common
when a connection has extended periods of one packet chit-chat
beyond the min RTT filter window. The first patch does that for TCP
general min RTT estimation. The second patch addresses specifically
the BBR congestion control's min RTT filter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR

A persistent connection may send tiny amount of data (e.g. health-check)
for a long period of time. BBR's windowed min RTT filter may only see
RTT samples from delayed ACKs causing BBR to grossly over-estimate
the path delay depending how much the ACK was delayed at the receiver.

This patch skips RTT samples that are likely coming from delayed ACKs. Note
that it is possible the sender never obtains a valid measure to set the
min RTT. In this case BBR will continue to set cwnd to initial window
which seems fine because the connection is thin stream.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: avoid min-RTT overestimation from delayed ACKs

This patch avoids having TCP sender or congestion control
overestimate the min RTT by orders of magnitude. This happens when
all the samples in the windowed filter are one-packet transfer
like small request and health-check like chit-chat, which is farily
common for applications using persistent connections. This patch
tries to conservatively labels and skip RTT samples obtained from
this type of workload.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix race between poll() and setsockopt()

Letting tipc_poll() dereference a socket's pointer to struct tipc_group
entails a race risk, as the group item may be deleted in a concurrent
tipc_sk_join() or tipc_sk_leave() thread.

We now move the 'open' flag in struct tipc_group to struct tipc_sock,
and let the former retain only a pointer to the moved field. This will
eliminate the race risk.

Reported-by: syzbot+799dafde0286795858ac@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove switch block in l2tp_nl_cmd_session_create()

Remove the switch block in l2tp_nl_cmd_session_create() that
checks pseudowire-specific parameters since just L2TP_PWTYPE_ETH and
L2TP_PWTYPE_PPP are currently supported and no actual checks are
performed. Moreover the L2TP_PWTYPE_IP/default case presents a harmless
issue in error handling (break instead of goto out_tunnel)

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: IPv6 filter takes 2 tids

on T6, IPv6 filter would occupy 2 tids instead of 4.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'l2tp-set-l2specific_len-based-on-l2specific_type'

Lorenzo Bianconi says:

====================
l2tp: set l2specific_len based on l2specific_type

Do not rely on l2specific_len value provided by userspace but set sublayer
length according to l2specific_type.
Mark L2TP_ATTR_L2SPEC_LEN attribute as not used

Changes since v2:
- drop the patch related to a fix in the switch default case in
l2tp_nl_cmd_session_create()
- use L2SPECTYPE_NONE as default case in l2tp_get_l2specific_len()

Changes since v1:
- remove l2specific_len parameter
- add sanity check on l2specific_type provided by userspace
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: mark L2TP_ATTR_L2SPEC_LEN as not used

Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove l2specific_len configurable parameter

Remove l2specific_len configuration parameter since now L2-Specific
Sublayer length is computed according to l2specific_type provided by
userspace.

Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove l2specific_len dependency in l2tp_core

Remove l2specific_len dependency while building l2tpv3 header or
parsing the received frame since default L2-Specific Sublayer is
always four bytes long and we don't need to rely on a user supplied
value.
Moreover in l2tp netlink code there are no sanity checks to
enforce the relation between l2specific_len and l2specific_type,
so sending a malformed netlink message is possible to set
l2specific_type to L2TP_L2SPECTYPE_DEFAULT (or even
L2TP_L2SPECTYPE_NONE) and set l2specific_len to a value greater than
4 leaking memory on the wire and sending corrupted frames.

Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: double-check l2specific_type provided by userspace

Add sanity check on l2specific_type provided by userspace in
l2tp_nl_cmd_session_create() since just L2TP_L2SPECTYPE_DEFAULT and
L2TP_L2SPECTYPE_NONE are currently supported.
Moreover explicitly set l2specific_type to L2TP_L2SPECTYPE_DEFAULT
only if the userspace does not provide a value for it

Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-reduce-memory-footprint-for-collecting-firmware-dump'

Rahul Lakkireddy says:

====================
cxgb4: reduce memory footprint for collecting firmware dump

Firmware dump can be large (upto 2 GB). In low memory conditions,
ethtool fails to allocate such large memory. So, use zlib deflate
to compress collected firmware dump.

Patch 1 updates collection logic to use compression.

Patch 2 adds zlib deflate to compress collected firmware dump.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: use zlib deflate to compress firmware dump

Use zlib deflate to compress firmware dump. Collect and compress
as much firmware dump as possible into a 32 MB buffer.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: update dump collection logic to use compression

Update firmware dump collection logic to use compression when available.
Let collection logic attempt to do compression, instead of returning out
of memory early.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dim: Fix fixpoint divide exception in net_dim_stats_compare

Helmut reported a bug about devision by zero while
running traffic and doing physical cable pull test.

When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is devision by zero.

This patch prevent this division for both ppms and epms.

Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'trace-v4.15-rc4-3' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Two more small fixes

   - The conversion of enums into their actual numbers to display in the
     event format file had an off-by-one bug, that could cause an enum
     not to be converted, and break user space parsing tools.

   - A fix to a previous fix to bring back the context recursion checks.
     The interrupt case checks for NMI, IRQ and softirq, but the softirq
     returned the same number regardless if it was set or not, although
     the logic would force it to be set if it were hit"

* tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix converting enum's from the map in trace_event_eval_update()
  ring-buffer: Fix duplicate results in mapping context to bits in recursive lock

devlink: Make some functions static

Fixes the following sparse warnings:

net/core/devlink.c:2297:25: warning:
symbol 'devlink_resource_find' was not declared. Should it be static?
net/core/devlink.c:2322:6: warning:
symbol 'devlink_resource_validate_children' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- a fix for use-after-free in Synaptics RMI4 driver

- correction to multitouch contact tracking on certain ALPS touchpads
   (which got broken when we tried to fix the 2-finger scrolling)

- touchpad on Lenovo T640p is switched over to SMbus/RMI

- a few device node refcount fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics-rmi4 - prevent UAF reported by KASAN
  Input: ALPS - fix multi-touch decoding on SS4 plus touchpads
  Input: synaptics - Lenovo Thinkpad T460p devices should use RMI
  Input: of_touchscreen - add MODULE_LICENSE
  Input: 88pm860x-ts - fix child-node lookup
  Input: twl6040-vibra - fix child-node lookup
  Input: twl4030-vibra - fix sibling-node lookup

mlxsw: spectrum: Make function mlxsw_sp_kvdl_part_occ() static

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning:
symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: remove unused variable

The variable miistat is not used. So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Two bugfixes for the I2C core: Lixing Wang fixed a refcounting problem
  with DT nodes. Jeremy Compostella fixed a buffer overflow possibility
  when using a 'don't use' ioctl interface directly"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
  i2c: core: decrease reference count of device node in i2c_unregister_device

Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata fixlet from Tejun Heo:
"This just adds one more entry for liteon optical drives to the device
  blacklist for large IOs.

  The change is very low risk"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"cgroup.threads should be delegatable (ie. a container should be able
  to write to it from inside) but was missing the flag.

  The change is very low risk"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: make cgroup.threads delegatable

Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/wq

Pull workqueue fixlet from Tejun Heo:
"One patch to add touch_nmi_watchdog() while dumping workqueue debug
  messages to avoid triggering the lockup detector spuriously.

  The change is very low risk"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: avoid hard lockups in show_workqueue_state()

Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
"We have various small DT fixes, and one important regression fix:

  The recent device tree bugfixes that were intended to address issues
  that 'dtc' started warning about in 4.15 fixed various USB PHY device
  nodes, but it turns out that we had code that depended on those nodes
  being incorrect and the probe failing with a particular error code.
  With the workaround we can also deal with correct device nodes.

  The DT fixes include:

   - Allwinner A10 and A20 had the display pipeline set up incorrectly
     (introduced in v4.15)

   - The Altera PMU lacked an interrupt-parent (never worked)

   - Pin muxing on the Openblocks A7 (never worked)

   - Clocks might get set up wrong on Armada 7K/8K (4.15 regression)

  We now have additional device tree patches to address all the
  remaining warnings introduced in 4.15, but decided to queue them for
  4.16 instead, to avoid risking another regression like the USB PHY
  thing mentioned above.

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  phy: work around 'phys' references to usb-nop-xceiv devices
  ARM: sunxi_defconfig: Enable CMA
  arm64: dts: socfpga: add missing interrupt-parent
  ARM: dts: sun[47]i: Fix display backend 1 output to TCON0 remote endpoint
  ARM64: dts: marvell: armada-cp110: Fix clock resources for various node
  ARM: dts: da850-lcdk: Remove leading 0x and 0s from unit address
  ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7

Merge tag 'powerpc-4.15-8' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"More than we'd like after rc8, but nothing very alarming either, just
  tying up loose ends before the release:

  Since we changed powernv to use cpufreq_get() from show_cpuinfo(), we
  see warnings with PREEMPT enabled. But the preempt_disable() in
  show_cpuinfo() doesn't actually prevent CPU hotplug as it suggests, so
  remove it.

  Two updates to the recently merged RFI flush code. Wire up the generic
  sysfs file to report the status, and add a debugfs file to allow
  enabling/disabling it at runtime.

  Two updates to xmon, one to add the RFI flush related fields to the
  paca dump, and another to not use hashed pointers in the paca dump.

  And one minor fix to add a missing include of linux/types.h in
  asm/hvcall.h, not seen to break the build in upstream, but correct
  anyway.

  Thanks to: Benjamin Herrenschmidt, Michal Suchanek, Nicholas Piggin"

* tag 'powerpc-4.15-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: include linux/types.h in asm/hvcall.h
  powerpc/64s: Allow control of RFI flush via debugfs
  powerpc/64s: Wire up cpu_show_meltdown()
  powerpc: Don't preempt_disable() in show_cpuinfo()
  powerpc/xmon: Don't print hashed pointers in paca dump
  powerpc/xmon: Add RFI flush related fields to paca dump

ipv6: mcast: remove dead code

Since commit 41033f029e39 ("snmp: Remove duplicate OUTMCAST stat
increment") one line of code became unneeded.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'drm-fixes-for-v4.15-rc9' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Nouveau, i915, vmwgfx and sun4i regression fixes.

  The i915 change fixes a display corruption problem introduced in 4.15,
  the nouveau changes are for regressions in 4.15, one of the vmwgfx
  fixes goes back a little further, the other is a 4.15 regression fix,
  the 3 sun4i changes fix blank HDMI output on those devices"

* tag 'drm-fixes-for-v4.15-rc9' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau/mmu/mcp77: fix regressions in stolen memory handling
  drm/nouveau/bar/gk20a: Avoid bar teardown during init
  drm/nouveau/drm/nouveau: Pass the proper arguments to nvif_object_map_handle()
  drm/vmwgfx: fix memory corruption with legacy/sou connectors
  drm/vmwgfx: Fix a boot time warning
  drm/i915: Fix deadlock in i830_disable_pipe()
  drm/i915: Redo plane sanitation during readout
  drm/i915: Add .get_hw_state() method for planes
  drm/sun4i: hdmi: Add missing rate halving check in sun4i_tmds_determine_rate
  drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
  drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate

caif: reduce stack size with KASAN

When CONFIG_KASAN is set, we can use relatively large amounts of kernel
stack space:

net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=]

This adds convenience wrappers around cfpkt_extr_head(), which is responsible
for most of the stack growth. With those wrapper functions, gcc apparently
starts reusing the stack slots for each instance, thus avoiding the
problem.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"6 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  sparse doesn't support struct randomization
  proc: fix coredump vs read /proc/*/stat race
  scripts/gdb/linux/tasks.py: fix get_thread_info
  scripts/decodecode: fix decoding for AArch64 (arm64) instructions
  mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages
  mm/memory.c: release locked page in do_swap_page()

ia64: Rewrite atomic_add and atomic_sub

Force __builtin_constant_p to evaluate whether the argument to atomic_add
& atomic_sub is constant in the front-end before optimisations which
can lead GCC to output a call to __bad_increment_for_ia64_fetch_and_add().

See GCC bugzilla 83653.

Signed-off-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sparse doesn't support struct randomization

Without this patch, I drown in a sea of unknown attribute warnings

Link: http://lkml.kernel.org/r/20180117024539.27354-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: fix coredump vs read /proc/*/stat race

do_task_stat() accesses IP and SP of a task without bumping reference
count of a stack (which became an entity with independent lifetime at
some point).

Steps to reproduce:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
     setrlimit(RLIMIT_CORE, &(struct rlimit){});

     while (1) {
     char buf[64];
     char buf2[4096];
     pid_t pid;
     int fd;

     pid = fork();
     if (pid == 0) {
     *(volatile int *)0 = 0;
     }

     snprintf(buf, sizeof(buf), "/proc/%u/stat", pid);
     fd = open(buf, O_RDONLY);
     read(fd, buf2, sizeof(buf2));
     close(fd);

     waitpid(pid, NULL, 0);
     }
     return 0;
    }

    BUG: unable to handle kernel paging request at 0000000000003fd8
    IP: do_task_stat+0x8b4/0xaf0
    PGD 800000003d73e067 P4D 800000003d73e067 PUD 3d558067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
    RIP: 0010:do_task_stat+0x8b4/0xaf0
    Call Trace:
     proc_single_show+0x43/0x70
     seq_read+0xe6/0x3b0
     __vfs_read+0x1e/0x120
     vfs_read+0x84/0x110
     SyS_read+0x3d/0xa0
     entry_SYSCALL_64_fastpath+0x13/0x6c
    RIP: 0033:0x7f4d7928cba0
    RSP: 002b:00007ffddb245158 EFLAGS: 00000246
    Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef <48> 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24
    RIP: do_task_stat+0x8b4/0xaf0 RSP: ffffc90000607cc8
    CR2: 0000000000003fd8

John Ogness said: for my tests I added an else case to verify that the
race is hit and correctly mitigated.

Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: "Kohli, Gaurav" <gkohli@codeaurora.org>
Tested-by: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scripts/gdb/linux/tasks.py: fix get_thread_info

Since kernel 4.9, the thread_info has been moved into task_struct, no
longer locates at the bottom of kernel stack.

See commits c65eacbe290b ("sched/core: Allow putting thread_info into
task_struct") and 15f4eae70d36 ("x86: Move thread_info into
task_struct").

Before fix:
  (gdb) set $current = $lx_current()
  (gdb) p $lx_thread_info($current)
  $1 = {flags = 1470918301}
  (gdb) p $current.thread_info
  $2 = {flags = 2147483648}

After fix:
  (gdb) p $lx_thread_info($current)
  $1 = {flags = 2147483648}
  (gdb) p $current.thread_info
  $2 = {flags = 2147483648}

Link: http://lkml.kernel.org/r/20180118210159.17223-1-imxikangjie@gmail.com
Fixes: 15f4eae70d36 ("x86: Move thread_info into task_struct")
Signed-off-by: Xi Kangjie <imxikangjie@gmail.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scripts/decodecode: fix decoding for AArch64 (arm64) instructions

There are a couple of problems with the decodecode script and arm64:

1. AArch64 objdump refuses to disassemble .4byte directives as instructions,
   insisting that they are data values and displaying them as:

a94153f3 .word 0xa94153f3 <-- trapping instruction

   This is resolved by using the .inst directive instead.

2. Disassembly of branch instructions attempts to provide the target as
   an offset from a symbol, e.g.:

   0: 34000082 cbz w2, 10 <.text+0x10>

  however this falls foul of the grep -v, which matches lines containing
  ".text" and ends up removing all branch instructions from the dump.

This patch resolves both issues by using the .inst directive for 4-byte
quantities on arm64 and stripping the resulting binaries (as is done on
arm already) to remove the mapping symbols.

Link: http://lkml.kernel.org/r/1506596147-23630-1-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages

When setting page_owner = on, the following warning can be seen in the
boot log:

  WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:2537 drain_all_pages+0x171/0x1a0
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180109-1-default+ #7
  Hardware name: Dell Inc. Latitude E7470/0T6HHJ, BIOS 1.11.3 11/09/2016
  RIP: 0010:drain_all_pages+0x171/0x1a0
  Call Trace:
    init_page_owner+0x4e/0x260
    start_kernel+0x3e6/0x4a6
    ? set_init_arg+0x55/0x55
    secondary_startup_64+0xa5/0xb0
  Code: c5 ed ff 89 df 48 c7 c6 20 3b 71 82 e8 f9 4b 52 00 3b 05 d7 0b f8 00 89 c3 72 d5 5b 5d 41 5

This warning is shown because we are calling drain_all_pages() in
init_early_allocated_pages(), but mm_percpu_wq is not up yet, it is being
set up later on in kernel_init_freeable() -> init_mm_internals().

Link: http://lkml.kernel.org/r/20180109153921.GA13070@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@techadventures.net>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ayush Mittal <ayush.m@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory.c: release locked page in do_swap_page()

James reported a bug in swap paging-in from his testing. It is that
do_swap_page doesn't release locked page so system hang-up happens due
to a deadlock on PG_locked.

It was introduced by 0bcac06f27d7 ("mm, swap: skip swapcache for swapin
of synchronous device") because I missed swap cache hit places to update
swapcache variable to work well with other logics against swapcache in
do_swap_page.

This patch fixes it.

Debugged by James Bottomley.

Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartnership.com>
Link: http://lkml.kernel.org/r/20180102235606.GA19438@bbox
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

netfilter: remove messages print and boot/module load time

Several reasons for this:

* Several modules maintain internal version numbers, that they print at
  boot/module load time, that are not exposed to userspace, as a
  primitive mechanism to make revision number control from the earlier
  days of Netfilter.

* IPset shows the protocol version at boot/module load time, instead
  display this via module description, as Jozsef suggested.

* Remove copyright notice at boot/module load time in two spots, the
  Netfilter codebase is a collective development effort, if we would
  have to display copyrights for each contributor at boot/module load
  time for each extensions we have, we would probably fill up logs with
  lots of useless information - from a technical standpoint.

So let's be consistent and remove them all.

Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix BPF divides by zero, from Eric Dumazet and Alexei Starovoitov.

2) Reject stores into bpf context via st and xadd, from Daniel
    Borkmann.

3) Fix a memory leak in TUN, from Cong Wang.

4) Disable RX aggregation on a specific troublesome configuration of
    r8152 in a Dell TB16b dock.

5) Fix sw_ctx leak in tls, from Sabrina Dubroca.

6) Fix program replacement in cls_bpf, from Daniel Borkmann.

7) Fix uninitialized station_info structures in cfg80211, from Johannes
    Berg.

8) Fix miscalculation of transport header offset field in flow
    dissector, from Eric Dumazet.

9) Fix LPM tree leak on failure in mlxsw driver, from Ido Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  ibmvnic: Fix IPv6 packet descriptors
  ibmvnic: Fix IP offload control buffer
  ipv6: don't let tb6_root node share routes with other node
  ip6_gre: init dev->mtu and dev->hard_header_len correctly
  mlxsw: spectrum_router: Free LPM tree upon failure
  flow_dissector: properly cap thoff field
  fm10k: mark PM functions as __maybe_unused
  cfg80211: fix station info handling bugs
  netlink: reset extack earlier in netlink_rcv_skb
  can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
  can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
  bpf: mark dst unknown on inconsistent {s, u}bounds adjustments
  bpf: fix cls_bpf on filter replace
  Net: ethernet: ti: netcp: Fix inbound ping crash if MTU size is greater than 1500
  tls: reset crypto_info when do_tls_setsockopt_tx fails
  tls: return -EBUSY if crypto_info is already set
  tls: fix sw_ctx leak
  net/tls: Only attach to sockets in ESTABLISHED state
  net: fs_enet: do not call phy_stop() in interrupts
  r8152: disable RX aggregation on Dell TB16 dock
  ...

Merge tag 'linux-can-next-for-4.16-20180119' of ssh://gitolite./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2018-01-16

this is a pull request for net-next/master consisting of 1 patch.

This patch by Arnd Bergmann for the m_can driver silences a compiler
warning if CONFIG_PM is not selected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2018-01-19' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.16

Final few patches before the merge window, nothing really special.

ath9k

* add MSI support (not enabled by default yet)

rtlwifi

* support A-MSDU in A-MPDU aggregation
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phy: work around 'phys' references to usb-nop-xceiv devices

Stefan Wahren reports a problem with a warning fix that was merged
for v4.15: we had lots of device nodes with a 'phys' property pointing
to a device node that is not compliant with the binding documented in
Documentation/devicetree/bindings/phy/phy-bindings.txt

This generally works because USB HCD drivers that support both the generic
phy subsystem and the older usb-phy subsystem ignore most errors from
phy_get() and related calls and then use the usb-phy driver instead.

However, it turns out that making the usb-nop-xceiv device compatible with
the generic-phy binding changes the phy_get() return code from -EINVAL to
-EPROBE_DEFER, and the dwc2 usb controller driver for bcm2835 now returns
-EPROBE_DEFER from its probe function rather than ignoring the failure,
breaking all USB support on raspberry-pi when CONFIG_GENERIC_PHY is
enabled. The same code is used in the dwc3 driver and the usb_add_hcd()
function, so a reasonable assumption would be that many other platforms
are affected as well.

I have reviewed all the related patches and concluded that "usb-nop-xceiv"
is the only USB phy that is affected by the change, and since it is by far
the most commonly referenced phy, all the other USB phy drivers appear
to be used in ways that are are either safe in DT (they don't use the
'phys' property), or in the driver (they already ignore -EPROBE_DEFER
from generic-phy when usb-phy is available).

To work around the problem, this adds a special case to _of_phy_get()
so we ignore any PHY node that is compatible with "usb-nop-xceiv",
as we know that this can never load no matter how much we defer. In the
future, we might implement a generic-phy driver for "usb-nop-xceiv"
and then remove this workaround.

Since we generally want older kernels to also want to work with the
fixed devicetree files, it would be good to backport the patch into
stable kernels as well (3.13+ are possibly affected), even though they
don't contain any of the patches that may have caused regressions.

Fixes: 014d6da6cb25 ARM: dts: bcm283x: Fix DTC warnings about missing phy-cells
Fixes: c5bbf358b790 arm: dts: nspire: Add missing #phy-cells to usb-nop-xceiv
Fixes: 44e5dced2ef6 arm: dts: marvell: Add missing #phy-cells to usb-nop-xceiv
Fixes: f568f6f554b8 ARM: dts: omap: Add missing #phy-cells to usb-nop-xceiv
Fixes: d745d5f277bf ARM: dts: imx51-zii-rdu1: Add missing #phy-cells to usb-nop-xceiv
Fixes: 915fbe59cbf2 ARM: dts: imx: Add missing #phy-cells to usb-nop-xceiv
Link: https://marc.info/?l=linux-usb&m=151518314314753&w=2
Link: https://patchwork.kernel.org/patch/10158145/
Cc: stable@vger.kernel.org
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Eric Anholt <eric@anholt.net>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Rob Herring <robh@kernel.org>
Tested-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: sunxi_defconfig: Enable CMA

The DRM driver most notably, but also out of tree drivers (for now) like
the VPU or GPU drivers, are quite big consumers of large, contiguous memory
buffers. However, the sunxi_defconfig doesn't enable CMA in order to
mitigate that, which makes them almost unusable.

Enable it to make sure it somewhat works.

Cc: <stable@vger.kernel.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

netfilter: nf_tables: set flowtable priority and hooknum field

Otherwise netlink dump sends uninitialized fields to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: delete /proc THIS_MODULE references

/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

-               if (de->proc_fops)
-                       inode->i_fop = de->proc_fops;
+               if (de->proc_fops) {
+                       if (S_ISREG(inode->i_mode))
+                               inode->i_fop = &proc_reg_file_ops;
+                       else
+                               inode->i_fop = de->proc_fops;
+               }

VFS stopped pinning module at this point.

# ipvs
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: Fix trailing semicolon

The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: return booleans instead of integers

Return statements in functions returning bool should use
true/false instead of 1/0.

These issues were detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allocate handle and delete objects via handle

This patch allows deletion of objects via unique handle which can be
listed via '-a' option.

Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat_snmp_basic: use asn1 decoder library

The basic SNMP ALG parse snmp ASN.1 payload
however, since 2012 linux kernel provide ASN.1 decoder library.
If we use ASN.1 decoder in the /lib/asn1_decoder.c, we can remove
about 1000 line of ASN.1 parsing routine.

To use asn1_decoder.c, we should write mib file(nf_nat_snmp_basic.asn1)
then /script/asn1_compiler.c makes *-asn1.c and *-asn1.h file
at the compiletime.(nf_nat_snmp_basic-asn1.c, nf_nat_snmp_basic-asn1.h)
The nf_nat_snmp_basic.asn1 is made by RFC1155, RFC1157, RFC1902, RFC1905,
RFC2578, RFC3416. of course that mib file supports only the basic SNMP ALG.

Previous SNMP ALG mangles only first octet of IPv4 address.
but after this patch, the SNMP ALG mangles whole IPv4 Address.
And SNMPv3 is not supported.

I tested with snmp commands such ans snmpd, snmpwalk, snmptrap.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat_snmp_basic: use nf_ct_helper_log

Use nf_ct_helper_log to write log message.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat_snmp_basic: replace ctinfo with dir.

The snmp_translate() receives ctinfo data to get dir value only.
because of caller already has dir value, we just replace ctinfo with dir.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat_snmp_basic: remove debug parameter

To see debug message of nf_nat_snmp_basic, we should set debug value
when we insert this module. but it is inconvenient and only using of
the dynamic debugging is enough to debug.

This patch just removes debug code. then in the next patch, debugging code
will be added.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat_snmp_basic: remove useless comment

Remove comments that do not let us know important information.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

can: m_can: mark runtime-PM handlers as __maybe_unused

Building without CONFIG_PM results in a harmless warning:

drivers/net/can/m_can/m_can.c:1763:12: error: 'm_can_runtime_resume' defined but not used [-Werror=unused-function]
drivers/net/can/m_can/m_can.c:1752:12: error: 'm_can_runtime_suspend' defined but not used [-Werror=unused-function]

Marking the functions as __maybe_unused lets the compiler
silently drop them instead.

Fixes: cdf8259d6573 ("can: m_can: Add PM Support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'drm-intel-fixes-2018-01-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Display corruption regression bugfix with both a prep patch and a
follow-up fix

* tag 'drm-intel-fixes-2018-01-18' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915: Fix deadlock in i830_disable_pipe()
  drm/i915: Redo plane sanitation during readout
  drm/i915: Add .get_hw_state() method for planes

ibmvnic: Fix IPv6 packet descriptors

Packet descriptor generation for IPv6 is broken.
Properly set L3 and L4 protocol flags for IPv6 descriptors.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>