git.openwrt.org Git - openwrt/staging/blogic.git/log

tools/bpf: fix a selftest test_btf failure

Commit 9c651127445c ("selftests/btf: add initial BTF dedup tests")
added dedup tests in test_btf.c.
It broke the raw test:
BTF raw test[71] (func proto (Bad arg name_off)):
    btf_raw_create:2905:FAIL Error getting string #65535, strs_cnt:1

The test itself encodes invalid func_proto parameter name
offset 0xffffFFFF as a negative test for the kernel.
The above commit changed the meaning of that offset and
resulted in a user space error.
  #define NAME_NTH(N) (0xffff0000 | N)
  #define IS_NAME_NTH(X) ((X & 0xffff0000) == 0xffff0000)
  #define GET_NAME_NTH_IDX(X) (X & 0x0000ffff)

Currently, the kernel permits maximum name offset 0xffff.
Set the test name off as 0x0fffFFFF to trigger the kernel
verification failure.

Cc: Andrii Nakryiko <andriin@fb.com>
Fixes: 9c651127445c ("selftests/btf: add initial BTF dedup tests")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-riscv-jit'

Björn Töpel says:

====================
This v2 series adds an RV64G BPF JIT to the kernel.

At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES (Patrick Stählin sent out an RFC last year), which
means that CONFIG_BPF_EVENTS is not supported. Thus, no tests
involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT,
BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes.

The implementation does not support "far branching" (>4KiB).

Test results:
  # modprobe test_bpf
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier
  ...
  Summary: 761 PASSED, 507 SKIPPED, 2 FAILED

Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  0

No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  59

The two failing test_verifier tests are:
  "ld_abs: vlan + abs, test 1"
  "ld_abs: jump around ld_abs"

This is due to that "far branching" involved in those tests.
All tests where done on QEMU emulator version 3.1.50
(v3.1.0-688-g8ae951fbc106). I'll test it on real hardware, when I get
access to it.

I'm routing this patch via bpf-next/netdev mailing list (after a
conversation with Palmer at FOSDEM), mainly because the other JITs
went that path.

Again, thanks for all the comments!

Cheers,
Björn

v1 -> v2:
* Added JMP32 support. (Daniel)
* Add RISC-V to Documentation/sysctl/net.txt. (Daniel)
* Fixed seen_call() asymmetry. (Daniel)
* Fixed broken bpf_flush_icache() range. (Daniel)
* Added alignment annotations to some selftests.

RFCv1 -> v1:
* Cleaned up the Kconfig and net/Makefile. (Christoph)
* Removed the entry-stub and squashed the build/config changes to be
  part of the JIT implementation. (Christoph)
* Simplified the register tracking code. (Daniel)
* Removed unused macros. (Daniel)
* Added myself as maintainer and updated documentation. (Daniel)
* Removed HAVE_EFFICIENT_UNALIGNED_ACCESS. (Christoph, Palmer)
* Added tail-calls and cleaned up the code.
====================

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: add "any alignment" annotation for some tests

RISC-V does, in-general, not have "efficient unaligned access". When
testing the RISC-V BPF JIT, some selftests failed in the verification
due to misaligned access. Annotate these tests with the
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS flag.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf, doc: add RISC-V JIT to BPF documentation

Update Documentation/networking/filter.txt and
Documentation/sysctl/net.txt to mention RISC-V.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

MAINTAINERS: add RISC-V BPF JIT maintainer

Add Björn Töpel as RISC-V BPF JIT maintainer.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf, riscv: add BPF JIT for RV64G

This commit adds a BPF JIT for RV64G.

The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar
to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64).

At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not
supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and
BPF_PROG_TYPE_RAW_TRACEPOINT passes.

The implementation does not support "far branching" (>4KiB).

Test results:
  # modprobe test_bpf
  test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier
  ...
  Summary: 761 PASSED, 507 SKIPPED, 2 FAILED

Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  0

No CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS:
  # echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled
  # ./test_verifier | grep -c 'NOTE.*unknown align'
  59

The two failing test_verifier tests are:
  "ld_abs: vlan + abs, test 1"
  "ld_abs: jump around ld_abs"

This is due to that "far branching" involved in those tests.

All tests where done on QEMU (QEMU emulator version 3.1.50
(v3.1.0-688-g8ae951fbc106)).

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-btf-dedup'

Andrii Nakryiko says:

====================
This patch series adds BTF deduplication algorithm to libbpf. This algorithm
allows to take BTF type information containing duplicate per-compilation unit
information and reduce it to equivalent set of BTF types with no duplication without
loss of information. It also deduplicates strings and removes those strings that
are not referenced from any BTF type (and line information in .BTF.ext section,
if any).

Algorithm also resolves struct/union forward declarations into concrete BTF types
across multiple compilation units to facilitate better deduplication ratio. If
undesired, this resolution can be disabled through specifying corresponding options.

When applied to BTF data emitted by pahole's DWARF->BTF converter, it reduces
the overall size of .BTF section by about 65x, from about 112MB to 1.75MB, leaving
only 29247 out of initial 3073497 BTF type descriptors.

Algorithm with minor differences and preliminary results before FUNC/FUNC_PROTO
support is also described more verbosely at:

https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

v1->v2:
- rebase on latest bpf-next
- err_log/elog -> pr_debug
- btf__dedup, btf__get_strings, btf__get_nr_types listed under 0.0.2 version
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/btf: add initial BTF dedup tests

This patch sets up a new kind of tests (BTF dedup tests) and tests few aspects of
BTF dedup algorithm. More complete set of tests will come in follow up patches.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

btf: add BTF types deduplication algorithm

This patch implements BTF types deduplication algorithm. It allows to
greatly compress typical output of pahole's DWARF-to-BTF conversion or
LLVM's compilation output by detecting and collapsing identical types emitted in
isolation per compilation unit. Algorithm also resolves struct/union forward
declarations into concrete BTF types representing referenced struct/union. If
undesired, this resolution can be disabled through specifying corresponding options.

Algorithm itself and its application to Linux kernel's BTF types is
described in details at:
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

btf: extract BTF type size calculation

This pre-patch extracts calculation of amount of space taken by BTF type descriptor
for later reuse by btf_dedup functionality.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

libbpf: fix libbpf_print

With the recent print rework we now have the following problem:
pr_{warning,info,debug} expand to __pr which calls libbpf_print.
libbpf_print does va_start and calls __libbpf_pr with va_list argument.
In __base_pr we again do va_start. Because the next argument is a
va_list, we don't get correct pointer to the argument (and print noting
in my case, I don't know why it doesn't crash tbh).

Fix this by changing libbpf_print_fn_t signature to accept va_list and
remove unneeded calls to va_start in the existing users.

Alternatively, this can we solved by exporting __libbpf_pr and
changing __pr macro to (and killing libbpf_print):
{
if (__libbpf_pr)
__libbpf_pr(level, "libbpf: " fmt, ##__VA_ARGS__)
}

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'libbpf-btf_ext'

Yonghong Song says:

====================
This patch set exposed a few functions in libbpf.
All these newly added API functions are helpful for
JIT based bpf compilation where .BTF and .BTF.ext
are available as in-memory data blobs.

Patch #1 exposed several btf_ext__* API functions which
are used to handle .BTF.ext ELF sections.
Patch #2 refactored the function bpf_map_find_btf_info()
and exposed API function btf__get_map_kv_tids() to
retrieve the map key/value type id's generated by
bpf program through BPF_ANNOTATE_KV_PAIR macro.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: implement libbpf btf__get_map_kv_tids() API function

Currently, to get map key/value type id's, the macro
BPF_ANNOTATE_KV_PAIR(<map_name>, <key_type>, <value_type>)
needs to be defined in the bpf program for the
corresponding map.

During program/map loading time,
the local static function bpf_map_find_btf_info()
in libbpf.c is implemented to retrieve the key/value
type ids given the map name.

The patch refactored function bpf_map_find_btf_info()
to create an API btf__get_map_kv_tids() which includes
the bulk of implementation for the original function.
The API btf__get_map_kv_tids() can be used by bcc,
a JIT based bpf compilation system, which uses the
same BPF_ANNOTATE_KV_PAIR to record map key/value types.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: expose functions btf_ext__* as API functions

The following set of functions, which manipulates .BTF.ext
section, are exposed as API functions:
  . btf_ext__new
  . btf_ext__free
  . btf_ext__reloc_func_info
  . btf_ext__reloc_line_info
  . btf_ext__func_info_rec_size
  . btf_ext__line_info_rec_size

These functions are useful for JIT based bpf codegen, e.g.,
bcc, to manipulate in-memory .BTF.ext sections.

The signature of function btf_ext__reloc_func_info()
is also changed to be the same as its definition in btf.c.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: use localhost in tcp_{server,client}.py

Bind and connect to localhost. There is no reason for this test to
use non-localhost interface. This lets us run this test in a network
namespace.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

s390: bpf: fix JMP32 code-gen

Commit 626a5f66da0d19 ("s390: bpf: implement jitting of JMP32") added
JMP32 code-gen support for s390. However it triggers the warning below
due to some unusual gotos in the original s390 bpf jit code.

Add a couple of additional "is_jmp32" initializations to fix this.
Also fix the wrong opcode for the "llilf" instruction that was
introduced with the same commit.

arch/s390/net/bpf_jit_comp.c: In function 'bpf_jit_insn':
arch/s390/net/bpf_jit_comp.c:248:55: warning: 'is_jmp32' may be used uninitialized in this function [-Wmaybe-uninitialized]
  _EMIT6(op1 | reg(b1, b2) << 16 | (rel & 0xffff), op2 | mask); \
                                                       ^
arch/s390/net/bpf_jit_comp.c:1211:8: note: 'is_jmp32' was declared here
   bool is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;

Fixes: 626a5f66da0d19 ("s390: bpf: implement jitting of JMP32")
Cc: Jiong Wang <jiong.wang@netronome.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'change-libbpf-print-api'

Yonghong Song says:

====================
These are patches responding to my comments for
Magnus's patch (https://patchwork.ozlabs.org/patch/1032848/).
The goal is to make pr_* macros available to other C files
than libbpf.c, and to simplify API function libbpf_set_print().

Specifically, Patch #1 used global functions
to facilitate pr_* macros in the header files so they
are available in different C files.
Patch #2 removes the global function libbpf_print_level_available()
which is added in Patch 1.
Patch #3 simplified libbpf_set_print() which takes only one print
function with a debug level argument among others.

Changelogs:
v3 -> v4:
   . rename libbpf internal header util.h to libbpf_util.h
   . rename libbpf internal function libbpf_debug_print() to libbpf_print()
v2 -> v3:
   . bailed out earlier in libbpf_debug_print() if __libbpf_pr is NULL
   . added missing LIBBPF_DEBUG level check in libbpf.c __base_pr().
v1 -> v2:
   . Renamed global function libbpf_dprint() to libbpf_debug_print()
     to be more expressive.
   . Removed libbpf_dprint_level_available() as it is used only
     once in btf.c and we can remove it by optimizing for common cases.
====================

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: simplify libbpf API function libbpf_set_print()

Currently, the libbpf API function libbpf_set_print()
takes three function pointer parameters for warning, info
and debug printout respectively.

This patch changes the API to have just one function pointer
parameter and the function pointer has one additional
parameter "debugging level". So if in the future, if
the debug level is increased, the function signature
won't change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: print out btf log at LIBBPF_WARN level

Currently, the btf log is allocated and printed out in case
of error at LIBBPF_DEBUG level.
Such logs from kernel are very important for debugging.
For example, bpf syscall BPF_PROG_LOAD command can get
verifier logs back to user space. In function load_program()
of libbpf.c, the log buffer is allocated unconditionally
and printed out at pr_warning() level.

Let us do the similar thing here for btf. Allocate buffer
unconditionally and print out error logs at pr_warning() level.
This can reduce one global function and
optimize for common situations where pr_warning()
is activated either by default or by user supplied
debug output function.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: move libbpf pr_* debug print functions to headers

A global function libbpf_print, which is invisible
outside the shared library, is defined to print based
on levels. The pr_warning, pr_info and pr_debug
macros are moved into the newly created header
common.h. So any .c file including common.h can
use these macros directly.

Currently btf__new and btf_ext__new API has an argument getting
__pr_debug function pointer into btf.c so the debugging information
can be printed there. This patch removed this parameter
from btf__new and btf_ext__new and directly using pr_debug in btf.c.

Another global function libbpf_print_level_available, also
invisible outside the shared library, can test
whether a particular level debug printing is
available or not. It is used in btf.c to
test whether DEBUG level debug printing is availabl or not,
based on which the log buffer will be allocated when loading
btf to the kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

socket: fix for Add SO_TIMESTAMP[NS]_NEW

Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice.h: Add __cold to netdev_<level> logging functions

Add __cold to the netdev_<level> logging functions similar to
the use of __cold in the generic printk function.

Using __cold moves all the netdev_<level> logging functions
out-of-line possibly improving code locality and runtime
performance.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix fall through warning in y2038 tstamp changes.

net/core/sock.c: In function 'sock_setsockopt':
net/core/sock.c:914:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   sock_set_flag(sk, SOCK_TSTAMP_NEW);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/sock.c:915:2: note: here
  case SO_TIMESTAMPING_OLD:
  ^~~~

Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpfilter: remove extra header search paths for bpfilter_umh

Currently, the header search paths -Itools/include and
-Itools/include/uapi are not used. Let's drop the unused code.

We can remove -I. too by fixing up one C file.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-aquantia-improvements'

Heiner Kallweit says:

====================
net: phy: aquantia: number of improvements

This patch series is based on work from Andrew. I adjusted and added
certain parts. The series improves few aspects of driver, no functional
change intended.

v2:
- add my SoB to patch 1
- leave kernel.h in in patch 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: aquantia: replace magic numbers with constants

Replace magic numbers with proper constants. The original patch is
from Andrew, I extended / adjusted certain parts:
- Use decimal bit numbers. The datasheet uses hex bit numbers 0 .. F.
- Order defines from highest to lowest bit numbers
- correct some typos
- add constant MDIO_AN_TX_VEND_INT_MASK2_LINK
- Remove few functional improvements from the patch, they will come as
a separate patch.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: aquantia: use macro PHY_ID_MATCH_MODEL

Make use of macro PHY_ID_MATCH_MODEL to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: aquantia: remove unneeded includes

Remove unneeded header includes.

v2:
- leave kernel.h in

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: aquantia: Shorten name space prefix to aqr_

aquantia_ as a name space prefix is rather long, resulting in lots of
lines needing wrapping, reducing readability. Use the prefix aqr_
instead, which fits with the vendor naming there devices aqr107, for
example.

v2:
- add SoB from Heiner

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix ip_mc_{dec,inc}_group allocation context

After 4effd28c1245 ("bridge: join all-snoopers multicast address"), I
started seeing the following sleep in atomic warnings:

[   26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421
[   26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh
[   26.777855] INFO: lockdep is turned off.
[   26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20
[   26.787943] Hardware name: BCM97278SV (DT)
[   26.792118] Call trace:
[   26.794645]  dump_backtrace+0x0/0x170
[   26.798391]  show_stack+0x24/0x30
[   26.801787]  dump_stack+0xa4/0xe4
[   26.805182]  ___might_sleep+0x208/0x218
[   26.809102]  __might_sleep+0x78/0x88
[   26.812762]  kmem_cache_alloc_trace+0x64/0x28c
[   26.817301]  igmp_group_dropped+0x150/0x230
[   26.821573]  ip_mc_dec_group+0x1b0/0x1f8
[   26.825585]  br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190
[   26.831704]  br_multicast_toggle+0x78/0xcc
[   26.835887]  store_bridge_parm+0xc4/0xfc
[   26.839894]  multicast_snooping_store+0x3c/0x4c
[   26.844517]  dev_attr_store+0x44/0x5c
[   26.848262]  sysfs_kf_write+0x50/0x68
[   26.852006]  kernfs_fop_write+0x14c/0x1b4
[   26.856102]  __vfs_write+0x60/0x190
[   26.859668]  vfs_write+0xc8/0x168
[   26.863059]  ksys_write+0x70/0xc8
[   26.866449]  __arm64_sys_write+0x24/0x30
[   26.870458]  el0_svc_common+0xa0/0x11c
[   26.874291]  el0_svc_handler+0x38/0x70
[   26.878120]  el0_svc+0x8/0xc

while toggling the bridge's multicast_snooping attribute dynamically.

Pass a gfp_t down to igmpv3_add_delrec(), introduce
__igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t
argument.

Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to
allow caller to specify gfp_t.

IPv6 part of the patch appears fine.

Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: devlink: report cell size of shared buffers

Shared buffer allocation is usually done in cell increments.
Drivers will either round up the allocation or refuse the
configuration if it's not an exact multiple of cell size.
Drivers know exactly the cell size of shared buffer, so help
out users by providing this information in dumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-y2038-safe-socket-timestamps'

Deepa Dinamani says:

====================
net: y2038-safe socket timestamps

The series introduces new socket timestamps that are
y2038 safe.

The time data types used for the existing socket timestamp
options: SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING
are not y2038 safe. The series introduces SO_TIMESTAMP_NEW,
SO_TIMESTAMPNS_NEW and SO_TIMESTAMPING_NEW to replace these.
These new timestamps can be used on all architectures.

The alternative considered was to extend the sys_setsockopt()
by using the flags. We did not receive any strong opinions about
either of the approaches. Hence, this was chosen, as glibc folks
preferred this.

The series does not deal with updating the internal kernel socket
calls like rxrpc to make them y2038 safe. This will be dealt
with separately.

Note that the timestamps behavior already does not match the
man page specific behavior:
SIOCGSTAMP
This ioctl should only be used if the socket option SO_TIMESTAMP
is not set on the socket. Otherwise, it returns the timestamp of
the last packet that was received while SO_TIMESTAMP was not set,
or it fails if no such packet has been received,
(i.e., ioctl(2) returns -1 with errno set to ENOENT).

The recommendation is to update the man page to remove the above statement.

The overview of the socket timestamp series is as below:
1. Delete asm specific socket.h when possible.
2. Support SO/SCM_TIMESTAMP* options only in userspace.
3. Rename current SO/SCM_TIMESTAMP* to SO/SCM_TIMESTAMP*_OLD.
3. Alter socket options so that SOCK_RCVTSTAMPNS does
not rely on SOCK_RCVTSTAMP.
4. Introduce y2038 safe types for socket timestamp.
5. Introduce new y2038 safe socket options SO/SCM_TIMESTAMP*_NEW.
6. Intorduce new y2038 safe socket timeout options.

Changes since v4:
* Fixed the typo in calling sock_get_timeout()

Changes since v3:
* Rebased onto net-next and fixups as per review comments
* Merged the socket timeout series
* Integrated Arnd's patch to simplify compat handling of timeout syscalls

Changes since v2:
* Removed extra functions to reduce diff churn as per code review

Changes since v1:
* Dropped the change to disentangle sock flags
* Renamed sock_timeval to __kernel_sock_timeval
* Updated a few comments
* Added documentation changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW

Add new socket timeout options that are y2038 safe.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: davem@davemloft.net
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes

SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
as the time format. struct timeval is not y2038 safe.
The subsequent patches in the series add support for new socket
timeout options with _NEW suffix that will use y2038 safe
data structures. Although the existing struct timeval layout
is sufficiently wide to represent timeouts, because of the way
libc will interpret time_t based on user defined flag, these
new flags provide a way of having a structure that is the same
for all architectures consistently.
Rename the existing options with _OLD suffix forms so that the
right option is enabled for userspace applications according
to the architecture and time_t definition of libc.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: ccaulfie@redhat.com
Cc: deller@gmx.de
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: cluster-devel@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: Update timestamping Documentation

With the new y2038 safe timestamping options added, update the
documentation to reflect the changes.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: Add SO_TIMESTAMPING_NEW

Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
for all architectures.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: rth@twiddle.net
Cc: tglx@linutronix.de
Cc: ubraun@linux.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: Add SO_TIMESTAMP[NS]_NEW

Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
socket timestamp options.
These are the y2038 safe versions of the SO_TIMESTAMP_OLD
and SO_TIMESTAMPNS_OLD for all architectures.

Note that the format of scm_timestamping.ts[0] is not changed
in this patch.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: Add struct __kernel_sock_timeval

The new type is meant to be used as a y2038 safe structure
to be used as part of cmsg data.
Presently the SO_TIMESTAMP socket option uses struct timeval
for timestamps. This is not y2038 safe.
Subsequent patches in the series add new y2038 safe socket
option to be used in the place of SO_TIMESTAMP_OLD.
struct __kernel_sock_timeval will be used as the timestamp
format at that time.

struct __kernel_sock_timeval also maintains the same layout
across 32 bit and 64 bit ABIs.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: Use old_timeval types for socket timestamps

As part of y2038 solution, all internal uses of
struct timeval are replaced by struct __kernel_old_timeval
and struct compat_timeval by struct old_timeval32.
Make socket timestamps use these new types.

This is mainly to be able to verify that the kernel build
is y2038 safe when such non y2038 safe types are not
supported anymore.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: isdn@linux-pingi.de
Signed-off-by: David S. Miller <davem@davemloft.net>

arch: sparc: Override struct __kernel_old_timeval

struct __kernel_old_timeval is supposed to have the same
layout as struct timeval. But, it was inadvarently missed
that __kernel_suseconds has a different definition for
sparc64.
Provide an asm-specific override that fixes it.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD

SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
way they are currently defined, are not y2038 safe.
Subsequent patches in the series add new y2038 safe versions
of these options which provide 64 bit timestamps on all
architectures uniformly.
Hence, rename existing options with OLD tag suffixes.

Also note that kernel will not use the untagged SO_TIMESTAMP*
and SCM_TIMESTAMP* options internally anymore.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: deller@gmx.de
Cc: dhowells@redhat.com
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-afs@lists.infradead.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

arch: Use asm-generic/socket.h when possible

Many architectures maintain an arch specific copy of the
file even though there are no differences with the asm-generic
one. Allow these architectures to use the generic one instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: tglx@linutronix.de
Cc: schwidefsky@de.ibm.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: move compat timeout handling into sock.c

This is a cleanup to prepare for the addition of 64-bit time_t
in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
unnecessarily complex and error-prone, moving it all into the
main setsockopt()/getsockopt() implementation requires half
as much code and is easier to extend.

32-bit user space can now use old_timeval32 on both 32-bit
and 64-bit machines, while 64-bit code can use
__old_kernel_timeval.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add missing include unistd

Compiling rxtimestamp.c generates error messages due to
non-existing declaration for write() library call.

Add missing unistd.h include to provide the declaration and
silence the error.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/cxgb4vf: Program hash region for {t4/t4vf}_change_mac()

{t4/t4_vf}_change_mac() API's were only doing additions to MPS_TCAM.
This will fail, when the number of tcam entries is limited particularly
in vf's.
This fix programs hash region with the mac address, when TCAM
addtion fails for {t4/t4vf}_change_mac(). Since the locally maintained
driver list for hash entries is shared across mac_{sync/unsync}(),
added an extra parameter if_mac to track the address added thorugh
{t4/t4vf}_change_mac()

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4/igmp: Don't drop IGMP pkt with zeros src addr

Don't drop IGMP packets with a source address of all zeros which are
IGMP proxy reports. This is documented in Section 2.1.1 IGMP
Forwarding Rules of RFC 4541 IGMP and MLD Snooping Switches
Considerations.

Signed-off-by: Edward Chron <echron@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: realtek: add generic Realtek PHY driver

The integrated PHY's of later RTL8168 network chips report the generic
PHYID 0x001cc800 (Realtek OUI, model and revision number both set to
zero) and therefore currently the genphy driver is used.

To be able to use the paged version of e.g. phy_write() we need a
PHY driver with the read_page and write_page callbacks implemented.
So basically make a copy of the genphy driver, just with the
read_page and write_page callbacks being set.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atheros: atl2: fix an indentaion issue on a return statement

A return statment is not indented correctly, fix this by adding an
extra tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: fix indentation issue on an if statement

An if statement is indented one level too deep, fix this by removing
the extra tabs. Also add some spaces to the dev_warn arguments to clean
up checkpatch warnings.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bna: fix indentation issue on call to bfa_ioc_pf_failed

The call to bfa_ioc_pf_failed is indented too far, fix this by
removing a tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

chelsio: clean up indentation issue

The assignment to size is indented too far, fix this and join
two lines into one.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: nixge: Update device-tree bindings with v3.00

Now the DMA engine is free to float elsewhere in the system map.

Signed-off-by: Alex Williams <alex.williams@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: nixge: Separate ctrl and dma resources

The DMA engine is a separate entity altogether, and this allows the DMA
controller's address to float elsewhere in the FPGA's map.

Signed-off-by: Alex Williams <alex.williams@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove rtl_wol_pll_power_down

rtl_wol_pll_power_down() is used in only one place and removing it
makes the code simpler and better readable.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: MAC table entry count function increases operation 0 value protection measures

When updating the available MAC VLAN table counts,
MAC VLAN table entry count function adds
operation 0 value protection measures.

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: modify the upper limit judgment condition

In order to prevent the variable anomaly from being larger than desc_num,
the upper limit judgment condition becomes >=.

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: don't allow user to change vlan filter state

When user disables vlan filter, and adds vlan device, it won't
notify the driver the update the vlan filter. In this case, when
user enables vlan filter again, the packets with new vlan tag
will be filtered by vlan filter.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: optimize the maximum TC macro

Multiple macros with the largest number of TCs in the system,
optimized to HCLGE_MAX_TC_NUM.

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix the problem that the supported port is empty

Run ethtool ethx when displaying device information in VF，
the supported port and link mode items will be empty.

This patch fixes it.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix a wrong checking in the hclge_tx_buffer_calc()

Only the TC is enabled, we need to check whether the buffer is enough,
otherwise it may lead to a wrong -ENOMEM case.

Fixes: 9ffe79a9c2ee ("net: hns3: Support for dynamically assigning tx buffer to TC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: move some set_bit statement into hclge_prepare_mac_addr

This patch does not change the code logic. There are some same
set_bit statements called by add/rm_uc/mc_addr_common, and move
this statements into hclge_prepare_mac_addr to reduce duplicate
code.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add hclge_cmd_check_retval() to parse comman's return value

For simplifying the code, this patch adds hclge_cmd_check_retval() to
check the return value of the command.

Also, according the IMP's description, when there are several descriptors
in a command, then the IMP will save the return value on the last
description, so hclge_cmd_check_retval() just check the last one for this
case.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: code optimization for hclge_rx_buffer_calc

There are four steps to calcuate the rx private buffer, each step
can be done in a function to avoid code duplication and aid code
readability.

This patch adds three separate functions do the job. Also, the
function name more or less make the comment redundant, so remove
some obvious comment.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Modify parameter type from int to bool in set_gro_en

The second parameter to the hook function set_gro_en is always passed in
true/false, so modify it's type from int to bool.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix an issue for hns3_update_new_int_gl

HNS3 supports setting rx-usecs|tx-usecs as 0, but it will not
update dynamically when adaptive-tx or adaptive-rx is enable.
This patch removes the Redundant check.

Fixes: a95e1f8666e9 ("net: hns3: change the time interval of int_gl calculating")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix a code style issue for hns3_update_new_int_gl()

Use the same code style for rx_group and tx_group in the
hns3_update_new_int_gl().

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-02-01

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) introduce bpf_spin_lock, from Alexei.

2) convert xdp samples to libbpf, from Maciej.

3) skip verifier tests for unsupported program/map types, from Stanislav.

4) powerpc64 JIT support for BTF line info, from Sandipan.

5) assorted fixed, from Valdis, Jesper, Jiong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'shifts-cleanup'

Jiong Wang says:

====================
NFP JIT back-end is missing several ALU32 logic shifts support.

Also, shifts with shift amount be zero are not handled properly.

This set cleans up these issues.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: complete ALU32 logic shift supports

The following ALU32 logic shift supports are missing:

  BPF_ALU | BPF_LSH | BPF_X
  BPF_ALU | BPF_RSH | BPF_X
  BPF_ALU | BPF_RSH | BPF_K

For BPF_RSH | BPF_K, it could be implemented using NFP direct shift
instruction. For the other BPF_X shifts, NFP indirect shifts sequences need
to be used.

Separate code-gen hook is assigned to each instruction to make the
implementation clear.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: correct the behavior for shifts by zero

Shifts by zero do nothing, and should be treated as nops.

Even though compiler is not supposed to generate such instructions and
manual written assembly is unlikely to have them, but they are legal
instructions and have defined behavior.

This patch correct existing shifts code-gen to make sure they do nothing
when shift amount is zero except when the instruction is ALU32 for which
high bits need to be cleared.

For shift amount bigger than type size, already, NFP JIT back-end errors
out for immediate shift and only low 5 bits will be taken into account for
indirect shift which is the same as x86.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: remove generated verifier/tests.h on 'make clean'

'make clean' is supposed to remove generated files.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'devlink-add-device-driver-information-API'

Jakub Kicinski says:

====================
devlink: add device (driver) information API

fw_version field in ethtool -i does not suit modern needs with 31
characters being quite limiting on more complex systems.  There is
also no distinction between the running and flashed versions of
the firmware.

Since the driver information pertains to the entire device, rather
than a particular netdev, it seems wise to move it do devlink, at
the same time fixing the aforementioned issues.

The new API allows exposing the device serial number and versions
of the components of the card - both hardware, firmware (running
and flashed).  Driver authors can choose descriptive identifiers
for the version fields.  A few version identifiers which seemed
relevant for most devices have been added to the global devlink
header.

Example:
$ devlink dev info pci/0000:05:00.0
pci/0000:05:00.0:
  driver nfp
  serial_number 16240145
  versions:
    fixed:
      board.id AMDA0099-0001
      board.rev 07
      board.vendor SMA
      board.model carbon
    running:
      fw.mgmt: 010156.010156.010156
      fw.cpld: 0x44
      fw.app: sriov-2.1.16
    stored:
      fw.mgmt: 010158.010158.010158
      fw.cpld: 0x44
      fw.app: sriov-2.1.20

Last patch also includes a compat code for ethtool.  If driver
reports no fw_version via the traditional ethtool API, ethtool
can call into devlink and try to cram as many versions as possible
into the 31 characters.

v4:
- use IS_REACHABLE instead of IS_ENABLED in last patch.

v3 (Jiri):
- rename various functions and attributes;
- break out the version helpers per-type;
- make the compat code parse a dump instead of special casing
   in each helper;
- move generic version defines to a separate patch.

v2:
- rebase.

this non-RFC, v3 some would say:
- add three more versions in the NFP patches;
- add last patch (ethool compat) - Andrew & Michal.

RFCv2:
- use one driver op;
- allow longer serial number;
- wrap the skb into an opaque request struct;
- add some common identifier into the devlink header.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: add compat for devlink info

If driver did not fill the fw_version field, try to call into
the new devlink get_info op and collect the versions that way.
We assume ethtool was always reporting running versions.

v4:
- use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot).
v3 (Jiri):
- do a dump and then parse it instead of special handling;
- concatenate all versions (well, all that fit :)).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: devlink: report the running and flashed versions

Report versions of firmware components using the new NSP command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: nsp: add support for versions command

Retrieve the FW versions with the new command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: devlink: report fixed versions

Report information about the hardware.

RFCv2:
- add defines for board IDs which are likely to be reusable for
other drivers (Jiri).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: devlink: report driver name and serial number

Report the basic info through new devlink info API.

RFCv2:
- add driver name;
- align serial to core changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: add generic info version names

Add defines and docs for generic info versions.

v3:
- add docs;
- separate patch (Jiri).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: add version reporting to devlink info API

ethtool -i has a few fixed-size fields which can be used to report
firmware version and expansion ROM version. Unfortunately, modern
hardware has more firmware components. There is usually some
datapath microcode, management controller, PXE drivers, and a
CPLD load. Running ethtool -i on modern controllers reveals the
fact that vendors cram multiple values into firmware version field.

Here are some examples from systems I could lay my hands on quickly:

tg3:  "FFV20.2.17 bc 5720-v1.39"
i40e: "6.01 0x800034a4 1.1747.0"
nfp:  "0.0.3.5 0.25 sriov-2.1.16 nic"

Add a new devlink API to allow retrieving multiple versions, and
provide user-readable name for those versions.

While at it break down the versions into three categories:
- fixed - this is the board/fixed component version, usually vendors
           report information like the board version in the PCI VPD,
           but it will benefit from naming and common API as well;
- running - this is the running firmware version;
- stored - this is firmware in the flash, after firmware update
            this value will reflect the flashed version, while the
            running version may only be updated after reboot.

v3:
- add per-type helpers instead of using the special argument (Jiri).
RFCv2:
- remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
   versions are mixed with other info attrs)l
- have the driver report versions from the same callback as
   other info.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: add device information API

ethtool -i has served us well for a long time, but its showing
its limitations more and more. The device information should
also be reported per device not per-netdev.

Lay foundation for a simple devlink-based way of reading device
info. Add driver name and device serial number as initial pieces
of information exposed via this new API.

v3:
- rename helpers (Jiri);
- rename driver name attr (Jiri);
- remove double spacing in commit message (Jiri).
RFC v2:
- wrap the skb into an opaque structure (Jiri);
- allow the serial number of be any length (Jiri & Andrew);
- add driver name (Jonathan).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'selftests-Various-fixes'

Petr Machata says:

====================
selftests: Various fixes

This patch set contains various fixes whose common denominator is
improving quality of forwarding and mlxsw selftests.

Most of the fixes are improvements in determinism (such that timing and
latency don't impact the test performance). These were prompted by
regular runs of the test suite on a hardware emulator, the performance
of which is necessarily lower than that of the real device.

Patches #1 (from Ido), #2 and #3 make changes to ping limits.

Patches #4 and #5 add more sleep in places where things need more time
to finish.

Patches #6 and #7 fix two tests in the suite of mirror-to-gretap tests
where underlay involves a VLAN device over an 802.1q bridge.

Patches #8, #9 and #10 fix bugs in mirror-to-gretap test where underlay
involves a LAG device.

Patch #11 fixes a missed RET initialization in mirror-to-gretap flower
test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_flower: Fix test result handling

The global variable RET needs to be initialized before each call to
log_test. This test case sets it once before running the tests, but then
calls log_tests for every individual test. Thus a failure in one of the
tests causes spurious failures in follow-up tests as well.

Fix by moving the initialization of RET from test_all() to
full_test_span_gre_dir_acl(), a function that implements the test.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_bridge_1q_lag: Ignore ARP

This test sets up mirroring such that it mirrors all overlay traffic.
That includes ARP, which causes occasional miscounts and spurious
failures. Ignore ARP explicitly to avoid these problems.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_bridge_1q_lag: Enable forwarding

This test relies on routing in the primary traffic path, but neglects to
enable forwarding. Do so.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_bridge_1q_lag: Flush neighbors

After one LAG slave is downed and another upped, it takes a while for
the neighbor on a bridge to time out and get renegotiated. The test does
prompt update of FDB entries by arpinging. But because the neighbor
still references another address, offloading is not possible, and some
packets may end up not being mirrored.

To force the neighbor renegotiation, simply flush the neighbor table at
the bridge.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix roaming test

ARP or ND traffic can cause spurious migration of FDB back to $swp3.
Mirroring is then updated in accordance with the change, and mirrored
packets are seen at h3, causing a failure.

Detect the case of this spurious roaming, and retry the test.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix untagged test

The untagged egress test sets up mirroring to {,ip6}gretap such that the
underlay goes through a bridge. Then VLAN flags are manipulated to test
that the traffic leaves the bridge 802.1q-tagged or not, as appropriate.

However, when a neighbor expires at the time that the bridge VLAN is
configured as PVID and egress untagged, the following discovery process
can't finish, because the IP address on H3 is still at the VLAN-tagged
netdevice. This manifests by occasional failures where only several of
the 10 required packets get through.

Therefore, when reconfiguring the VLAN flags, move the IP address to the
appropriate device in the H3 VRF.

In addition to that, take this opportunity to embed an ASCII art diagram
to make the topology move obvious.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_lib: Wait for tardy mirrored packets

When running in an environment with poor performance (such as a
simulator), processing mirrored packets can take a while. Evaluating the
condition too soon leads to spurious "seen 9, expected 10" failures as
the last packet doesn't have enough time to get mirrored and the mirror
to arrive and bump the observed counters.

Wait for one ping interval before evaluating the test.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_gre_changes: Fix TTL test

When running in a simulator, the TTL change takes a while to settle and
during this time the performance of the packet processing is lowered.
The resulting instability leads to ping sending more packets as it
assumes some have been dropped. This then leads to regular spurious
failures as more packets than expected are observed.

Sleep a bit to give the system time to stabilize.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Update ping limits

The current ping intervals are too short for running mirroring tests in
simulator. This leads to ping sending a follow-up ping before the reply
arrives, thus sending more than the requested 10 ICMP requests. This
traffic is seen at the counters, and causes spurious failures.

Bump interval and timeout numbers 5x in mirroring tests to address the
spurious failures.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: mirror_lib: Update ping limits

The current ping intervals are too short for running mirroring tests in
simulator. This leads to ping sending a follow-up ping before the reply
arrives, thus sending more than the requested 10 ICMP requests. Those
are mirrored, and over a certain threshold the test case run is
considered a failure, because too much traffic is observed.

Bump interval and timeout numbers 5x in mirroring tests to address the
spurious failures.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Make ping timeout configurable

The current timeout (2 seconds) proved to be too low for some (emulated)
systems where we run the tests.

Make the timeout configurable and default to 5 seconds.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipconfig: add carrier_timeout kernel parameter

commit 3fb72f1e6e61 ("ipconfig wait for carrier") added a
"wait for carrier" policy, with a fixed worst case maximum wait
of two minutes.

Now make the wait for carrier timeout configurable on the kernel
commandline and use the 120s as the default.

The timeout messages introduced with
commit 5e404cd65860 ("ipconfig: add informative timeout messages while
waiting for carrier") are done in a fixed interval of 20 seconds, just
like they were before (240/12).

Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib: use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tulip: eeprom: use struct_size() in kmalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: smt: use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is
finding the size of a structure that has a zero-sized array at
the end, along with memory for some number of elements for that
array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: sched: use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is
finding the size of a structure that has a zero-sized array at
the end, along with memory for some number of elements for that
array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS

Currently we don't zerocopy if the crypto framework async bit is set.
However some crypto algorithms (such as x86 AESNI) support async,
but in the context of sendmsg, will never run asynchronously. Instead,
check for actual EINPROGRESS return code before assuming algorithm is
async.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>