git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge branch 'bpf-stackmap-build-id'

Song Liu says:

====================
This work follows up discussion at Plumbers'17 on improving addr->sym
resolution of user stack traces. The following links have more information
of the discussion:

http://www.linuxplumbersconf.org/2017/ocw/proposals/4764
https://lwn.net/Articles/734453/     Section "Stack traces and kprobes"

Currently, bpf stackmap store address for each entry in the call trace.
To map these addresses to user space files, it is necessary to maintain
the mapping from these virtual address to symbols in the binary. Usually,
the user space profiler (such as perf) has to scan /proc/pid/maps at the
beginning of profiling, and monitor mmap2() calls afterwards. Given the
cost of maintaining the address map, this solution is not practical for
system wide profiling that is always on.

This patch tries to address this with a variation to stackmap. Instead
of storing addresses, the variation stores ELF file build_id + offset.
After profiling, a user space tool will look up these functions with
build_id (to find the binary or shared library) and the offset.

I also updated bcc/cc library for the stackmap (no python/lua support yet).
You can find the work at:

  https://github.com/liu-song-6/bcc/commits/bpf_get_stackid_v02

Changes v5 -> v6:

1. When kernel stack is added to stackmap with build_id, use fallback
   mechanism to store ip (status == BPF_STACK_BUILD_ID_IP).

Changes v4 -> v5:

1. Only allow build_id lookup in non-nmi context. Added comment and
   commit message to highlight this limitation.
2. Minor fix reported by kbuild test robot.

Changes v3 -> v4:

1. Add fallback when build_id lookup failed. In this case, status is set
   to BPF_STACK_BUILD_ID_IP, and ip of this entry is saved.
2. Handle cases where vma is only part of the file (vma->vm_pgoff != 0).
   Thanks to Teng for helping me identify this issue!
3. Address feedbacks for previous versions.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID

test_stacktrace_build_id() is added. It accesses tracepoint urandom_read
with "dd" and "urandom_read" and gathers stack traces. Then it reads the
stack traces from the stackmap.

urandom_read is a statically link binary that reads from /dev/urandom.
test_stacktrace_build_id() calls readelf to read build ID of urandom_read
and compares it with build ID from the stackmap.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: extend stackmap to save binary_build_id+offset instead of address

Currently, bpf stackmap store address for each entry in the call trace.
To map these addresses to user space files, it is necessary to maintain
the mapping from these virtual address to symbols in the binary. Usually,
the user space profiler (such as perf) has to scan /proc/pid/maps at the
beginning of profiling, and monitor mmap2() calls afterwards. Given the
cost of maintaining the address map, this solution is not practical for
system wide profiling that is always on.

This patch tries to solve this problem with a variation of stackmap. This
variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing
addresses, the variation stores ELF file build_id + offset.

Build ID is a 20-byte unique identifier for ELF files. The following
command shows the Build ID of /bin/bash:

  [user@]$ readelf -n /bin/bash
  ...
    Build ID: XXXXXXXXXX
  ...

With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID
for each entry in the call trace, and translate it into the following
struct:

  struct bpf_stack_build_id_offset {
          __s32           status;
          unsigned char   build_id[BPF_BUILD_ID_SIZE];
          union {
                  __u64   offset;
                  __u64   ip;
          };
  };

The search of build_id is limited to the first page of the file, and this
page should be in page cache. Otherwise, we fallback to store ip for this
entry (ip field in struct bpf_stack_build_id_offset). This requires the
build_id to be stored in the first page. A quick survey of binary and
dynamic library files in a few different systems shows that almost all
binary and dynamic library files have build_id in the first page.

Build_id is only meaningful for user stack. If a kernel stack is added to
a stackmap with BPF_F_STACK_BUILD_ID, it will automatically fallback to
only store ip (status == BPF_STACK_BUILD_ID_IP). Similarly, if build_id
lookup failed for some reason, it will also fallback to store ip.

User space can access struct bpf_stack_build_id_offset with bpf
syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to
maintain mapping from build id to binary files. This mostly static
mapping is much easier to maintain than per process address maps.

Note: Stackmap with build_id only works in non-nmi context at this time.
This is because we need to take mm->mmap_sem for find_vma(). If this
changes, we would like to allow build_id lookup in nmi context.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: comment why dots in filenames under BPF virtual FS are not allowed

When pinning a file under the BPF virtual file system (traditionally
/sys/fs/bpf), using a dot in the name of the location to pin at is not
allowed. For example, trying to pin at "/sys/fs/bpf/foo.bar" will be
rejected with -EPERM.

This check was introduced at the same time as the BPF file system
itself, with commit b2197755b263 ("bpf: add support for persistent
maps/progs"). At this time, it was checked in a function called
"bpf_dname_reserved()", which made clear that using a dot was reserved
for future extensions.

This function disappeared and the check was moved elsewhere with commit
0c93b7d85d40 ("bpf: reject invalid names right in ->lookup()"), and the
meaning of the dot ban was lost.

The present commit simply adds a comment in the source to explain to the
reader that the usage of dots is reserved for future usage.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-tools-makefile-improvements'

Jiri Benc says:

====================
Currently, 'make bpf' in the tools/ directory does not provide the
standard quiet output except for bpftool (which is however listed
with a wrong directory). Worse, it does not respect the build output
directory.

The 'make bpf_install' does not work as one would expect, either.
It installs unconditionally to /usr/bin without respecting DESTDIR
and prefix.

This patchset improves that behavior.
====================

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpf: silence make by not deleting intermediate file

Even in quiet mode, make finishes with

rm tools/bpf/bpf_exp.lex.c

That's because it considers the file to be intermediate. Silence that by
mentioning the lex.c file instead of the lex.o file; the dependency still
stays.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpf: respect quiet/verbose build

Default to quiet build, with V=1 enabling verbose build as is usual.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpf: call descend in Makefile

Use the descend macro to properly propagate $(subdir) to bpftool.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpf: make install should build first

Make the 'install' target depend on the 'all' target to build the binaries
first.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpf: consistent make bpf_install

Currently, make bpf_install in tools/ does not respect DESTDIR. Moreover, it
installs to /usr/bin/ unconditionally.

Let it respect DESTDIR and allow prefix to be specified. Also, to be more
consistent with bpftool and with the usual customs, default the prefix to
/usr/local instead of /usr.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpf: respect output directory during build

Currently, the programs under tools/bpf (with the notable exception of
bpftool) do not respect the output directory (make O=dir). Fix that.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpftool: silence 'missing initializer' warnings

When building bpf tool, gcc emits piles of warnings:

prog.c: In function ‘prog_fd_by_tag’:
prog.c:101:9: warning: missing initializer for field ‘type’ of ‘struct bpf_prog_info’ [-Wmissing-field-initializers]
  struct bpf_prog_info info = {};
         ^
In file included from /home/storage/jbenc/git/net-next/tools/lib/bpf/bpf.h:26:0,
                 from prog.c:47:
/home/storage/jbenc/git/net-next/tools/include/uapi/linux/bpf.h:925:8: note: ‘type’ declared here
  __u32 type;
        ^

As these warnings are not useful, switch them off.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-perf-sample-addr'

Teng Qin says:

====================
These patches add support that allows bpf programs attached to perf events to
read the address values recorded with the perf events. These values are
requested by specifying sample_type with PERF_SAMPLE_ADDR when calling
perf_event_open().

The main motivation for these changes is to support building memory or lock
access profiling and tracing tools. For example on Intel CPUs, the recorded
address values for supported memory or lock access perf events would be
the access or lock target addresses from PEBS buffer. Such information would
be very valuable for building tools that help understand memory access or
lock acquire pattern.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

samples/bpf: add example to test reading address

This commit adds additional test in the trace_event example, by
attaching the bpf program to MEM_UOPS_RETIRED.LOCK_LOADS event with
PERF_SAMPLE_ADDR requested, and print the lock address value read from
the bpf program to trace_pipe.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: add support to read sample address in bpf program

This commit adds new field "addr" to bpf_perf_event_data which could be
read and used by bpf programs attached to perf events. The value of the
field is copied from bpf_perf_event_data_kern.addr and contains the
address value recorded by specifying sample_type with PERF_SAMPLE_ADDR
when calling perf_event_open.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE

Kirill found that recently added synchronize_rcu() call in
ip6mr_sk_done()
was slowing down netns dismantle and posted a patch to use it only if
the socket
was found.

I instead suggested to get rid of this call, and use instead
SOCK_RCU_FREE

We might later change IPv4 side to use the same technique and unify
both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
that could be replaced by SOCK_RCU_FREE.

Tested:
time for i in {1..1000}; do unshare -n /bin/false;done

Before : real 7m18.911s
After : real 10.187s

Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'RDS-zerocopy-code-enhancements'

Sowmini Varadhan says:

====================
RDS: zerocopy code enhancements

A couple of enhancements to the rds zerocop code
- patch 1 refactors rds_message_copy_from_user to pull the zcopy logic
  into its own function
- patch 2 drops the usage sk_buff to track MSG_ZEROCOPY cookies and
  uses a simple linked list (enhancement suggested by willemb during
  code review)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rds: use list structure to track information for zerocopy completion notification

Commit 401910db4cd4 ("rds: deliver zerocopy completion notification
with data") removes support fo r zerocopy completion notification
on the sk_error_queue, thus we no longer need to track the cookie
information in sk_buff structures.

This commit removes the struct sk_buff_head rs_zcookie_queue by
a simpler list that results in a smaller memory footprint as well
as more efficient memory_allocation time.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: refactor zcopy code into rds_message_zcopy_from_user

Move the large block of code predicated on zcopy from
rds_message_copy_from_user into a new function,
rds_message_zcopy_from_user()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb3: remove VLA usage

Remove VLA usage and change the 'len' argument to a u8 and use a 256
byte buffer on the stack. Notice that these lengths are limited by the
encoding field in the VPD structure, which is a u8 [1].

[1] https://marc.info/?l=linux-netdev&m=152044354814024&w=2

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Fix SO_ZEROCOPY switch case

Fix the SO_ZEROCOPY switch case on sock_setsockopt() avoiding the
ret values to be overwritten by the one set on the default case.

Fixes: 28190752c7092 ("sock: permit SO_ZEROCOPY on PF_RDS socket")
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvpp2-ucast-filter'

Maxime Chevallier says:

====================
net: mvpp2: Add Unicast filtering capabilities

This series adds unicast filtering support to the Marvell PPv2 controller.

This is implemented using the header parser cababilities of the PPv2,
which allows for generic packet filtering based on matching patterns in
the packet headers.

PPv2 controller only has 256 of these entries, and we need to share them
with other features, such as VLAN filtering.

For each interface, we have 5 entries dedicated to unicast filtering (the
controller's own address, and 4 other), and 21 to multicast filtering.

When this number is reached, the controller switches to unicast or
multicast promiscuous mode.

The first patch reworks the function that adds and removes addresses to the
filter. This is preparatory work to ease UC filter implementation.

The second patch adds the UC filtering feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: Add support for unicast filtering

Marvell PPv2 controller can be used to implement packet filtering based
on the destination MAC address. This is already used to implement
multicast filtering. This patch adds support for Unicast filtering.

Filtering is based on so-called "TCAM entries" to implement filtering.
Due to their limited number and the fact that these are also used for
other purposes, we reserve 80 entries for both unicast and multicast
filters. On top of the broadcast address, and each interface's own MAC
address, we reserve 25 entries per port, 4 for unicast filters, 21 for
multicast.

Whenever unicast or multicast range for one port is full, the filtering
is disabled and port goes into promiscuous mode for the given type of
addresses.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: Simplify MAC filtering function parameters

The mvpp2_prs_mac_da_accept function takes into parameter both the
struct representing the controller and the port id. This is meaningful
when we want to create TCAM entries for non-initialized ports, but in
this case we expect the port to be initialized before starting adding or
removing MAC addresses to the per-port filter.

This commit changes the function so that it takes struct mvpp2_port as
a parameter instead.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: fix flags passed to first drop rule in gact_drop_and_ok_test

Fix copy&paste error and pass proper flags.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: fix "ok" action test

Fix the "ok" action test so it checks that packet that is okayed does not
continue to be processed by other rules. Fix error message as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cdc_eem: clean up bind error path

Drop bogus call to usb_driver_release_interface() from an error path in
the usbnet bind() callback, which is called during interface probe. At
this point the interface is not bound and usb_driver_release_interface()
returns early.

Also remove the bogus call to clear the interface data, which is owned
by the usbnet driver and would not even have been set by the time bind()
is called.

Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: kalmia: clean up bind error path

Drop bogus call to usb_driver_release_interface() from an error path in
the usbnet bind() callback, which is called during interface probe. At
this point the interface is not bound and usb_driver_release_interface()
returns early.

Also remove the bogus call to clear the interface data, which is owned
by the usbnet driver and would not even have been set by the time bind()
is called.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2018-02-28-1' of git://git./linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-02-28-1 (IPSec-1)

This series consists of some fixes and refactors for the mlx5 drivers,
especially around the FPGA and flow steering. Most of them are trivial
fixes and are the foundation of allowing IPSec acceleration from user-space.

We use flow steering abstraction in order to accelerate IPSec packets.
When a user creates a steering rule, [s]he states that we'll carry an
encrypt/decrypt flow action (using a specific configuration) for every
packet which conforms to a certain match. Since currently offloading these
packets is done via FPGA, we'll add another set of flow steering ops.
These ops will execute the required FPGA commands and then call the
standard steering ops.

In order to achieve this, we need that the commands will get all the
required information. Therefore, we pass the fte object and embed the
flow_action struct inside the fte. In addition, we add the shim layer
that will later be used for alternating between the standard and the
FPGA steering commands.

Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout"
are very relevant for user-space applications, as these applications could
be killed, but we still want to wait for the FPGA and update the kernel's
database.

Regards,
Aviad and Matan
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: Introduce first PMTU test

One single test implemented so far: test_pmtu_vti6_exception
checks that the PMTU of a route exception, caused by a tunnel
exceeding the link layer MTU, is affected by administrative
changes of the tunnel MTU. Creation of the route exception is
checked too.

Requested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: try to use high order pages for RX rings

RX rings can fit most of the time in a contiguous piece of memory,
so lets use kvzalloc_node/kvfree instead of vzalloc_node/vfree

Note that kvzalloc_node() automatically falls back to another node,
there is no need to do the fallback ourselves.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: fix boolreturn.cocci warnings

drivers/net/ethernet/cisco/enic/vnic_dev.c:1294:9-10: WARNING: return of 0/1 in function 'vnic_dev_capable_udp_rss' with return type bool

Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 48398b6e7065 ("enic: set UDP rss flag")
CC: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix boolreturn.cocci warnings

drivers/net/dsa/mv88e6xxx/serdes.c:66:9-10: WARNING: return of 0/1 in function 'mv88e6352_port_has_serdes' with return type bool

Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: eb755c3f6b7d ("net: dsa: mv88e6xxx: Add helper to determining if port has SERDES")
CC: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-mux: slience probe defer error

If we fail to register the mdio bus due to probe defer, we should not
print an error message. Just be silent in this case.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: unpollute priv_flags space

the ipvlan device driver defines and uses 2 bits inside the priv_flags
net_device field. Such bits and the related helper are used only
inside the ipvlan device driver, and the core networking does not
need to be aware of them.

This change moves netif_is_ipvlan* helper in the ipvlan driver and
re-implement them looking for ipvlan specific symbols instead of
using priv_flags.

Overall this frees two bits inside priv_flags - and move the following
ones to avoid gaps - without any intended functional change.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-remove-phy_error-from-phy_disable_interrupts'

Heiner Kallweit says:

====================
net: phy: remove phy_error from phy_disable_interrupts

All callers of phy_disable_interrupts() call phy_error() in the error
case. Therefore we don't need to do this within the function too.
This change also allows us to use phy_disable_interrupts() in code
holding phydev->lock (because phy_error() takes this lock).
Make use of this in phy_stop().

v2:
- splitted into two separate patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: use phy_disable_interrupts in phy_stop

Now that phy_disable_interrupts() can't take lock phydev->lock any longer,
we can use it to simplify phy_stop().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: remove phy_error from phy_disable_interrupts

All callers of phy_disable_interrupts() call phy_error() in the error
case. Therefore we don't need to do this within the function too.
This change also allows us to use phy_disable_interrupts() in code
holding phydev->lock (because phy_error() can take this lock).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/net: fix in_netns.sh script

execute the subprocess in netns using 'ip netns exec'

Fixes: cc30c93fa020 ("selftests/net: ignore background traffic in psock_fanout")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: mvpp2_check_hw_buf_num() can be static

Fixes: effbf5f58d64 ("net: mvpp2: update the BM buffer free/destroy logic")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: ndisc: use true and false for boolean values

Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: marvell: describe compatibility string

There are two compatibility strings for mv88e6xxx, but it isn't clear
from the documentation why only those two exist when the mv88e6xxx driver
supports more than the 6085 and 6190. Briefly describe how the compatible
property is used, and provide guidance on which to use.

The model list comes from looking at port_base_addr values (0x0 vs 0x10)
in drivers/net/dsa/mv88e6xxx/chip.c.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: bcast: use true and false for boolean values

Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2018-03-05

This series contains updates to igb only.

Corinna Vinschen adds the support for trusted VFs into the igb driver.

Mika fixes an issue where PCIe device is physically unplugged can cause
a kernel crash. This issue is that netif_device_detach() is called in
these cases, which prevents netif_unregister() from bringing the device
down properly.

Christophe JAILLET fixes an issue with igb where HWTSTAMP_TX_ON was
being handled like a bit mask and not a value.

v2: dropped the e1000e fix from the series since I will be pushing it
through David Miller's net tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'lan743x-driver'

Bryan Whitehead says:

====================
lan743x: Add new lan743x driver

Add new lan743x driver.

The lan743x from Microchip Technologies Inc,
is a PCIe to Gigabit Ethernet Controller.

Updates for V4:
Patch 1/2 - Applied community suggestions
convert to using module_pci_driver

Updates for V3:
Patch 1/2 - Applied community suggestions
removed initialization tracking flags.
converted to 64 bit statistics.
converted tx clean up tasklet to napi.

Updates for V2:
Patch 1/2 - Applied community suggestions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

lan743x: Update MAINTAINERS to include lan743x driver

Update MAINTAINERS to include lan743x driver

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lan743x: Add main source files for new lan743x driver

Add main source files for new lan743x driver

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp-add-support-for-some-msg_control-options-from-RFC6458'

Xin Long says:

====================
sctp: add support for some msg_control options from RFC6458

This patchset is to add support for 3 msg_control options described
in RFC6458:

    5.3.7.  SCTP PR-SCTP Information Structure (SCTP_PRINFO)
    5.3.9.  SCTP Destination IPv4 Address Structure (SCTP_DSTADDRV4)
    5.3.10. SCTP Destination IPv6 Address Structure (SCTP_DSTADDRV6)

one send flag described in RFC6458:

    SCTP_SENDALL:  This flag, if set, will cause a one-to-many
    style socket to send the message to all associations that
    are currently established on this socket.  For the one-to-
    one style socket, this flag has no effect.

Note there is another msg_control option:

    5.3.8.  SCTP AUTH Information Structure (SCTP_AUTHINFO)

It's a little complicated, I will post it in another patchset after
this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: add support for snd flag SCTP_SENDALL process in sendmsg

This patch is to add support for snd flag SCTP_SENDALL process
in sendmsg, as described in section 5.3.4 of RFC6458.

With this flag, you can send the same data to all the asocs of
this sk once.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg

This patch is to add support for Destination IPv4/6 Address options
for sendmsg, as described in section 5.3.9/10 of RFC6458.

With this option, you can provide more than one destination addrs
to sendmsg when creating asoc, like sctp_connectx.

It's also a necessary send info for sctp_sendv.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: add support for PR-SCTP Information for sendmsg

This patch is to add support for PR-SCTP Information for sendmsg,
as described in section 5.3.7 of RFC6458.

With this option, you can specify pr_policy and pr_value for user
data in sendmsg.

It's also a necessary send info for sctp_sendv.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: remove erroneous comment

When addressing a review comment in a early version of the offending
patch a comment where left in which should have been removed. Remove the
comment to keep it consistent with the code.

Fixes: 75efa06f457bbed3 ("ravb: add support for changing MTU")
Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make account struct net to memcg

The patch adds SLAB_ACCOUNT to flags of net_cachep cache,
which enables accounting of struct net memory to memcg kmem.
Since number of net_namespaces may be significant, user
want to know, how much there were consumed, and control.

Note, that we do not account net_generic to the same memcg,
where net was accounted, moreover, we don't do this at all (*).
We do not want the situation, when single memcg memory deficit
prevents us to register new pernet_operations.

(*)Even despite there is !current process accounting already
available in linux-next. See kmalloc_memcg() there for the details.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Flow steering cmd interface should get the fte when deleting

Previously, deleting a flow steering entry only got the index.
Since the FPGA implementation of FTE's deletion might need to dig
inside the FTE itself, we would like to get the FTE's context.
Changing the interface to pass the FTE context.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

{net,IB}/mlx5: Add flow steering helpers

Add helper functions that check if a protocol is
part of a flow steering match criteria.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Embed mlx5_flow_act into fs_fte

fte objects contain the match value and action. Currently, extending
the actions require in adding them both to the API and fs_fte.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add empty egress namespace to flow steering core

Currently, we don't support egress flow steering namespace in mlx5
flow steering core implementation. However, when we want to encrypt
a packet, we model it as a flow steering rule in the egress path.
To overcome this, we add an empty egress namespace to flow steering.
This namespace is initialized only when ipsec support exists.
In the future, this will grow to a full blown full steering
implementation, resembling the ingress path.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add shim layer between fs and cmd

The shim layer allows each namespace to define possibly different
functionality for add/delete/update commands. The shim layer
introduced here, will be used to support flow steering with the FPGA.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

{net,IB}/mlx5: Add has_tag to mlx5_flow_act

The has_tag member will indicate whether a tag action was specified
in flow specification.

A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag. HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.

So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments

Group and pass all function arguments of parse_flow_attr call in one
common struct mlx5_flow_act.

This patch passes all the action arguments of parse_flow_attr in one common
struct mlx5_flow_act. It allows us to scale the number of actions without adding
new arguments to the function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>

net/mlx5: FPGA and IPSec initialization to be before flow steering

Some flow steering namespace initialization (i.e. egress namespace)
might depend on FPGA capabilities. Changing the initialization order
such that the FPGA will be initialized before flow steering.

Flow steering fs cmds initialization might depend on
IPSec capabilities. Changing the initialization order such
that the IPSec will be initialized before flow steering as well.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Removed not need synchronize_rcu

This is already done by xfrm layer between state_dev_del callback
to state_dev_free callback.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Fixed sleeping inside atomic context

We can't allocate with GFP_KERNEL inside spinlock.
Actually ida_simple doesn't require spinlock so remove it.

Fixes: 547eede070eb ("net/mlx5e: IPSec, Innova IPSec offload infrastructure")
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Wait for FPGA command responses with a timeout

Generally, FPGA IPSec commands must always complete.
We want to wait for one minute for them to complete gracefully also
when killing a process.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Fixed compilation issue when CONFIG_MLX5_ACCEL is disabled

IPSec init and cleanup functions also depends on linux/mlx5/driver.h.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

IB/mlx5: Removed not used parameters

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Merge git://git./linux/kernel/git/davem/net

All of the conflicts were cases of overlapping changes.

In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'please-pull-ia64_misc' of git://git./linux/kernel/git/aegl/linux

Pull ia64 cleanups from Tony Luck:

- More atomic cleanup from willy

- Fix a python script to work with version 3

- Some other small cleanups

* tag 'please-pull-ia64_misc' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  ia64/err-inject: fix spelling mistake: "capapbilities" -> "capabilities"
  ia64/err-inject: Use get_user_pages_fast()
  ia64: doc: tweak whitespace for 'console=' parameter
  ia64: Convert remaining atomic operations
  ia64: convert unwcheck.py to python3

ia64/err-inject: fix spelling mistake: "capapbilities" -> "capabilities"

Trivial fix to spelling mistake in debug message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

ia64/err-inject: Use get_user_pages_fast()

At the point of sysfs callback, the call to gup is
done without mmap_sem (or any lock for that matter).
This is racy. As such, use the get_user_pages_fast()
alternative and safely avoid taking the lock, if possible.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

ia64: doc: tweak whitespace for 'console=' parameter

CC: Tony Luck <tony.luck@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

ia64: Convert remaining atomic operations

While we've only seen inlining problems with atomic_sub_return(),
the other atomic operations could have the same problem. Convert all
remaining operations to use the same solution as atomic_sub_return().

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

ia64: convert unwcheck.py to python3

Since my system use python3 as default, arch/ia64/scripts/unwcheck.py no
longer run.

This patch convert it to the python3 syntax.
I have ran it with python2/python3 while printing values of
start/end/rlen_sum which could be impacted by this change and I see no difference.

Fixes: 94a47083522e ("scripts: change scripts to use system python instead of env")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

Merge tag 'linux-kselftest-4.16-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"A fix for regression in memory-hotplug install script that prevents
the test from running on the target"

* tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: memory-hotplug: fix emit_tests regression

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Use an appropriate TSQ pacing shift in mac80211, from Toke
    Høiland-Jørgensen.

2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
    in ip6_route_me_harder, from Eric Dumazet.

3) Fix several shutdown races and similar other problems in l2tp, from
    James Chapman.

4) Handle missing XDP flush properly in tuntap, for real this time.
    From Jason Wang.

5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
    Borkmann.

6) Fix phy_resume() locking, from Andrew Lunn.

7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
    from Xin Long.

8) Revert F-RTO middle box workarounds, they only handle one dimension
    of the problem. From Yuchung Cheng.

9) Fix socket refcounting in RDS, from Ka-Cheong Poon.

10) Don't allow ppp unit registration to an unregistered channel, from
    Guillaume Nault.

11) Various hv_netvsc fixes from Stephen Hemminger.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
  hv_netvsc: propagate rx filters to VF
  hv_netvsc: filter multicast/broadcast
  hv_netvsc: defer queue selection to VF
  hv_netvsc: use napi_schedule_irqoff
  hv_netvsc: fix race in napi poll when rescheduling
  hv_netvsc: cancel subchannel setup before halting device
  hv_netvsc: fix error unwind handling if vmbus_open fails
  hv_netvsc: only wake transmit queue if link is up
  hv_netvsc: avoid retry on send during shutdown
  virtio-net: re enable XDP_REDIRECT for mergeable buffer
  ppp: prevent unregistered channels from connecting to PPP units
  tc-testing: skbmod: fix match value of ethertype
  mlxsw: spectrum_switchdev: Check success of FDB add operation
  net: make skb_gso_*_seglen functions private
  net: xfrm: use skb_gso_validate_network_len() to check gso sizes
  net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
  net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
  rds: Incorrect reference counting in TCP socket creation
  net: ethtool: don't ignore return from driver get_fecparam method
  vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
  ...

Merge branch 'mvpp2-jumbo-frames-support'

Antoine Tenart says:

====================
net: mvpp2: jumbo frames support

This series enable jumbo frames support in the Marvell PPv2 driver. The
first 2 patches rework the buffer management, then two patches prepare for
the final patch which adds the jumbo frames support into the driver.

This is based on top of net-next, and was tested on a mcbin.

Thanks!
Antoine

Since v1:
  - Improved the Tx FIFO initialization comment.
  - Improved the pool sanity check in mvpp2_bm_pool_use().
  - Fixed pool related comments.
  - Cosmetic fixes (used BIT() whenever possible).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: jumbo frames support

This patch adds the support for jumbo frames in the Marvell PPv2 driver.
A third buffer pool is added with 10KB buffers, which is used if the MTU
is higher than 1518B for packets larger than 1518B. Please note only the
port 0 supports hardware checksum offload due to the Tx FIFO size
limitation.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: enable UDP/TCP checksum over IPv6

This patch adds the NETIF_F_IPV6_CSUM to the driver's features to enable
UDP/TCP checksum over IPv6. No extra configuration of the engine is
needed on top of the IPv4 counterpart, which already is in the features
list (NETIF_F_IP_CSUM).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: use a data size of 10kB for Tx FIFO on port 0

This patch sets the Tx FIFO data size on port 0 to 10kB. This prepares
the PPv2 driver for the Jumbo frame support addition as the hardware
will need big enough Tx FIFO buffers when dealing with frames going
through an interface with an MTU of 9000.

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message, small reworks.]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: update the BM buffer free/destroy logic

The buffer free routine is updated to release only given a number of
buffers, and the destroy routine now checks the actual number of buffers
in the (BPPI and BPPE) HW counters before draining the pools. This
change helps getting jumbo frames support.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: use the same buffer pool for all ports

This patch configures the buffer manager long pool for all ports part of
the same CP. Long pool separation between ports is redundant since there
are no performance improvement when different pools are used.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dst: Add kernel-doc for 'net' parameter

This fixes the following kernel-doc warning:

./include/net/dst.h:366: warning: Function parameter or member 'net' not described in 'skb_tunnel_rx'

Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency

The other dst_cache_{get,set}_ip{4,6} functions, and the doc comment for
dst_cache_set_ip6 use 'saddr' for their source address parameter. Rename
the parameter to increase consistency.

This fixes the following kernel-doc warnings:

./include/net/dst_cache.h:58: warning: Function parameter or member 'addr' not described in 'dst_cache_set_ip6'
./include/net/dst_cache.h:58: warning: Excess function parameter 'saddr' description in 'dst_cache_set_ip6'

Fixes: 911362c70df5 ("net: add dst_cache support")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: dst_cache: Fix a typo in a comment

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Fix a test with HWTSTAMP_TX_ON

'HWTSTAMP_TX_ON' should be handled as a value, not as a bit mask.
The modified code should behave the same, because HWTSTAMP_TX_ON is 1
and no other possible values of 'tx_type' would match the test.
However, this is more future-proof, should other values be allowed one day.

See 'struct hwtstamp_config' in 'include/uapi/linux/net_tstamp.h'

This fixes a warning reported by smatch:
igb_xmit_frame_ring() warn: bit shifter 'HWTSTAMP_TX_ON' used for logical '&'

Fixes: 26bd4e2db06be ("igb: protect TX timestamping from API misuse")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Do not call netif_device_detach() when PCIe link goes missing

When the driver notices that PCIe link is gone by reading 0xffffffff
from a register it clears hw->hw_addr and then calls netif_device_detach().
This happens when the PCIe device is physically unplugged for example
the user disconnected the Thunderbolt cable.

However, netif_device_detach() prevents netif_unregister() from bringing
the device down properly including tearing down MSI-X vectors. This
triggers following crash during the driver removal:

  igb 0000:0b:00.0 enp11s0f0: PCIe link lost, device now detached
  ------------[ cut here ]------------
  kernel BUG at drivers/pci/msi.c:352!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  ...
  Call Trace:
   pci_disable_msix+0xc9/0xf0
   igb_reset_interrupt_capability+0x58/0x60 [igb]
   igb_remove+0x90/0x100 [igb]
   pci_device_remove+0x31/0xa0
   device_release_driver_internal+0x152/0x210
   pci_stop_bus_device+0x78/0xa0
   pci_stop_bus_device+0x38/0xa0
   pci_stop_bus_device+0x38/0xa0
   pci_stop_bus_device+0x26/0xa0
   pci_stop_bus_device+0x38/0xa0
   pci_stop_and_remove_bus_device+0x9/0x20
   trim_stale_devices+0xee/0x130
   ? _raw_spin_unlock_irqrestore+0xf/0x30
   trim_stale_devices+0x8f/0x130
   ? _raw_spin_unlock_irqrestore+0xf/0x30
   trim_stale_devices+0xa1/0x130
   ? get_slot_status+0x8b/0xc0
   acpiphp_check_bridge.part.7+0xf9/0x140
   acpiphp_hotplug_notify+0x170/0x1f0
   ...

To prevent the crash do not call netif_device_detach() in igb_rd32().
This should be fine because hw->hw_addr is set to NULL preventing future
hardware access of the now missing device.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=198181
Reported-by: Ferenc Boldog <ferenc.boldog@gmail.com>
Reported-by: Nikolay Bogoychev <nheart@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: add VF trust infrastructure

* Add a per-VF value to know if a VF is trusted, by default don't
  trust VFs.

* Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add
  trust status to ndo_get_vf_config output.

* Allow a trusted VF to change MAC and MAC filters even if MAC
  has been administratively set.

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'convert-pernet_operations-part4'

Kirill Tkhai says:

====================
Converting pernet_operations (part #4)

this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces in the same time. The patches touch mostly netfilter,
also there are small number of changes in other places.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert proto_gre_net_ops

These pernet_operations register and unregister sysctl.
nf_conntrack_l4proto_gre4->init_net is simple memory
initializer. Also, exit method removes gre keymap_list,
which is per-net. This looks safe to be executed
in parallel with other pernet_operations.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert ctnetlink_net_ops

These pernet_operations register and unregister
two conntrack notifiers, and they seem to be safe
to be executed in parallel.

General/not related to async pernet_operations JFI:
ctnetlink_net_exit_batch() actions are grouped in batch,
and this could look like there is synchronize_rcu()
is forgotten. But there is synchronize_rcu() on module
exit patch (in ctnetlink_exit()), so this batch may
be reworked as simple .exit method.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert nf_conntrack_net_ops

These pernet_operations register and unregister sysctl and /proc
entries. Exit batch method also waits till all per-net conntracks
are dead. Thus, they are safe to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert ip_set_net_ops

These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:

#ifdef CONFIG_IP_SET
extern func(void);
#else
static inline func(void);
#endif

So, there are no more pernet operations, dereferencing
net_generic(net, ip_set_net_id).

ip_set_net_ops are OK to be executed in parallel
for several net, so we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert fou_net_ops

These pernet_operations initialize and destroy
pernet net_generic(net, fou_net_id) list.
The rest of net_generic(net, fou_net_id) accesses
may happen after netlink message, and in-tree
pernet_operations do not send FOU_GENL_NAME messages.
So, these pernet_operations are safe to be marked
as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert dccp_v6_ops

These pernet_operations looks similar to dccp_v4_ops,
and they are also safe to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert dccp_v4_ops

These pernet_operations create and destroy net::dccp::v4_ctl_sk.
It looks like another pernet_operations don't want to send
dccp packets to dying or creating net. Batch method similar
to ipv4/ipv6 sockets and it has to be safe to be executed
in parallel with anything else. So, we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert cangw_pernet_ops

These pernet_operations have a deal with cgw_list,
and the rest of accesses are made under rtnl_lock().
The only exception is cgw_dump_jobs(), which is
accessed under rcu_read_lock(). cgw_dump_jobs() is
called on netlink request, and it does not seem,
foreign pernet_operations want to send a net such
the messages. So, we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert caif_net_ops

Init method just allocates memory for new cfg, and
assigns net_generic(net, caif_net_id). Despite there is
synchronize_rcu() on error path in cfcnfg_create(),
in real this function does not use global lists,
so it looks like this synchronize_rcu() is some legacy
inheritance. Exit method removes caif devices under
rtnl_lock().

There could be a problem, if someone from foreign net
pernet_operations dereference caif_net_id of this net.
It's dereferenced in get_cfcnfg() and caif_device_list().

get_cfcnfg() is used from netdevice notifiers, where
they are called under rtnl_lock(). The notifiers can't
be called from foreign nets pernet_operations. Also,
it's used from caif_disconnect_client() and from
caif_connect_client(). The both of the functions work
with caif socket, and there is the only possibility
to have a socket, when the net is dead. This may happen
only of the socket was created as kern using sk_alloc().
Grep by PF_CAIF shows we do not create kern caif sockets,
so get_cfcnfg() is safe.

caif_device_list() is used in netdevice notifiers and exit
method under rtnl lock. Also, from caif_get() used in
the netdev notifiers and in caif_flow_cb(). The last item
is skb destructor. Since there are no kernel caif sockets
nobody can send net a packet in parallel with init/exit,
so this is also safe.

So, these pernet_operations are safe to be async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert arp_tables_net_ops and ip6_tables_net_ops

These pernet_operations call xt_proto_init() and xt_proto_fini(),
which just register and unregister /proc entries.
They are safe to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert log pernet_operations

These pernet_operations use nf_log_set() and nf_log_unset()
in their methods:

nf_log_bridge_net_ops
nf_log_arp_net_ops
nf_log_ipv4_net_ops
nf_log_ipv6_net_ops
nf_log_netdev_net_ops

Nobody can send such a packet to a net before it's became
registered, nobody can send a packet after all netdevices
are unregistered. So, these pernet_operations are able
to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert broute_net_ops, frame_filter_net_ops and frame_nat_net_ops

These pernet_operations use ebt_register_table() and
ebt_unregister_table() to act on the tables, which
are used as argument in ebt_do_table(), called from
ebtables hooks.

Since there are no net-related bridge packets in-flight,
when the init and exit methods are called, these
pernet_operations are safe to be executed in parallel
with any other.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>