openwrt/staging/blogic.git
6 years agostmmac: fix DMA channel hang in half-duplex mode
Bhadram Varka [Sun, 17 Jun 2018 14:32:05 +0000 (20:02 +0530)]
stmmac: fix DMA channel hang in half-duplex mode

HW does not support Half-duplex mode in multi-queue
scenario. Fix it by not advertising the Half-Duplex
mode if multi-queue enabled.

Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: socfpga: add additional ocp reset line for Stratix10
Dinh Nguyen [Tue, 19 Jun 2018 15:35:38 +0000 (10:35 -0500)]
net: stmmac: socfpga: add additional ocp reset line for Stratix10

The Stratix10 platform has an additional reset line, OCP(Open Core Protocol),
that also needs to get deasserted for the stmmac ethernet controller to work.
Thus we need to update the Kconfig to include ARCH_STRATIX10 in order to build
dwmac-socfpga.

Also, remove the redundant check for the reset controller pointer. The
reset driver already checks for the pointer and returns 0 if the pointer
is NULL.

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sungem: fix rx checksum support
Eric Dumazet [Wed, 20 Jun 2018 02:18:50 +0000 (19:18 -0700)]
net: sungem: fix rx checksum support

After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
are friends"), sungem owners reported the infamous "eth0: hw csum failure"
message.

CHECKSUM_COMPLETE has in fact never worked for this driver, but this
was masked by the fact that upper stacks had to strip the FCS, and
therefore skb->ip_summed was set back to CHECKSUM_NONE before
my recent change.

Driver configures a number of bytes to skip when the chip computes
the checksum, and for some reason only half of the Ethernet header
was skipped.

Then a second problem is that we should strip the FCS by default,
unless the driver is updated to eventually support NETIF_F_RXFCS in
the future.

Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
or not, so that the admin can turn off rx checksum if wanted.

Many thanks to Andreas Schwab and Mathieu Malaterre for their
help in debugging this issue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpfilter: ignore binary files
Matteo Croce [Tue, 19 Jun 2018 15:21:36 +0000 (17:21 +0200)]
bpfilter: ignore binary files

net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
enabled, add it to .gitignore to avoid committing it.

Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpfilter: fix build error
Matteo Croce [Tue, 19 Jun 2018 15:16:20 +0000 (17:16 +0200)]
bpfilter: fix build error

bpfilter Makefile assumes that the system locale is en_US, and the
parsing of objdump output fails.
Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
only 2 processes instead of 7.

Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/usb/drivers: Remove useless hrtimer_active check
Daniel Lezcano [Tue, 19 Jun 2018 14:14:30 +0000 (16:14 +0200)]
net/usb/drivers: Remove useless hrtimer_active check

The code does:

 if (hrtimer_active(&t))
    hrtimer_cancel(&t);

However, hrtimer_cancel() checks if the timer is active, so the
test above is pointless.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: act_ife: preserve the action control in case of error
Davide Caratti [Tue, 19 Jun 2018 13:45:50 +0000 (15:45 +0200)]
net/sched: act_ife: preserve the action control in case of error

in the following script

 # tc actions add action ife encode allow prio pass index 42
 # tc actions replace action ife encode allow tcindex drop index 42

the action control should remain equal to 'pass', if the kernel failed
to replace the TC action. Pospone the assignment of the action control,
to ensure it is not overwritten in the error path of tcf_ife_init().

Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: act_ife: fix recursive lock and idr leak
Davide Caratti [Tue, 19 Jun 2018 13:39:46 +0000 (15:39 +0200)]
net/sched: act_ife: fix recursive lock and idr leak

a recursive lock warning [1] can be observed with the following script,

 # $TC actions add action ife encode allow prio pass index 42
 IFE type 0xED3E
 # $TC actions replace action ife encode allow tcindex pass index 42

in case the kernel was unable to run the last command (e.g. because of
the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
the kernel can leak idr in the error path of tcf_ife_init(), because
tcf_idr_release() is not called after successful idr reservation:

 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel
 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
 IFE type 0xFEFE
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

Since tcfa_lock is already taken when the action is being edited, a call
to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
again. On the other hand, tcf_idr_release() needs to be called in the
error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
Fix both problems in tcf_ife_init().
Since the cleanup() routine can now be called when ife->params is NULL,
also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).

 [1]
 ============================================
 WARNING: possible recursive locking detected
 4.17.0-rc4.kasan+ #417 Tainted: G            E
 --------------------------------------------
 tc/3932 is trying to acquire lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]

 but task is already holding lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&p->tcfa_lock)->rlock);
   lock(&(&p->tcfa_lock)->rlock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by tc/3932:
  #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
  #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 stack backtrace:
 CPU: 3 PID: 3932 Comm: tc Tainted: G            E     4.17.0-rc4.kasan+ #417
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Call Trace:
  dump_stack+0x9a/0xeb
  __lock_acquire+0xf43/0x34a0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? __mutex_lock+0x62f/0x1240
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_acquire+0x10b/0x330
  lock_acquire+0x10b/0x330
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  _raw_spin_lock_bh+0x38/0x70
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  tcf_ife_cleanup+0x19/0x80 [act_ife]
  __tcf_idr_release+0xff/0x350
  tcf_ife_init+0xdde/0x13c0 [act_ife]
  ? ife_exit_net+0x290/0x290 [act_ife]
  ? __lock_is_held+0xb4/0x140
  tcf_action_init_1+0x67b/0xad0
  ? tcf_action_dump_old+0xa0/0xa0
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? memset+0x1f/0x40
  tcf_action_init+0x30f/0x590
  ? tcf_action_init_1+0xad0/0xad0
  ? memset+0x1f/0x40
  tc_ctl_action+0x48e/0x5e0
  ? mutex_lock_io_nested+0x1160/0x1160
  ? tca_action_gd+0x990/0x990
  ? sched_clock+0x5/0x10
  ? find_held_lock+0x39/0x1d0
  rtnetlink_rcv_msg+0x4da/0x990
  ? validate_linkmsg+0x680/0x680
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  netlink_rcv_skb+0x127/0x350
  ? validate_linkmsg+0x680/0x680
  ? netlink_ack+0x970/0x970
  ? __kmalloc_node_track_caller+0x304/0x3a0
  netlink_unicast+0x40f/0x5d0
  ? netlink_attachskb+0x580/0x580
  ? _copy_from_iter_full+0x187/0x760
  ? import_iovec+0x90/0x390
  netlink_sendmsg+0x67f/0xb50
  ? netlink_unicast+0x5d0/0x5d0
  ? copy_msghdr_from_user+0x206/0x340
  ? netlink_unicast+0x5d0/0x5d0
  sock_sendmsg+0xb3/0xf0
  ___sys_sendmsg+0x60a/0x8b0
  ? copy_msghdr_from_user+0x340/0x340
  ? lock_downgrade+0x5e0/0x5e0
  ? tty_write_lock+0x18/0x50
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_downgrade+0x5e0/0x5e0
  ? lock_acquire+0x10b/0x330
  ? __audit_syscall_entry+0x316/0x690
  ? current_kernel_time64+0x6b/0xd0
  ? __fget_light+0x55/0x1f0
  ? __sys_sendmsg+0xd2/0x170
  __sys_sendmsg+0xd2/0x170
  ? __ia32_sys_shutdown+0x70/0x70
  ? syscall_trace_enter+0x57a/0xd60
  ? rcu_read_lock_sched_held+0xdc/0x110
  ? __bpf_trace_sys_enter+0x10/0x10
  ? do_syscall_64+0x22/0x480
  do_syscall_64+0xa5/0x480
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fd646988ba0
 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100

Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding")
Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: fix suspend/resume in davinci_emac
Bartosz Golaszewski [Tue, 19 Jun 2018 12:44:00 +0000 (14:44 +0200)]
net: ethernet: fix suspend/resume in davinci_emac

This patch reverts commit 3243ff2a05ec ("net: ethernet: davinci_emac:
Deduplicate bus_find_device() by name matching") and adds a comment
which should stop anyone from reintroducing the same "fix" in the future.

We can't use bus_find_device_by_name() here because the device name is
not guaranteed to be 'davinci_mdio'. On some systems it can be
'davinci_mdio.0' so we need to use strncmp() against the first part of
the string to correctly match it.

Fixes: 3243ff2a05ec ("net: ethernet: davinci_emac: Deduplicate bus_find_device() by name matching")
Cc: stable@vger.kernel.org
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Acked-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: propagate dev_get_valid_name return code
Li RongQing [Tue, 19 Jun 2018 09:23:17 +0000 (17:23 +0800)]
net: propagate dev_get_valid_name return code

if dev_get_valid_name failed, propagate its return code

and remove the setting err to ENODEV, it will be set to
0 again before dev_change_net_namespace exits.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoenic: do not overwrite error code
Govindarajulu Varadarajan [Mon, 18 Jun 2018 17:01:05 +0000 (10:01 -0700)]
enic: do not overwrite error code

In failure path, we overwrite err to what vnic_rq_disable() returns. In
case it returns 0, enic_open() returns success in case of error.

Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Fixes: e8588e268509 ("enic: enable rq before updating rq descriptors")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/tcp: Fix socket lookups with SO_BINDTODEVICE
David Ahern [Mon, 18 Jun 2018 19:30:37 +0000 (12:30 -0700)]
net/tcp: Fix socket lookups with SO_BINDTODEVICE

Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
need to fail if dev_match is not true. Currently, a packet to a given port
can match a socket bound to device when it should not. In the VRF case,
this causes the lookup to hit a VRF socket and not a global socket
resulting in a response trying to go through the VRF when it should not.

Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
Reported-by: Lou Berger <lberger@labn.net>
Diagnosed-by: Renato Westphal <renato@opensourcerouting.org>
Tested-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoptp: replace getnstimeofday64() with ktime_get_real_ts64()
Arnd Bergmann [Mon, 18 Jun 2018 14:20:39 +0000 (16:20 +0200)]
ptp: replace getnstimeofday64() with ktime_get_real_ts64()

getnstimeofday64() is deprecated and getting replaced throughout
the kernel with ktime_get_*() based helpers for a more consistent
interface.

The two functions do the exact same thing, so this is just
a cosmetic change.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: respect rcu grace period before freeing fib6_info
Eric Dumazet [Mon, 18 Jun 2018 12:24:31 +0000 (05:24 -0700)]
net/ipv6: respect rcu grace period before freeing fib6_info

syzbot reported use after free that is caused by fib6_info being
freed without a proper RCU grace period.

CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __read_once_size include/linux/compiler.h:188 [inline]
 find_rr_leaf net/ipv6/route.c:705 [inline]
 rt6_select net/ipv6/route.c:761 [inline]
 fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823
 ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856
 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082
 fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122
 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110
 ip6_route_output include/net/ip6_route.h:82 [inline]
 icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline]
 icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535
 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
 ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244
 dst_link_failure include/net/dst.h:427 [inline]
 ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695
 neigh_invalidate+0x246/0x550 net/core/neighbour.c:892
 neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978
 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1d1/0x200 kernel/softirq.c:404
 exiting_irq arch/x86/include/asm/apic.h:527 [inline]
 smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 </IRQ>
RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482
Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48
RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0
RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000
R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0
R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00
 strlen include/linux/string.h:267 [inline]
 getname_kernel+0x24/0x370 fs/namei.c:218
 open_exec+0x17/0x70 fs/exec.c:882
 load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780
 search_binary_handler+0x17d/0x570 fs/exec.c:1653
 exec_binprm fs/exec.c:1695 [inline]
 __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819
 do_execveat_common fs/exec.c:1866 [inline]
 do_execve fs/exec.c:1883 [inline]
 __do_sys_execve fs/exec.c:1964 [inline]
 __se_sys_execve fs/exec.c:1959 [inline]
 __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1576a46207
Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02
RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207
RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670
RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10
R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005

Allocated by task 12188:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:513 [inline]
 kzalloc include/linux/slab.h:706 [inline]
 fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152
 ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013
 ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154
 ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660
 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
 sock_ioctl+0x30d/0x680 net/socket.c:1097
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1402:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207
 fib6_info_release include/net/ip6_fib.h:286 [inline]
 __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline]
 ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316
 ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663
 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
 sock_ioctl+0x30d/0x680 net/socket.c:1097
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801b5df2580
 which belongs to the cache kmalloc-256 of size 256
The buggy address is located 8 bytes inside of
 256-byte region [ffff8801b5df2580ffff8801b5df2680)
The buggy address belongs to the page:
page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0
raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb

Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: net_failover: fix typo in net_failover_slave_register()
Liran Alon [Mon, 18 Jun 2018 12:04:05 +0000 (15:04 +0300)]
net: net_failover: fix typo in net_failover_slave_register()

Sync both unicast and multicast lists instead of unicast twice.

Fixes: cfc80d9a116 ("net: Introduce net_failover driver")
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipvlan: use ETH_MAX_MTU as max mtu
Xin Long [Mon, 18 Jun 2018 08:15:57 +0000 (16:15 +0800)]
ipvlan: use ETH_MAX_MTU as max mtu

Similar to the fixes on team and bonding, this restores the ability
to set an ipvlan device's mtu to anything higher than 1500.

Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hamradio: use eth_broadcast_addr
Stefan Agner [Sun, 17 Jun 2018 21:40:53 +0000 (23:40 +0200)]
net: hamradio: use eth_broadcast_addr

The array bpq_eth_addr is only used to get the size of an
address, whereas the bcast_addr is used to set the broadcast
address. This leads to a warning when using clang:
drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not
      needed and will not be emitted [-Wunneeded-internal-declaration]
static char bpq_eth_addr[6];
            ^

Remove both variables and use the common eth_broadcast_addr
to set the broadcast address.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoenic: initialize enic->rfs_h.lock in enic_probe
Govindarajulu Varadarajan [Tue, 19 Jun 2018 15:15:24 +0000 (08:15 -0700)]
enic: initialize enic->rfs_h.lock in enic_probe

lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
can be called when interface is down.

Move enic_rfs_flw_tbl_init to enic_probe.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
register_lock_class+0x550/0x560
? __handle_mm_fault+0xa8b/0x1100
__lock_acquire+0x81/0x670
lock_acquire+0xb9/0x1e0
?  enic_get_rxnfc+0x139/0x2b0 [enic]
_raw_spin_lock_bh+0x38/0x80
? enic_get_rxnfc+0x139/0x2b0 [enic]
enic_get_rxnfc+0x139/0x2b0 [enic]
ethtool_get_rxnfc+0x8d/0x1c0
dev_ethtool+0x16c8/0x2400
? __mutex_lock+0x64d/0xa00
? dev_load+0x6a/0x150
dev_ioctl+0x253/0x4b0
sock_do_ioctl+0x9a/0x130
sock_ioctl+0x1af/0x350
do_vfs_ioctl+0x8e/0x670
? syscall_trace_enter+0x1e2/0x380
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5a/0x170
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'NCSI-silence-warning-messages'
David S. Miller [Tue, 19 Jun 2018 22:26:59 +0000 (07:26 +0900)]
Merge branch 'NCSI-silence-warning-messages'

Joel Stanley says:

====================
Slience NCSI logging

v2:
  Fix indent issue and commit message based on Joe's feedback
  Add Sam's acks

Here are three changes to silence unnecessary warnings in the ncsi code.

The final patch adds Sam as the maintainer for NCSI.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMAINTAINERS: Add Sam as the maintainer for NCSI
Joel Stanley [Tue, 19 Jun 2018 05:38:34 +0000 (15:08 +0930)]
MAINTAINERS: Add Sam as the maintainer for NCSI

Sam has been handing the maintenance of NCSI for a number release cycles
now.

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ncsi: Use netdev_dbg for debug messages
Joel Stanley [Tue, 19 Jun 2018 05:38:33 +0000 (15:08 +0930)]
net/ncsi: Use netdev_dbg for debug messages

This moves all of the netdev_printk(KERN_DEBUG, ...) messages over to
netdev_dbg.

As Joe explains:

> netdev_dbg is not included in object code unless
> DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set.
> And then, it is not emitted into the log unless
> DEBUG is set or this specific netdev_dbg is enabled
> via the dynamic debug control file.

Which is what we're after in this case.

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ncsi: Drop no more channels message
Joel Stanley [Tue, 19 Jun 2018 05:38:32 +0000 (15:08 +0930)]
net/ncsi: Drop no more channels message

This does not provide useful information. As the ncsi maintainer said:

 > either we get a channel or broadcom has gone out to lunch

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ncsi: Silence debug messages
Joel Stanley [Tue, 19 Jun 2018 05:38:31 +0000 (15:08 +0930)]
net/ncsi: Silence debug messages

In normal operation we see this series of messages as the host drives
the network device:

 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
 ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
 ftgmac100 1e660000.ethernet eth0: NCSI interface down
 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI interface up
 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
 ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
 ftgmac100 1e660000.ethernet eth0: NCSI interface down
 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI interface up

This makes all of these messages netdev_dbg. They are still useful to
debug eg. misbehaving network device firmware, but we do not need them
filling up the kernel logs in normal operation.

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
Daniel Borkmann [Tue, 19 Jun 2018 21:33:54 +0000 (14:33 -0700)]
bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize

Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
is clearly wrong since I40E_SKB_PAD already points to the offset where
the original xdp->data was sitting since xdp->data_hard_start is defined
as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
when build skb is used.

However, also before cc5b114dcf98 ("bpf, i40e: add meta data support")
this seems broken since bpf_xdp_adjust_head() helper could have been used
to alter headroom and enlarge / shrink the frame and with that the assumption
that the xdp->data remains unchanged does not hold and would push a bogus
packet to upper stack.

ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and
drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
skb_reserve() and truesize calculation.

Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support")
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Reported-by: Keith Busch <keith.busch@linux.intel.com>
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Tested-by: Keith Busch <keith.busch@linux.intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'qed-fixes'
David S. Miller [Tue, 19 Jun 2018 22:15:34 +0000 (07:15 +0900)]
Merge branch 'qed-fixes'

Sudarsana Reddy Kalluru says:

====================
qed*: Fix series.

The patch series fixes few issues in the qed/qede drivers.
Please consider applying this series to "net".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Do not advertise DCBX_LLD_MANAGED capability.
Sudarsana Reddy Kalluru [Tue, 19 Jun 2018 04:58:02 +0000 (21:58 -0700)]
qed: Do not advertise DCBX_LLD_MANAGED capability.

Do not advertise DCBX_LLD_MANAGED capability i.e., do not allow
external agent to manage the dcbx/lldp negotiation. MFW acts as lldp agent
for qed* devices, and no other lldp agent is allowed to coexist with mfw.

Also updated a debug print, to not to display the redundant info.

Fixes: a1d8d8a51 ("qed: Add dcbnl support.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Add sanity check for SIMD fastpath handler.
Sudarsana Reddy Kalluru [Tue, 19 Jun 2018 04:58:01 +0000 (21:58 -0700)]
qed: Add sanity check for SIMD fastpath handler.

Avoid calling a SIMD fastpath handler if it is NULL. The check is needed
to handle an unlikely scenario where unsolicited interrupt is destined to
a PF in INTa mode.

Fixes: fe56b9e6a ("qed: Add module with basic common support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Fix possible memory leak in Rx error path handling.
Sudarsana Reddy Kalluru [Tue, 19 Jun 2018 04:58:00 +0000 (21:58 -0700)]
qed: Fix possible memory leak in Rx error path handling.

Memory for packet buffers need to be freed in the error paths as there is
no consumer (e.g., upper layer) for such packets and that memory will never
get freed.
The issue was uncovered when port was attacked with flood of isatap
packets, these are multicast packets hence were directed at all the PFs.
For foce PF, this meant they were routed to the ll2 module which in turn
drops such packets.

Fixes: 0a7fb11c ("qed: Add Light L2 support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet_sched: blackhole: tell upper qdisc about dropped packets
Konstantin Khlebnikov [Fri, 15 Jun 2018 10:27:31 +0000 (13:27 +0300)]
net_sched: blackhole: tell upper qdisc about dropped packets

When blackhole is used on top of classful qdisc like hfsc it breaks
qlen and backlog counters because packets are disappear without notice.

In HFSC non-zero qlen while all classes are inactive triggers warning:
WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
and schedules watchdog work endlessly.

This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
this flag tells upper layer: this packet is gone and isn't queued.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobluetooth: hci_nokia: Don't include linux/unaligned/le_struct.h directly.
David S. Miller [Sat, 16 Jun 2018 23:38:55 +0000 (08:38 +0900)]
bluetooth: hci_nokia: Don't include linux/unaligned/le_struct.h directly.

This breaks the build as this header is not meant to be used in this
way.

./include/linux/unaligned/access_ok.h:8:28: error: redefinition of â€˜get_unaligned_le16’
 static __always_inline u16 get_unaligned_le16(const void *p)
                            ^~~~~~~~~~~~~~~~~~
In file included from drivers/bluetooth/hci_nokia.c:32:
./include/linux/unaligned/le_struct.h:7:19: note: previous definition of â€˜get_unaligned_le16’ was here
 static inline u16 get_unaligned_le16(const void *p)

Use asm/unaligned.h instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoatm: Preserve value of skb->truesize when accounting to vcc
David Woodhouse [Sat, 16 Jun 2018 10:55:44 +0000 (11:55 +0100)]
atm: Preserve value of skb->truesize when accounting to vcc

ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
which they are to be sent. But it doesn't take ownership of those
packets from the sock (if any) which originally owned them. They should
remain owned by their actual sender until they've left the box.

There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
for certain skbs, precisely to avoid messing up sk_wmem_alloc
accounting. Ideally that hack would cover the ATM use case too, but it
doesn't â€” skbs which aren't owned by any sock, for example PPP control
frames, still get their truesize adjusted when the low-level ATM driver
adds headroom.

This has always been an issue, it seems. The truesize of a packet
increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
for normal traffic, only for control frames. So I think we just got away
with it, and we probably needed to send 2GiB of LCP echo frames before
the misaccounting would ever have caused a problem and caused
atm_may_send() to start refusing packets.

Commit 14afee4b609 ("net: convert sock.sk_wmem_alloc from atomic_t to
refcount_t") did exactly what it was intended to do, and turned this
mostly-theoretical problem into a real one, causing PPPoATM to fail
immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
starts refusing to allow new packets.

The least intrusive solution to this problem is to stash the value of
skb->truesize that was accounted to the VCC, in a new member of the
ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
value instead of the then-current value of skb->truesize.

Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Sat, 16 Jun 2018 22:54:24 +0000 (07:54 +0900)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-06-16

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a panic in devmap handling in generic XDP where return type
   of __devmap_lookup_elem() got changed recently but generic XDP
   code missed the related update, from Toshiaki.

2) Fix a freeze when BPF progs are loaded that include BPF to BPF
   calls when JIT is enabled where we would later bail out via error
   path w/o dropping kallsyms, and another one to silence syzkaller
   splats from locking prog read-only, from Daniel.

3) Fix a bug in test_offloads.py BPF selftest which must not assume
   that the underlying system have no BPF progs loaded prior to test,
   and one in bpftool to fix accuracy of program load time, from Jakub.

4) Fix a bug in bpftool's probe for availability of the bpf(2)
   BPF_TASK_FD_QUERY subcommand, from Yonghong.

5) Fix a regression in AF_XDP's XDP_SKB receive path where queue
   id check got erroneously removed, from Björn.

6) Fix missing state cleanup in BPF's xfrm tunnel test, from William.

7) Check tunnel type more accurately in BPF's tunnel collect metadata
   kselftest, from Jian.

8) Fix missing Kconfig fragments for BPF kselftests, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 16 Jun 2018 07:32:04 +0000 (16:32 +0900)]
Merge branch 'afs-proc' of git://git./linux/kernel/git/viro/vfs

Pull AFS updates from Al Viro:
 "Assorted AFS stuff - ended up in vfs.git since most of that consists
  of David's AFS-related followups to Christoph's procfs series"

* 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Optimise callback breaking by not repeating volume lookup
  afs: Display manually added cells in dynamic root mount
  afs: Enable IPv6 DNS lookups
  afs: Show all of a server's addresses in /proc/fs/afs/servers
  afs: Handle CONFIG_PROC_FS=n
  proc: Make inline name size calculation automatic
  afs: Implement network namespacing
  afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
  afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
  proc: Add a way to make network proc files writable
  afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
  afs: Rearrange fs/afs/proc.c to move the show routines up
  afs: Rearrange fs/afs/proc.c by moving fops and open functions down
  afs: Move /proc management functions to the end of the file

6 years agoMerge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 16 Jun 2018 07:21:50 +0000 (16:21 +0900)]
Merge branch 'work.compat' of git://git./linux/kernel/git/viro/vfs

Pull compat updates from Al Viro:
 "Some biarch patches - getting rid of assorted (mis)uses of
  compat_alloc_user_space().

  Not much in that area this cycle..."

* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: simplify compat ioctl handling
  signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
  vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart

6 years agoMerge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 16 Jun 2018 07:11:40 +0000 (16:11 +0900)]
Merge branch 'work.aio' of git://git./linux/kernel/git/viro/vfs

Pull aio fixes from Al Viro:
 "Assorted AIO followups and fixes"

* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  eventpoll: switch to ->poll_mask
  aio: only return events requested in poll_mask() for IOCB_CMD_POLL
  eventfd: only return events requested in poll_mask()
  aio: mark __aio_sigset::sigmask const

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 15 Jun 2018 22:39:34 +0000 (07:39 +0900)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Various netfilter fixlets from Pablo and the netfilter team.

 2) Fix regression in IPVS caused by lack of PMTU exceptions on local
    routes in ipv6, from Julian Anastasov.

 3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.

 4) Don't crash on poll in TLS, from Daniel Borkmann.

 5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
    including Avahi mDNS. From Bart Van Assche.

 6) Missing of_node_put in qcom/emac driver, from Yue Haibing.

 7) We lack checking of the TCP checking in one special case during SYN
    receive, from Frank van der Linden.

 8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.

 9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.

10) Must grab HW caps before doing quirk checks in stmmac driver, from
    Jose Abreu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  net: stmmac: Run HWIF Quirks after getting HW caps
  neighbour: skip NTF_EXT_LEARNED entries during forced gc
  net: cxgb3: add error handling for sysfs_create_group
  tls: fix waitall behavior in tls_sw_recvmsg
  tls: fix use-after-free in tls_push_record
  l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
  l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  mlxsw: spectrum_router: Align with new route replace logic
  mlxsw: spectrum_router: Allow appending to dev-only routes
  ipv6: Only emit append events for appended routes
  stmmac: added support for 802.1ad vlan stripping
  cfg80211: fix rcu in cfg80211_unregister_wdev
  mac80211: Move up init of TXQs
  mac80211_hwsim: fix module init error paths
  cfg80211: initialize sinfo in cfg80211_get_station
  nl80211: fix some kernel doc tag mistakes
  hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
  rds: avoid unenecessary cong_update in loop transport
  l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
  ...

6 years agoMerge tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Fri, 15 Jun 2018 22:36:39 +0000 (07:36 +0900)]
Merge tag 'modules-for-v4.18' of git://git./linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "Minor code cleanup and also allow sig_enforce param to be shown in
  sysfs with CONFIG_MODULE_SIG_FORCE"

* tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Allow to always show the status of modsign
  module: Do not access sig_enforce directly

6 years agoMerge branch 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Jun 2018 21:50:51 +0000 (06:50 +0900)]
Merge branch 'for-linus-4.18-rc1' of git://git./linux/kernel/git/rw/uml

Pull uml updates from Richard Weinberger:
 "Minor updates for UML:

   - fixes for our new vector network driver by Anton

   - initcall cleanup by Alexander

   - We have a new mailinglist, sourceforge.net sucks"

* 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Fix raw interface options
  um: Fix initialization of vector queues
  um: remove uml initcalls
  um: Update mailing list address

6 years agoxdp: Fix handling of devmap in generic XDP
Toshiaki Makita [Thu, 14 Jun 2018 02:07:42 +0000 (11:07 +0900)]
xdp: Fix handling of devmap in generic XDP

Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
the return value type of __devmap_lookup_elem() from struct net_device *
to struct bpf_dtab_netdev * but forgot to modify generic XDP code
accordingly.

Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
net_device is expected, then skb->dev was set to invalid value.

v2:
- Fix compiler warning without CONFIG_BPF_SYSCALL.

Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 15 Jun 2018 21:42:43 +0000 (06:42 +0900)]
Merge tag 'riscv-for-linus-4.18-merge_window' of git://git./linux/kernel/git/palmer/riscv-linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains some small RISC-V updates I'd like to target for 4.18.

  They are all fairly small this time. Here's a short summary, there's
  more info in the commits/merges:

   - a fix to __clear_user to respect the passed arguments.

   - enough support for the perf subsystem to work with RISC-V's ISA
     defined performance counters.

   - support for sparse and cleanups suggested by it.

   - support for R_RISCV_32 (a relocation, not the 32-bit ISA).

   - some MAINTAINERS cleanups.

   - the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
     always present.

  I've given these a simple build+boot test"

* tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
  RISC-V: Handle R_RISCV_32 in modules
  riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
  riscv: add riscv-specific predefines to CHECKFLAGS
  riscv: split the declaration of __copy_user
  riscv: no __user for probe_kernel_address()
  riscv: use NULL instead of a plain 0
  perf: riscv: Add Document for Future Porting Guide
  perf: riscv: preliminary RISC-V support
  MAINTAINERS: Update Albert's email, he's back at Berkeley
  MAINTAINERS: Add myself as a maintainer for SiFive's drivers
  riscv: Fix the bug in memory access fixup code

6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 15 Jun 2018 21:37:04 +0000 (06:37 +0900)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "Mostly the PPC part of the release, but also switching to Arnd's fix
  for the hyperv config issue and a typo fix.

  Main PPC changes:

   - reimplement the MMIO instruction emulation

   - transactional memory support for PR KVM

   - improve radix page table handling"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
  KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
  KVM: x86: fix typo at kvm_arch_hardware_setup comment
  KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
  KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
  KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
  KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
  KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
  KVM: PPC: Book3S PR: Handle additional interrupt types
  KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
  KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
  KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
  KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
  KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
  KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
  KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
  KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
  KVM: PPC: Book3S PR: Add emulation for trechkpt.
  KVM: PPC: Book3S PR: Add emulation for treclaim.
  KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
  KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
  ...

6 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Fri, 15 Jun 2018 21:35:02 +0000 (06:35 +0900)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio, vhost: features, fixes

   - PCI virtual function support for virtio

   - DMA barriers for virtio strong barriers

   - bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: update the comments for transport features
  virtio_pci: support enabling VFs
  vhost: fix info leak due to uninitialized memory
  virtio_ring: switch to dma_XX barriers for rpmsg

6 years agoMerge branch 'bpf-fixes'
Alexei Starovoitov [Fri, 15 Jun 2018 18:14:26 +0000 (11:14 -0700)]
Merge branch 'bpf-fixes'

Daniel Borkmann says:

====================
First one is a panic I ran into while testing the second
one where we got several syzkaller reports. Series here
fixes both.

Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agobpf: reject any prog that failed read-only lock
Daniel Borkmann [Fri, 15 Jun 2018 00:30:48 +0000 (02:30 +0200)]
bpf: reject any prog that failed read-only lock

We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
as well as the BPF image as read-only through bpf_prog_lock_ro(). In
the case any of these would fail we throw a WARN_ON_ONCE() in order to
yell loudly to the log. Perhaps, to some extend, this may be comparable
to an allocation where __GFP_NOWARN is explicitly not set.

Added via 65869a47f348 ("bpf: improve read-only handling"), this behavior
is slightly different compared to any of the other in-kernel set_memory_ro()
users who do not check the return code of set_memory_ro() and friends /at
all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
in BPF this is mandatory hardening step, we want to know whether there
are any issues that would leave both BPF data writable. So it happens
that syzkaller enabled fault injection and it triggered memory allocation
failure deep inside x86's change_page_attr_set_clr() which was triggered
from set_memory_ro().

Now, there are two options: i) leaving everything as is, and ii) reworking
the image locking code in order to have a final checkpoint out of the
central bpf_prog_select_runtime() which probes whether any of the calls
during prog setup weren't successful, and then bailing out with an error.
Option ii) is a better approach since this additional paranoia avoids
altogether leaving any potential W+X pages from BPF side in the system.
Therefore, lets be strict about it, and reject programs in such unlikely
occasion. While testing I noticed also that one bpf_prog_lock_ro()
call was missing on the outer dummy prog in case of calls, e.g. in the
destructor we call bpf_prog_free_deferred() on the main prog where we
try to bpf_prog_unlock_free() the program, and since we go via
bpf_prog_select_runtime() do that as well.

Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agobpf: fix panic in prog load calls cleanup
Daniel Borkmann [Fri, 15 Jun 2018 00:30:47 +0000 (02:30 +0200)]
bpf: fix panic in prog load calls cleanup

While testing I found that when hitting error path in bpf_prog_load()
where we jump to free_used_maps and prog contained BPF to BPF calls
that were JITed earlier, then we never clean up the bpf_prog_kallsyms_add()
done under jit_subprogs(). Add proper API to make BPF kallsyms deletion
more clear and fix that.

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agonet: stmmac: Run HWIF Quirks after getting HW caps
Jose Abreu [Fri, 15 Jun 2018 15:17:27 +0000 (16:17 +0100)]
net: stmmac: Run HWIF Quirks after getting HW caps

Currently we were running HWIF quirks before getting HW capabilities.
This is not right because some HWIF callbacks depend on HW caps.

Lets save the quirks callback and use it in a later stage.

This fixes Altera socfpga.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: 5f0456b43140 ("net: stmmac: Implement logic to automatically select HW Interface")
Reported-by: Dinh Nguyen <dinh.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Dinh Nguyen <dinh.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoneighbour: skip NTF_EXT_LEARNED entries during forced gc
Roopa Prabhu [Wed, 13 Jun 2018 04:26:10 +0000 (21:26 -0700)]
neighbour: skip NTF_EXT_LEARNED entries during forced gc

Commit 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
added support for NTF_EXT_LEARNED for neighbour entries.
NTF_EXT_LEARNED entries are neigh entries managed by control
plane (eg: Ethernet VPN implementation in FRR routing suite).
Periodic gc already excludes these entries. This patch extends
it to forced gc which the earlier patch missed.

Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: cxgb3: add error handling for sysfs_create_group
Zhouyang Jia [Fri, 15 Jun 2018 03:06:17 +0000 (11:06 +0800)]
net: cxgb3: add error handling for sysfs_create_group

When sysfs_create_group fails, the lack of error-handling code may
cause unexpected results.

This patch adds error-handling code after calling sysfs_create_group.

Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tls-fixes'
David S. Miller [Fri, 15 Jun 2018 16:14:31 +0000 (09:14 -0700)]
Merge branch 'tls-fixes'

Daniel Borkmann says:

====================
Two tls fixes

First one is syzkaller trigered uaf and second one noticed
while writing test code with tls ulp. For details please see
individual patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotls: fix waitall behavior in tls_sw_recvmsg
Daniel Borkmann [Fri, 15 Jun 2018 01:07:46 +0000 (03:07 +0200)]
tls: fix waitall behavior in tls_sw_recvmsg

Current behavior in tls_sw_recvmsg() is to wait for incoming tls
messages and copy up to exactly len bytes of data that the user
provided. This is problematic in the sense that i) if no packet
is currently queued in strparser we keep waiting until one has been
processed and pushed into tls receive layer for tls_wait_data() to
wake up and push the decrypted bits to user space. Given after
tls decryption, we're back at streaming data, use sock_rcvlowat()
hint from tcp socket instead. Retain current behavior with MSG_WAITALL
flag and otherwise use the hint target for breaking the loop and
returning to application. This is done if currently no ctx->recv_pkt
is ready, otherwise continue to process it from our strparser
backlog.

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotls: fix use-after-free in tls_push_record
Daniel Borkmann [Fri, 15 Jun 2018 01:07:45 +0000 (03:07 +0200)]
tls: fix use-after-free in tls_push_record

syzkaller managed to trigger a use-after-free in tls like the
following:

  BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
  Write of size 1 at addr ffff88037aa08000 by task a.out/2317

  CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  Call Trace:
   dump_stack+0x71/0xab
   print_address_description+0x6a/0x280
   kasan_report+0x258/0x380
   ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_sw_push_pending_record+0x2e/0x40 [tls]
   tls_sk_proto_close+0x3fe/0x710 [tls]
   ? tcp_check_oom+0x4c0/0x4c0
   ? tls_write_space+0x260/0x260 [tls]
   ? kmem_cache_free+0x88/0x1f0
   inet_release+0xd6/0x1b0
   __sock_release+0xc0/0x240
   sock_close+0x11/0x20
   __fput+0x22d/0x660
   task_work_run+0x114/0x1a0
   do_exit+0x71a/0x2780
   ? mm_update_next_owner+0x650/0x650
   ? handle_mm_fault+0x2f5/0x5f0
   ? __do_page_fault+0x44f/0xa50
   ? mm_fault_error+0x2d0/0x2d0
   do_group_exit+0xde/0x300
   __x64_sys_exit_group+0x3a/0x50
   do_syscall_64+0x9a/0x300
   ? page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to the
TLS_PENDING_CLOSED_RECORD before allocation which could potentially
fail happened. That fixes the issue on my side.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions'
David S. Miller [Fri, 15 Jun 2018 16:12:37 +0000 (09:12 -0700)]
Merge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions'

Guillaume Nault says:

====================
l2tp: l2tp_ppp must ignore non-PPP sessions

The original L2TP code was written for version 2 of the protocol, which
could only carry PPP sessions. Then L2TPv3 generalised the protocol so that
it could transport different kinds of pseudo-wires. But parts of the
l2tp_ppp module still break in presence of non-PPP sessions.

Assuming L2TPv2 tunnels can only transport PPP sessions is right, but
l2tp_netlink failed to ensure that (fixed in patch 1).
When retrieving a session from an arbitrary tunnel, l2tp_ppp needs to
filter out non-PPP sessions (last occurrence fixed in patch 2).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
Guillaume Nault [Fri, 15 Jun 2018 13:39:19 +0000 (15:39 +0200)]
l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()

pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
'session' may be an Ethernet pseudo-wire.

However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
assumes l2tp_session_priv() points to a pppol2tp_session structure. For
an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
structure instead, making pppol2tp_session_ioctl() access invalid
memory.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
Guillaume Nault [Fri, 15 Jun 2018 13:39:17 +0000 (15:39 +0200)]
l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels

The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
found there. However, l2tp_netlink accepts creating Ethernet sessions
regardless of the underlying tunnel version.

This confuses pppol2tp_seq_session_show(), which expects that
l2tp_session_priv() returns a pppol2tp_session structure. When the
session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
instead. This leads to invalid memory access when
pppol2tp_session_get_sock() later tries to dereference ps->sk.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-IPv6-and-reference-counting-fixes'
David S. Miller [Fri, 15 Jun 2018 16:11:17 +0000 (09:11 -0700)]
Merge branch 'mlxsw-IPv6-and-reference-counting-fixes'

Ido Schimmel says:

====================
mlxsw: IPv6 and reference counting fixes

The first three patches fix a mismatch between the new IPv6 behavior
introduced in commit f34436a43092 ("net/ipv6: Simplify route replace and
appending into multipath route") and mlxsw. The patches allow the driver
to support multipathing in IPv6 overlays with GRE tunnel devices. A
selftest will be submitted when net-next opens.

The last patch fixes a reference count problem of the port_vlan struct.
I plan to simplify the code in net-next, so that reference counting is
not necessary anymore.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_switchdev: Fix port_vlan refcounting
Petr Machata [Fri, 15 Jun 2018 13:23:38 +0000 (16:23 +0300)]
mlxsw: spectrum_switchdev: Fix port_vlan refcounting

Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
distributed not only on clean addition, but also when flags on an
existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
implicitly references the object. This then leads to discrepancies in
reference counting when the VLAN is removed. spectrum.c warns about the
problem when the module is removed:

[13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
[...]
[13578.627106] Call Trace:
[13578.629617]  mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
[13578.634748]  mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
[13578.641290]  mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
[13578.646238]  pci_device_remove+0x31/0xb0
[13578.650244]  device_release_driver_internal+0x14f/0x220
[13578.655562]  driver_detach+0x32/0x70
[13578.659183]  bus_remove_driver+0x47/0xa0
[13578.663134]  pci_unregister_driver+0x1e/0x80
[13578.667486]  mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
[13578.673207]  __x64_sys_delete_module+0x13b/0x1e0
[13578.677888]  ? exit_to_usermode_loop+0x78/0x80
[13578.682374]  do_syscall_64+0x39/0xe0
[13578.685976]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
determines it's a flag-only change.

Fixes: b3529af6bb0d ("spectrum: Reference count VLAN entries")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_router: Align with new route replace logic
Ido Schimmel [Fri, 15 Jun 2018 13:23:37 +0000 (16:23 +0300)]
mlxsw: spectrum_router: Align with new route replace logic

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route replace logic so that the
first matching route (i.e., same metric) is replaced.

Have mlxsw replace the first matching route as well.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_router: Allow appending to dev-only routes
Ido Schimmel [Fri, 15 Jun 2018 13:23:36 +0000 (16:23 +0300)]
mlxsw: spectrum_router: Allow appending to dev-only routes

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route append logic so that
dev-only routes can be appended and not only gatewayed routes.

Align mlxsw with the new behaviour.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: Only emit append events for appended routes
Ido Schimmel [Fri, 15 Jun 2018 13:23:35 +0000 (16:23 +0300)]
ipv6: Only emit append events for appended routes

Current code will emit an append event in the FIB notification chain for
any route added with NLM_F_APPEND set, even if the route was not
appended to any existing route.

This is inconsistent with IPv4 where such an event is only emitted when
the new route is appended after an existing one.

Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
handle these events.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mac80211-for-davem-2018-06-15' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Fri, 15 Jun 2018 16:08:26 +0000 (09:08 -0700)]
Merge tag 'mac80211-for-davem-2018-06-15' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A handful of fixes:
 * missing RCU grace period enforcement led to drivers freeing
   data structures before; fix from Dedy Lansky.
 * hwsim module init error paths were messed up; fixed it myself
   after a report from Colin King (who had sent a partial patch)
 * kernel-doc tag errors; fix from Luca Coelho
 * initialize the on-stack sinfo data structure when getting
   station information; fix from Sven Eckelmann
 * TXQ state dumping is now done from init, and when TXQs aren't
   initialized yet at that point, bad things happen, move the
   initialization; fix from Toke Høiland-Jørgensen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agostmmac: added support for 802.1ad vlan stripping
Elad Nachman [Fri, 15 Jun 2018 06:57:39 +0000 (09:57 +0300)]
stmmac: added support for 802.1ad vlan stripping

stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before
calling napi_gro_receive().

The function assumes VLAN tagged frames are always tagged with
802.1Q protocol, and assigns ETH_P_8021Q to the skb by hard-coding
the parameter on call to __vlan_hwaccel_put_tag() .

This causes packets not to be passed to the VLAN slave if it was created
with 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix passes the protocol from the VLAN header into
__vlan_hwaccel_put_tag() instead of using the hard-coded value of
ETH_P_8021Q.

NETIF_F_HW_VLAN_STAG_RX check was added and the strip action is now
dependent on the correct combination of features and the detected vlan tag.

NETIF_F_HW_VLAN_STAG_RX feature was added to be in line with the driver
actual abilities.

Signed-off-by: Elad Nachman <eladn@gilat.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoafs: Optimise callback breaking by not repeating volume lookup
David Howells [Fri, 15 Jun 2018 14:24:50 +0000 (15:24 +0100)]
afs: Optimise callback breaking by not repeating volume lookup

At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.

However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume.  This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.

At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell.  This scan is done for every FID.

Optimise callback breaking by the following means:

 (1) Sort the FID list by vid so that all FIDs belonging to the same volume
     are clumped together.

     This is done through the use of an indirection table as we cannot do
     an insertion sort on the afs_callback_break array as we decode FIDs
     into it as we subsequently also have to decode callback info into it
     that corresponds by array index only.

     We also don't really want to bubblesort afterwards if we can avoid it.

 (2) Sort the server->cb_interests array by vid so that all the matching
     volumes are grouped together.  This permits the scan to stop after
     finding a record that has a higher vid.

 (3) When breaking FIDs, we try to keep server->cb_break_lock as long as
     possible, caching the start point in the array for that volume group
     as long as possible.

     It might make sense to add another layer in that list and have a
     refcounted volume ID anchor that has the matching interests attached
     to it rather than being in the list.  This would allow the lock to be
     dropped without losing the cursor.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Display manually added cells in dynamic root mount
David Howells [Fri, 15 Jun 2018 14:19:22 +0000 (15:19 +0100)]
afs: Display manually added cells in dynamic root mount

Alter the dynroot mount so that cells created by manipulation of
/proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
cell as a module parameter will cause directories for those cells to be
created in the dynamic root superblock for the network namespace[*].

To this end:

 (1) Only one dynamic root superblock is now created per network namespace
     and this is shared between all attempts to mount it.  This makes it
     easier to find the superblock to modify.

 (2) When a dynamic root superblock is created, the list of cells is walked
     and directories created for each cell already defined.

 (3) When a new cell is added, if a dynamic root superblock exists, a
     directory is created for it.

 (4) When a cell is destroyed, the directory is removed.

 (5) These directories are created by calling lookup_one_len() on the root
     dir which automatically creates them if they don't exist.

[*] Inasmuch as network namespaces are currently supported here.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Enable IPv6 DNS lookups
David Howells [Fri, 15 Jun 2018 14:19:10 +0000 (15:19 +0100)]
afs: Enable IPv6 DNS lookups

Remove the restriction on DNS lookup upcalls that prevents ipv6 addresses
from being looked up.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agocfg80211: fix rcu in cfg80211_unregister_wdev
Dedy Lansky [Fri, 15 Jun 2018 11:05:01 +0000 (13:05 +0200)]
cfg80211: fix rcu in cfg80211_unregister_wdev

Callers of cfg80211_unregister_wdev can free the wdev object
immediately after this function returns. This may crash the kernel
because this wdev object is still in use by other threads.
Add synchronize_rcu() after list_del_rcu to make sure wdev object can
be safely freed.

Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
6 years agomac80211: Move up init of TXQs
Toke Høiland-Jørgensen [Fri, 25 May 2018 12:29:21 +0000 (14:29 +0200)]
mac80211: Move up init of TXQs

On init, ieee80211_if_add() dumps the interface. Since that now includes a
dump of the TXQ state, we need to initialise that before the dump happens.
So move up the TXQ initialisation to to before the call to
ieee80211_if_add().

Fixes: 52539ca89f36 ("cfg80211: Expose TXQ stats and parameters to userspace")
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
6 years agomac80211_hwsim: fix module init error paths
Johannes Berg [Tue, 29 May 2018 10:04:51 +0000 (12:04 +0200)]
mac80211_hwsim: fix module init error paths

We didn't free the workqueue on any errors, nor did we
correctly check for rhashtable allocation errors, nor
did we free the hashtable on error.

Reported-by: Colin King <colin.king@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
6 years agocfg80211: initialize sinfo in cfg80211_get_station
Sven Eckelmann [Wed, 6 Jun 2018 08:53:55 +0000 (10:53 +0200)]
cfg80211: initialize sinfo in cfg80211_get_station

Most of the implementations behind cfg80211_get_station will not initialize
sinfo to zero before manipulating it. For example, the member "filled",
which indicates the filled in parts of this struct, is often only modified
by enabling certain bits in the bitfield while keeping the remaining bits
in their original state. A caller without a preinitialized sinfo.filled can
then no longer decide which parts of sinfo were filled in by
cfg80211_get_station (or actually the underlying implementations).

cfg80211_get_station must therefore take care that sinfo is initialized to
zero. Otherwise, the caller may tries to read information which was not
filled in and which must therefore also be considered uninitialized. In
batadv_v_elp_get_throughput's case, an invalid "random" expected throughput
may be stored for this neighbor and thus the B.A.T.M.A.N V algorithm may
switch to non-optimal neighbors for certain destinations.

Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API")
Reported-by: Thomas Lauer <holminateur@gmail.com>
Reported-by: Marcel Schmidt <ff.z-casparistrasse@mailbox.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
6 years agonl80211: fix some kernel doc tag mistakes
Luca Coelho [Fri, 8 Jun 2018 07:04:47 +0000 (10:04 +0300)]
nl80211: fix some kernel doc tag mistakes

There is a bunch of tags marking constants with &, which means struct
or enum name.  Replace them with %, which is the correct tag for
constants.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
6 years agoMerge tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 15 Jun 2018 08:26:29 +0000 (17:26 +0900)]
Merge tag 'linux-kselftest-4.18-rc1-2' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull more Kselftest updates from Shuah Khan:

 - fix a signedness bug in cgroups test

 - add ppc support for kprobe args tests

* tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest/cgroup: fix a signedness bug
  selftests/ftrace: Add ppc support for kprobe args tests

6 years agoMerge tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 15 Jun 2018 08:24:40 +0000 (17:24 +0900)]
Merge tag 'sound-fix-4.18-rc1' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here is a collection of small fixes on top of the previous update.

  All small and obvious fixes. Mostly for usual suspects, USB-audio and
  HD-audio, but a few trivial error handling fixes for misc drivers as
  well"

* tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Always create the interrupt pipe for the mixer
  ALSA: usb-audio: Add insertion control for UAC3 BADD
  ALSA: usb-audio: Change in connectors control creation interface
  ALSA: usb-audio: Add bi-directional terminal types
  ALSA: lx6464es: add error handling for pci_ioremap_bar
  ALSA: sonicvibes: add error handling for snd_ctl_add
  ALSA: usb-audio: Remove explicitly listed Mytek devices
  ALSA: usb-audio: Generic DSD detection for XMOS-based implementations
  ALSA: usb-audio: Add native DSD support for Mytek DACs
  ALSA: hda/realtek - Add shutup hint
  ALSA: usb-audio: Disable the quirk for Nura headset
  ALSA: hda: add dock and led support for HP ProBook 640 G4
  ALSA: hda: add dock and led support for HP EliteBook 830 G5
  ALSA: emu10k1: add error handling for snd_ctl_add
  ALSA: fm801: add error handling for snd_ctl_add

6 years agoMerge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 15 Jun 2018 08:20:53 +0000 (17:20 +0900)]
Merge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm

Pull amd drm fixes from Dave Airlie:
 "Just a single set of AMD fixes for stuff in -next for -rc1"

* tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm: (47 commits)
  drm/amd/powerplay: Set higher SCLK&MCLK frequency than dpm7 in OD (v2)
  drm/amd/powerplay: remove uncessary extra gfxoff control call
  drm/amdgpu: fix parsing indirect register list v2
  drm/amd/include: Update df 3.6 mask and shift definition
  drm/amd/pp: Fix OD feature enable failed on Vega10 workstation cards
  drm/amd/display: Fix stale buffer object (bo) use
  drm/amd/pp: initialize result to before or'ing in data
  drm/amd/powerplay: fix wrong clock adjust sequence
  drm/amdgpu: Grab/put runtime PM references in atomic_commit_tail()
  drm/amd/powerplay: fix missed hwmgr check warning before call gfx_off_control handler
  drm/amdgpu: fix CG enabling hang with gfxoff enabled
  drm/amdgpu: fix clear_all and replace handling in the VM (v2)
  drm/amdgpu: add checking for sos version
  drm/amdgpu: fix the missed vcn fw version report
  Revert "drm/amdgpu: Add an ATPX quirk for hybrid laptop"
  drm/amdgpu/df: fix potential array out-of-bounds read
  drm/amdgpu: Fix NULL pointer when load kfd driver with PP block is disabled
  drm/gfx9: Update gc goldensetting for vega20.
  drm/amd/pp: Allow underclocking when od table is empty in vbios
  drm/amdgpu/display: check if ppfuncs exists before using it
  ...

6 years agoafs: Show all of a server's addresses in /proc/fs/afs/servers
David Howells [Sat, 2 Jun 2018 21:20:31 +0000 (22:20 +0100)]
afs: Show all of a server's addresses in /proc/fs/afs/servers

Show all of a server's addresses in /proc/fs/afs/servers, placing the
second plus addresses on padded lines of their own.  The current address is
marked with a star.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Handle CONFIG_PROC_FS=n
David Howells [Tue, 5 Jun 2018 09:54:24 +0000 (10:54 +0100)]
afs: Handle CONFIG_PROC_FS=n

The AFS filesystem depends at the moment on /proc for configuration and
also presents information that way - however, this causes a compilation
failure if procfs is disabled.

Fix it so that the procfs bits aren't compiled in if procfs is disabled.

This means that you can't configure the AFS filesystem directly, but it is
still usable provided that an up-to-date keyutils is installed to look up
cells by SRV or AFSDB DNS records.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoproc: Make inline name size calculation automatic
David Howells [Wed, 13 Jun 2018 18:43:19 +0000 (19:43 +0100)]
proc: Make inline name size calculation automatic

Make calculation of the size of the inline name in struct proc_dir_entry
automatic, rather than having to manually encode the numbers and failing to
allow for lockdep.

Require a minimum inline name size of 33+1 to allow for names that look
like two hex numbers with a dash between.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoorangefs: simplify compat ioctl handling
Al Viro [Sun, 27 May 2018 12:52:48 +0000 (08:52 -0400)]
orangefs: simplify compat ioctl handling

no need to mess with copy_in_user(), etc...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agosignalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
Al Viro [Sun, 27 May 2018 12:35:50 +0000 (08:35 -0400)]
signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agohv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
Haiyang Zhang [Fri, 15 Jun 2018 01:29:09 +0000 (18:29 -0700)]
hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload

These fields in struct ndis_ipsecv2_offload and struct ndis_rsc_offload
are one byte according to the specs. This patch defines them with the
right size. These structs are not in use right now, but will be used soon.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: avoid unenecessary cong_update in loop transport
Santosh Shilimkar [Thu, 14 Jun 2018 18:52:34 +0000 (11:52 -0700)]
rds: avoid unenecessary cong_update in loop transport

Loop transport which is self loopback, remote port congestion
update isn't relevant. Infact the xmit path already ignores it.
Receive path needs to do the same.

Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Fri, 15 Jun 2018 01:32:23 +0000 (11:32 +1000)]
Merge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux into drm-next

Fixes for 4.18. Highlights:
- Fixes for gfxoff on Raven
- Remove an ATPX quirk now that the root cause is fixed
- Runtime PM fixes
- Vega20 register header update
- Wattman fixes
- Misc bug fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614141428.2909-1-alexander.deucher@amd.com
6 years agobpf, selftests: delete xfrm tunnel when test exits.
William Tu [Thu, 14 Jun 2018 12:01:06 +0000 (05:01 -0700)]
bpf, selftests: delete xfrm tunnel when test exits.

Make the printting of bpf xfrm tunnel better and
cleanup xfrm state and policy when xfrm test finishes.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf, selftest: check tunnel type more accurately
Jian Wang [Fri, 15 Jun 2018 01:22:17 +0000 (03:22 +0200)]
bpf, selftest: check tunnel type more accurately

Grep tunnel type directly to make sure 'ip' command supports it.

Signed-off-by: Jian Wang <jianjian.wang1@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'bpf-misc-fixes'
Daniel Borkmann [Fri, 15 Jun 2018 01:13:18 +0000 (03:13 +0200)]
Merge branch 'bpf-misc-fixes'

Jakub Kicinski says:

====================
This small series allows test_offload.py selftest to run on modern
distributions which may create BPF programs for cgroups at boot,
like Ubuntu 18.04.  We still expect the program list to not be
altered by any other agent while the test is running, but no longer
depend on there being no BPF programs at all at the start.

Fixing the test revealed a small problem with bpftool, which doesn't
report the program load time very accurately.  Because nanoseconds
were not taken into account reported load time would fluctuate by
1 second.  First patch of the series takes care of fixing that.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoselftests/bpf: test offloads even with BPF programs present
Jakub Kicinski [Thu, 14 Jun 2018 18:06:56 +0000 (11:06 -0700)]
selftests/bpf: test offloads even with BPF programs present

Modern distroes increasingly make use of BPF programs.  Default
Ubuntu 18.04 installation boots with a number of cgroup_skb
programs loaded.

test_offloads.py tries to check if programs and maps are not
leaked on error paths by confirming the list of programs on the
system is empty between tests.

Since we can no longer expect the system to have no BPF objects
at boot try to remember the programs and maps present at the start,
and skip those when scanning the system.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools: bpftool: improve accuracy of load time
Jakub Kicinski [Thu, 14 Jun 2018 18:06:55 +0000 (11:06 -0700)]
tools: bpftool: improve accuracy of load time

BPF program load time is reported from the kernel relative to boot time.
If conversion to wall clock does not take nanosecond parts into account,
the load time reported by bpftool may differ by one second from run to
run.  This means JSON object reported by bpftool for a program will
randomly change.

Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'l2tp-fixes'
David S. Miller [Fri, 15 Jun 2018 00:10:19 +0000 (17:10 -0700)]
Merge branch 'l2tp-fixes'

Guillaume Nault says:

====================
l2tp: pppol2tp_connect() fixes

This series fixes a few remaining issues with pppol2tp_connect().

It doesn't try to prevent invalid configurations that have no effect on
kernel's reliability. That would be work for a future patch set.

Patch 2 is the most important as it avoids an invalid pointer
dereference crashing the kernel. It depends on patch 1 for correctly
identifying L2TP session types.

Patches 3 and 4 avoid creating stale tunnels and sessions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: clean up stale tunnel or session in pppol2tp_connect's error path
Guillaume Nault [Wed, 13 Jun 2018 13:09:21 +0000 (15:09 +0200)]
l2tp: clean up stale tunnel or session in pppol2tp_connect's error path

pppol2tp_connect() may create a tunnel or a session. Remove them in
case of error.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: prevent pppol2tp_connect() from creating kernel sockets
Guillaume Nault [Wed, 13 Jun 2018 13:09:20 +0000 (15:09 +0200)]
l2tp: prevent pppol2tp_connect() from creating kernel sockets

If 'fd' is negative, l2tp_tunnel_create() creates a tunnel socket using
the configuration passed in 'tcfg'. Currently, pppol2tp_connect() sets
the relevant fields to zero, tricking l2tp_tunnel_create() into setting
up an unusable kernel socket.

We can't set 'tcfg' with the required fields because there's no way to
get them from the current connect() parameters. So let's restrict
kernel sockets creation to the netlink API, which is the original use
case.

Fixes: 789a4a2c61d8 ("l2tp: Add support for static unmanaged L2TPv3 tunnels")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: only accept PPP sessions in pppol2tp_connect()
Guillaume Nault [Wed, 13 Jun 2018 13:09:19 +0000 (15:09 +0200)]
l2tp: only accept PPP sessions in pppol2tp_connect()

l2tp_session_priv() returns a struct pppol2tp_session pointer only for
PPPoL2TP sessions. In particular, if the session is an L2TP_PWTYPE_ETH
pseudo-wire, l2tp_session_priv() returns a pointer to an l2tp_eth_sess
structure, which is much smaller than struct pppol2tp_session. This
leads to invalid memory dereference when trying to lock ps->sk_lock.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()
Guillaume Nault [Wed, 13 Jun 2018 13:09:18 +0000 (15:09 +0200)]
l2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()

Define cfg.pw_type so that the new session is created with its .pwtype
field properly set (L2TP_PWTYPE_PPP).

Not setting the pseudo-wire type had several annoying effects:

  * Invalid value returned in the L2TP_ATTR_PW_TYPE attribute when
    dumping sessions with the netlink API.

  * Impossibility to delete the session using the netlink API (because
    l2tp_nl_cmd_session_delete() gets the deletion callback function
    from an array indexed by the session's pseudo-wire type).

Also, there are several cases where we should check a session's
pseudo-wire type. For example, pppol2tp_connect() should refuse to
connect a session that is not PPPoL2TP, but that requires the session's
.pwtype field to be properly set.

Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoeventpoll: switch to ->poll_mask
Ben Noordhuis [Thu, 14 Jun 2018 22:32:07 +0000 (00:32 +0200)]
eventpoll: switch to ->poll_mask

Signed-off-by: Ben Noordhuis <info@bnoordhuis.nl>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoaio: only return events requested in poll_mask() for IOCB_CMD_POLL
Christoph Hellwig [Mon, 11 Jun 2018 06:50:10 +0000 (08:50 +0200)]
aio: only return events requested in poll_mask() for IOCB_CMD_POLL

The ->poll_mask() operation has a mask of events that the caller
is interested in, but not all implementations might take it into
account.  Mask the return value to only the requested events,
similar to what the poll and epoll code does.

Reported-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoMerge branch 'emaclite-fixes'
David S. Miller [Fri, 15 Jun 2018 00:08:04 +0000 (17:08 -0700)]
Merge branch 'emaclite-fixes'

Radhey Shyam Pandey says:

====================
emaclite bug fixes and code cleanup

This patch series fixes bug in emaclite remove and mdio_setup routines.
It does minor code cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Remove xemaclite_mdio_setup return check
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:19 +0000 (12:05 +0530)]
net: emaclite: Remove xemaclite_mdio_setup return check

Errors are already reported in xemaclite_mdio_setup so avoid
reporting it again.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Remove unused 'has_mdio' flag.
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:18 +0000 (12:05 +0530)]
net: emaclite: Remove unused 'has_mdio' flag.

Remove unused 'has_mdio' flag.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Fix MDIO bus unregister bug
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:17 +0000 (12:05 +0530)]
net: emaclite: Fix MDIO bus unregister bug

Since 'has_mdio' flag is not used,sequence insmod->rmmod-> insmod
leads to failure as MDIO unregister doesn't happen in .remove().
Fix it by checking MII bus pointer instead.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Fix position of lp->mii_bus assignment
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:16 +0000 (12:05 +0530)]
net: emaclite: Fix position of lp->mii_bus assignment

To ensure MDIO bus is not double freed in remove() path
assign lp->mii_bus after MDIO bus registration.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoeventfd: only return events requested in poll_mask()
Avi Kivity [Fri, 8 Jun 2018 19:12:32 +0000 (22:12 +0300)]
eventfd: only return events requested in poll_mask()

The ->poll_mask() operation has a mask of events that the caller
is interested in, but we're returning all events regardless.

Change to return only the events the caller is interested in. This
fixes aio IO_CMD_POLL returning immediately when called with POLLIN
on an eventfd, since an eventfd is almost always ready for a write.

Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoaio: mark __aio_sigset::sigmask const
Avi Kivity [Fri, 8 Jun 2018 14:55:05 +0000 (17:55 +0300)]
aio: mark __aio_sigset::sigmask const

io_pgetevents() will not change the signal mask. Mark it const
to make it clear and to reduce the need for casts in user code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agotcp: verify the checksum of the first data segment in a new connection
Frank van der Linden [Tue, 12 Jun 2018 23:09:37 +0000 (23:09 +0000)]
tcp: verify the checksum of the first data segment in a new connection

commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.

But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly.  These lower-level processing functions do not do any checksum
verification.

Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>