Arun KS [Tue, 5 Mar 2019 23:42:14 +0000 (15:42 -0800)]
mm/page_alloc.c: memory hotplug: free pages as higher order
Freeing pages at a higher order reduces the time the buddy allocator
spends coalescing them. With a section size of 256MB, the hot add
latency of a single section improves from 50-60 ms to less than 1 ms,
roughly a 60-fold improvement. Modify the external providers of the
online callback to align with the change.
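As a rough illustration of why the order matters, the sketch below counts how many calls into the allocator are needed to hand over one section. The 4 KiB page size and the largest order of 10 are assumptions made for the example, and free_calls() is a hypothetical helper, not a kernel function.

```c
/* Assumed values for illustration: 4 KiB pages. */
#define PAGE_SHIFT_ASSUMED 12

/* Number of calls needed to hand over `bytes` of memory when each call
 * frees one block of 2^order pages (rounded up). */
static unsigned long free_calls(unsigned long bytes, unsigned int order)
{
    unsigned long pages = bytes >> PAGE_SHIFT_ASSUMED;
    unsigned long block = 1UL << order;

    return (pages + block - 1) / block;
}
```

Under these assumptions a 256MB section needs 65536 calls at order 0 but only 64 at order 10, and the higher-order blocks arrive already merged, so the buddy allocator has little coalescing left to do.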
[arunks@codeaurora.org: v11]
Link: http://lkml.kernel.org/r/1547792588-18032-1-git-send-email-arunks@codeaurora.org
[akpm@linux-foundation.org: remove unused local, per Arun]
[akpm@linux-foundation.org: avoid return of void-returning __free_pages_core(), per Oscar]
[akpm@linux-foundation.org: fix it for mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch]
[arunks@codeaurora.org: v8]
Link: http://lkml.kernel.org/r/1547032395-24582-1-git-send-email-arunks@codeaurora.org
[arunks@codeaurora.org: v9]
Link: http://lkml.kernel.org/r/1547098543-26452-1-git-send-email-arunks@codeaurora.org
Link: http://lkml.kernel.org/r/1538727006-5727-1-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:42:10 +0000 (15:42 -0800)]
mm/slub.c: remove an unused addr argument
"addr" function argument is not used in alloc_consistency_checks() at
all, so remove it.
Link: http://lkml.kernel.org/r/20190211123214.35592-1-cai@lca.pw
Fixes: becfda68abca ("slub: convert SLAB_DEBUG_FREE to SLAB_CONSISTENCY_CHECKS")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobin C. Harding [Tue, 5 Mar 2019 23:42:07 +0000 (15:42 -0800)]
include/linux/slub_def.h: comment fixes
Capitalize comment string, use C89 comment style, correct
grammar/punctuation in comments.
Link: http://lkml.kernel.org/r/20190204005713.9463-2-tobin@kernel.org
Link: http://lkml.kernel.org/r/20190204005713.9463-3-tobin@kernel.org
Link: http://lkml.kernel.org/r/20190204005713.9463-4-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:42:03 +0000 (15:42 -0800)]
mm/slab.c: kmemleak no scan alien caches
Kmemleak throws endless warnings during boot because, in
__alloc_alien_cache(),
alc = kmalloc_node(memsize, gfp, node);
init_arraycache(&alc->ac, entries, batch);
kmemleak_no_scan(ac);
Kmemleak does not track the array cache (alc->ac) but the alien cache
(alc) instead, so let it track the latter by lifting kmemleak_no_scan()
out of init_arraycache().
There is another place that calls init_arraycache(), but
alloc_kmem_cache_cpus() uses the percpu allocation, which will never be
considered a leak.
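The mismatch can be sketched in userspace as follows. All structure and function names here are hypothetical stand-ins, and the field layout is an assumption chosen only so that the embedded array cache does not sit at offset 0 (the trace in this message shows the alias 0x38 bytes into the object). The tracker, like kmemleak, knows an object only by the address the allocator returned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the slab structures. */
struct array_cache_demo { unsigned int avail, limit; void *entry[4]; };
struct alien_cache_demo { long lock_placeholder[7]; struct array_cache_demo ac; };

/* A tracked object, identified by its start address and size. */
struct tracked { uintptr_t start; size_t size; };

/* Exact-match lookup: succeeds only for the pointer the allocator returned. */
static int lookup_exact(const struct tracked *t, const void *ptr)
{
    return (uintptr_t)ptr == t->start;
}

/* Alias check: the pointer falls inside the object but is not its start. */
static int is_alias(const struct tracked *t, const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;

    return p > t->start && p < t->start + t->size;
}
```

Passing the address of the embedded `ac` member fails the exact lookup and is reported as an alias, while passing the enclosing object succeeds, which is why the call has to be made on `alc` rather than on `&alc->ac`.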
kmemleak: Found object by alias at 0xffff8007b9aa7e38
CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
Call trace:
dump_backtrace+0x0/0x168
show_stack+0x24/0x30
dump_stack+0x88/0xb0
lookup_object+0x84/0xac
find_and_get_object+0x84/0xe4
kmemleak_no_scan+0x74/0xf4
setup_kmem_cache_node+0x2b4/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
ret_from_fork+0x10/0x18
kmemleak: Object 0xffff8007b9aa7e00 (size 256):
kmemleak: comm "swapper/0", pid 1, jiffies 4294697137
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
kmemleak_alloc+0x84/0xb8
kmem_cache_alloc_node_trace+0x31c/0x3a0
__kmalloc_node+0x58/0x78
setup_kmem_cache_node+0x26c/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
Call trace:
dump_backtrace+0x0/0x168
show_stack+0x24/0x30
dump_stack+0x88/0xb0
kmemleak_no_scan+0x90/0xf4
setup_kmem_cache_node+0x2b4/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
ret_from_fork+0x10/0x18
Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
Fixes: 1fe00d50a9e8 ("slab: factor out initialization of array cache")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peng Wang [Tue, 5 Mar 2019 23:42:00 +0000 (15:42 -0800)]
mm/slub.c: freelist is ensured to be NULL when new_slab() fails
new_slab_objects() will return immediately if freelist is not NULL.
if (freelist)
return freelist;
One more assignment operation could be avoided.
Link: http://lkml.kernel.org/r/20181229062512.30469-1-rocking@whu.edu.cn
Signed-off-by: Peng Wang <rocking@whu.edu.cn>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shuriyc Chu [Tue, 5 Mar 2019 23:41:56 +0000 (15:41 -0800)]
fs/file.c: initialize init_files.resize_wait
(Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)
Calling 'get_unused_fd_flags' in a kthread causes a kernel crash. It works
fine on 4.1, but crashes after 64 fds have been obtained. It also crashes
on ubuntu1404/1604/1804 and centos7.5, and the crash messages are almost
the same.
The crash message on centos7.5 is shown below:
start fd 61
start fd 62
start fd 63
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: __wake_up_common+0x2e/0x90
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
RIP: 0010:__wake_up_common+0x2e/0x90
RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
Call Trace:
__wake_up+0x39/0x50
expand_files+0x131/0x250
__alloc_fd+0x47/0x170
get_unused_fd_flags+0x30/0x40
test_fd+0x12a/0x1c0 [test]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21
Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 <48> 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
RIP __wake_up_common+0x2e/0x90
RSP <ffff8e94247a2d18>
CR2: 0000000000000000
This issue has existed since CentOS 7.5 (3.10.0-862); CentOS 7.4
(3.10.0-693.21.1) is OK. Root cause: the 'resize_wait' member is not
initialized before being used.
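The failure mode can be sketched in userspace with a minimal circular list modelled on the kernel's list_head. Everything below is hypothetical illustration, not the code touched by this patch: a statically allocated head that is left zero-filled has NULL links, so any walker that follows them faults, whereas a head set up with a static initializer points at itself and reads as an empty list.

```c
#include <stddef.h>

/* Hypothetical minimal circular list. */
struct list_node { struct list_node *next, *prev; };

/* Static initializer: an empty circular list points at itself. */
#define LIST_NODE_INIT(name) { &(name), &(name) }

/* A zero-filled head has next == NULL; following it would be a NULL
 * pointer dereference, so detect that case up front. */
static int head_is_initialized(const struct list_node *head)
{
    return head->next != NULL && head->prev != NULL;
}

/* Count entries, refusing to walk an uninitialized head (returns -1). */
static int count_entries(const struct list_node *head)
{
    const struct list_node *pos;
    int n = 0;

    if (!head_is_initialized(head))
        return -1;
    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```

The kernel's wait queue code has no such guard, so the remedy is to give the static object a proper initializer rather than to check at every use.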
Reported-by: Richard Zhang <zhang.zijian@h3c.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vineet Gupta [Tue, 5 Mar 2019 23:41:52 +0000 (15:41 -0800)]
fs/inode.c: inode_set_flags(): replace opencoded set_mask_bits()
It seems that commits 5f16f3225b0624 and 00a1a053ebe5, both with the same
commit log ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()"),
introduced the set_mask_bits API, but somehow ended up not using it in ext4.
Also, set_mask_bits() is used in fs quite a bit and we can possibly come
up with a generic llsc based implementation (w/o the cmpxchg loop).
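For readers unfamiliar with the helper, the idea behind a cmpxchg-loop based set_mask_bits() can be sketched in userspace with C11 atomics. The function name set_mask_bits_demo() and its choice to return the previous value are assumptions of this sketch, not a statement about the kernel macro.

```c
#include <stdatomic.h>

/* Atomically clear the bits in `mask`, then set the bits in `bits`.
 * Returns the value that was replaced. */
static unsigned int set_mask_bits_demo(_Atomic unsigned int *ptr,
                                       unsigned int mask, unsigned int bits)
{
    unsigned int old = atomic_load(ptr);
    unsigned int new_val;

    do {
        new_val = (old & ~mask) | bits;
        /* On failure `old` is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(ptr, &old, new_val));

    return old;
}
```

An open-coded version repeats exactly this loop at each call site, which is what replacing it with the shared helper avoids.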
Link: http://lkml.kernel.org/r/1548275584-18096-3-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gustavo A. R. Silva [Tue, 5 Mar 2019 23:41:48 +0000 (15:41 -0800)]
ocfs2: Use zero-sized array and struct_size() in kzalloc()
Update the code to use a zero-sized array instead of a pointer in
structure ocfs2_slot_info and use struct_size() in kzalloc().
Notice that one of the more common cases of allocation size calculations
is finding the size of a structure that has a zero-sized array at the
end, along with memory for some number of elements for that array. For
example:
struct foo {
int stuff;
void *entry[];
};
instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
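A userspace analogue of the size calculation might look like the sketch below. foo_size() and foo_alloc() are hypothetical helpers written for this example, and saturating to SIZE_MAX on overflow is a design choice of the sketch so that an oversized request fails instead of producing an undersized buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct foo {
    int stuff;
    void *entry[];      /* trailing array sized at allocation time */
};

/* Header plus `count` trailing elements, saturating on overflow. */
static size_t foo_size(size_t count)
{
    size_t elem = sizeof(void *);

    if (count > (SIZE_MAX - sizeof(struct foo)) / elem)
        return SIZE_MAX;
    return sizeof(struct foo) + count * elem;
}

/* Zeroed allocation, standing in for kzalloc() in this userspace sketch. */
static struct foo *foo_alloc(size_t count)
{
    size_t bytes = foo_size(count);

    if (bytes == SIZE_MAX)
        return NULL;
    return calloc(1, bytes);
}
```

Keeping the arithmetic in one helper means the element type and the header size cannot drift apart between call sites.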
Link: http://lkml.kernel.org/r/20190108191903.GA22056@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gang He [Tue, 5 Mar 2019 23:41:45 +0000 (15:41 -0800)]
ocfs2: fix the application IO timeout when fstrim is running
A user reported that application IO timed out while fstrim was running
on an ocfs2 partition. The resource agent monitoring the application
concluded that the application was not working, and the node was then
fenced by the cluster brain (e.g. pacemaker).
The root cause is that the fstrim thread holds the main_bm meta-file
related locks until all the cluster groups are trimmed. This patch makes
the fstrim thread release the main_bm meta-file related locks after each
cluster group is trimmed, which gives the current application IO a
chance to claim clusters from the main_bm meta-file.
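The change in locking granularity can be modelled with a small counter-based sketch. The lock here is a hypothetical bookkeeping structure, not the ocfs2 cluster lock: it only records how many groups are trimmed per acquisition, which is the quantity a waiting writer cares about.

```c
/* Hypothetical model of a lock: `held` counts the cluster groups trimmed
 * since the lock was last taken, `max_held` records the longest such run. */
struct bm_lock_model {
    unsigned int held;
    unsigned int max_held;
    unsigned int acquisitions;
};

static void model_lock(struct bm_lock_model *l)
{
    l->held = 0;
    l->acquisitions++;
}

static void model_unlock(struct bm_lock_model *l)
{
    l->held = 0;
}

static void model_trim_group(struct bm_lock_model *l)
{
    l->held++;
    if (l->held > l->max_held)
        l->max_held = l->held;
}

/* Before: one lock/unlock pair around the whole loop. */
static void trim_all_locked(struct bm_lock_model *l, unsigned int groups)
{
    unsigned int i;

    model_lock(l);
    for (i = 0; i < groups; i++)
        model_trim_group(l);
    model_unlock(l);
}

/* After: the lock is dropped and retaken between groups, so a waiter
 * never waits for more than one group to be trimmed. */
static void trim_per_group(struct bm_lock_model *l, unsigned int groups)
{
    unsigned int i;

    for (i = 0; i < groups; i++) {
        model_lock(l);
        model_trim_group(l);
        model_unlock(l);
    }
}
```

The trade-off is more lock traffic in exchange for a bounded wait for other users of the lock.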
Link: http://lkml.kernel.org/r/20190111090014.31645-1-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jia Guo [Tue, 5 Mar 2019 23:41:41 +0000 (15:41 -0800)]
ocfs2: fix a panic problem caused by o2cb_ctl
While a node is being created, a NULL pointer dereference occurs in the
kernel if o2cb_ctl fails in the interval (mkdir,
o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
The node number is initialized to 0 in function o2nm_node_group_make_item,
so o2nm_node_group_drop_item mistakes node number 0 for a valid
node number when the node is deleted before the node number is set
correctly. If the local node number of the current host happens to be
0, cluster->cl_local_node is set to O2NM_INVALID_NODE_NUM while
o2hb_thread is still running. The panic stack is generated as follows:
o2hb_thread
\-o2hb_do_disk_heartbeat
\-o2hb_check_own_slot
|-slot = &reg->hr_slots[o2nm_this_node()];
//o2nm_this_node() return O2NM_INVALID_NODE_NUM
We need to check whether the node number is set when we delete the node.
Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.com
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Firoz Khan [Tue, 5 Mar 2019 23:41:37 +0000 (15:41 -0800)]
sh: remove nargs from __SYSCALL
The __SYSCALL macro's arguments are the system call number, the system call
entry name and the number of arguments for the system call.
The nargs argument in __SYSCALL(nr, entry, nargs) is neither calculated nor
used anywhere, so it is better to reduce the implementation to
__SYSCALL(nr, entry). This also unifies the implementation with some other
architectures.
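The two-argument form is enough to build a dispatch table, as the following userspace sketch suggests. DEMO_SYSCALL stands in for __SYSCALL, and the handlers and table contents are hypothetical; none of this is taken from the sh tree.

```c
/* Hypothetical handlers, for illustration only. */
static long sys_demo_read(void)  { return 100; }
static long sys_demo_write(void) { return 101; }

/* The "table file": one DEMO_SYSCALL(nr, entry) line per system call. */
#define DEMO_SYSCALL_LIST \
    DEMO_SYSCALL(0, sys_demo_read) \
    DEMO_SYSCALL(1, sys_demo_write)

typedef long (*demo_syscall_fn)(void);

/* Expand each line into a designated initializer; no nargs is needed. */
#define DEMO_SYSCALL(nr, entry) [nr] = entry,
static const demo_syscall_fn demo_syscall_table[] = { DEMO_SYSCALL_LIST };
#undef DEMO_SYSCALL
```

Since the expansion only ever uses the number and the entry point, a third argument would have to be carried through every table line without being consumed.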
Link: http://lkml.kernel.org/r/1546443445-21075-2-git-send-email-firoz.khan@linaro.org
Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Tue, 5 Mar 2019 23:41:34 +0000 (15:41 -0800)]
scripts/decode_stacktrace.sh: handle RIP address with segment
decode line:
RIP: 0010:khugepaged+0x2a2/0x2280
into
RIP: 0010:khugepaged (mm/khugepaged.c:1885)
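The script itself is a shell script; the parsing step is sketched here in C purely for illustration. skip_segment() is a hypothetical helper that drops an optional hexadecimal segment prefix such as "0010:" so that the remaining symbol+offset text can be resolved as usual.

```c
#include <ctype.h>
#include <string.h>

/* Given the text after "RIP: ", skip an optional "<segment>:" prefix and
 * return a pointer to the symbol+offset part. Input without such a prefix
 * is returned unchanged. */
static const char *skip_segment(const char *s)
{
    const char *colon = strchr(s, ':');
    const char *p;

    if (colon == NULL || colon == s)
        return s;
    for (p = s; p < colon; p++)
        if (!isxdigit((unsigned char)*p))
            return s;           /* not a hex segment selector */
    return colon + 1;
}
```

The segment is kept in the decoded output, so only the lookup key changes, not the information shown to the reader.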
Link: http://lkml.kernel.org/r/154660071227.52726.15645307951282727605.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Tue, 5 Mar 2019 23:41:31 +0000 (15:41 -0800)]
kasan: fix coccinelle warnings in kasan_p*_table
kasan_p4d_table(), kasan_pmd_table() and kasan_pud_table() are declared
as returning bool, but return 0 instead of false, which produces a
coccinelle warning. Fix it.
Link: http://lkml.kernel.org/r/1fa6fadf644859e8a6a8ecce258444b49be8c7ee.1551716733.git.andreyknvl@google.com
Fixes: 0207df4fa1a8 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 5 Mar 2019 23:41:27 +0000 (15:41 -0800)]
kasan: fix kasan_check_read/write definitions
Building little-endian allmodconfig kernels on arm64 started failing
with the generated atomic.h implementation, since we now try to call
kasan helpers from the EFI stub:
aarch64-linux-gnu-ld: drivers/firmware/efi/libstub/arm-stub.stub.o: in function `atomic_set':
include/generated/atomic-instrumented.h:44: undefined reference to `__efistub_kasan_check_write'
I suspect that we get similar problems in other files that explicitly
disable KASAN for some reason but call atomic_t based helper functions.
We can fix this by checking the predefined __SANITIZE_ADDRESS__ macro
that the compiler sets instead of checking CONFIG_KASAN, but this in
turn requires a small hack in mm/kasan/common.c so we do see the extern
declaration there instead of the inline function.
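The per-translation-unit decision can be sketched as follows. Only __SANITIZE_ADDRESS__ comes from the compiler (GCC defines it when building with -fsanitize=address); every DEMO_ and demo_ name is hypothetical and the counter merely stands in for a real check.

```c
/* Decide per translation unit, based on what the compiler is doing to
 * this file, rather than on a global configuration option. */
#ifdef __SANITIZE_ADDRESS__
#define DEMO_KASAN_INSTRUMENTED 1
#else
#define DEMO_KASAN_INSTRUMENTED 0
#endif

static int demo_check_write_calls;

/* Counts calls only in an instrumented translation unit; otherwise it is
 * an empty inline function that leaves no external symbol to resolve. */
static inline void demo_check_write(const volatile void *p, unsigned int size)
{
    (void)p;
    (void)size;
#if DEMO_KASAN_INSTRUMENTED
    demo_check_write_calls++;
#endif
}
```

A file that is built without instrumentation therefore gets the empty stub even when the feature is enabled for the rest of the build, which is the situation the EFI stub is in.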
Link: http://lkml.kernel.org/r/20181211133453.2835077-1-arnd@arndb.de
Fixes: b1864b828644 ("locking/atomics: build atomic headers as required")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:41:24 +0000 (15:41 -0800)]
page_poison: play nicely with KASAN
KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
It triggers false positives in the allocation path:
BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
Read of size 8 at addr ffff88881f800000 by task swapper/0
CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
__asan_report_load8_noabort+0x19/0x20
memchr_inv+0x2ea/0x330
kernel_poison_pages+0x103/0x3d5
get_page_from_freelist+0x15e7/0x4d90
because KASAN has not yet unpoisoned the shadow page for the allocation
when page poisoning runs memchr_inv(), so KASAN sees only a stale poison
pattern.
There are also false positives in the free path:
BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
check_memory_region+0x22d/0x250
memset+0x28/0x40
kernel_poison_pages+0x29e/0x3d5
__free_pages_ok+0x75f/0x13e0
because KASAN adds poisoned redzones around slab objects, whereas page
poisoning needs to poison the whole page.
Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 5 Mar 2019 23:41:20 +0000 (15:41 -0800)]
kasan: remove use after scope bugs detection.
The use-after-scope bug detector seems to be almost entirely useless for
the Linux kernel. It has existed for over two years, but I've seen only one
valid bug so far [1], and that bug was fixed before it was
reported. There were some other use-after-scope reports, but they were
false positives due to various reasons, such as incompatibility with
the structleak plugin.
This feature significantly increases stack usage, especially with GCC
versions before 9, and causes a 32K stack overflow. It probably adds a
performance penalty too.
Given all that, let's remove the use-after-scope detector entirely.
While preparing this patch I noticed that we mistakenly enable
use-after-scope detection for the clang compiler regardless of the
CONFIG_KASAN_EXTRA setting. This is also fixed now.
[1] http://lkml.kernel.org/r/<20171129052106.rhgbjhhis53hkgfn@wfg-t540p.sh.intel.com>
Link: http://lkml.kernel.org/r/20190111185842.13978-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Will Deacon <will.deacon@arm.com> [arm64]
Cc: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhongjiang [Tue, 5 Mar 2019 23:41:16 +0000 (15:41 -0800)]
mm: hwpoison: fix thp split handing in soft_offline_in_use_page()
When soft_offline_in_use_page() runs on a thp tail page after pmd is
split, we trigger the following VM_BUG_ON_PAGE():
Memory failure: 0x3755ff: non anonymous thp
__get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000
Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1
flags: 0x2fffff80000000()
raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
------------[ cut here ]------------
kernel BUG at ./include/linux/mm.h:519!
soft_offline_in_use_page() passed the refcount and page lock from the tail
page to the head page, which is not needed because we can pass any subpage
to split_huge_page().
Naoya had fixed a similar issue in c3901e722b29 ("mm: hwpoison: fix thp
split handling in memory_failure()"), but he missed fixing soft
offline.
Link: http://lkml.kernel.org/r/1551452476-24000-1-git-send-email-zhongjiang@huawei.com
Fixes: 61f5d698cc97 ("mm: re-enable THP")
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 5 Mar 2019 19:28:25 +0000 (11:28 -0800)]
Merge tag 'mips_5.1' of git://git./linux/kernel/git/mips/linux
Pull MIPS updates from Paul Burton:
- Support for the MIPSr6 MemoryMapID register & Global INValidate TLB
(GINVT) instructions, allowing for more efficient TLB maintenance
when running on a CPU such as the I6500 that supports these.
- Enable huge page support for MIPS64r6.
- Optimize post-DMA cache sync by removing that code entirely for
kernel configurations in which we know it won't be needed.
- The number of pages allocated for interrupt stacks is now calculated
correctly, where before we would wastefully allocate too much memory
in some configurations.
- The ath79 platform migrates to devicetree.
- The bcm47xx platform sees fixes for the Buffalo WHR-G54S board.
- The ingenic/jz4740 platform gains support for appended devicetrees.
- The cavium_octeon, lantiq, loongson32 & sgi-ip27 platforms all see
cleanups as do various pieces of core architecture code.
* tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (66 commits)
MIPS: lantiq: Remove separate GPHY Firmware loader
MIPS: ingenic: Add support for appended devicetree
MIPS: SGI-IP27: rework HUB interrupts
MIPS: SGI-IP27: do boot CPU init later
MIPS: SGI-IP27: do xtalk scanning later
MIPS: SGI-IP27: use pr_info/pr_emerg and pr_cont to fix output
MIPS: SGI-IP27: clean up bridge access and header files
MIPS: SGI-IP27: get rid of volatile and hubreg_t
MIPS: irq: Allocate accurate order pages for irq stack
MIPS: dma-noncoherent: Remove bogus condition in dma_sync_phys()
MIPS: eBPF: Remove REG_32BIT_ZERO_EX
MIPS: eBPF: Always return sign extended 32b values
MIPS: CM: Fix indentation
MIPS: BCM47XX: Fix/improve Buffalo WHR-G54S support
MIPS: OCTEON: program rx/tx-delay always from DT
MIPS: OCTEON: delete board-specific link status
MIPS: OCTEON: don't lie about interface type of CN3005 board
MIPS: OCTEON: warn if deprecated link status is being used
MIPS: OCTEON: add fixed-link nodes to in-kernel device tree
MIPS: Delete unused flush_cache_sigtramp()
...
Linus Torvalds [Tue, 5 Mar 2019 19:17:23 +0000 (11:17 -0800)]
Merge branch 'parisc-5.1-1' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"The most important changes in this patch set are:
- DMA-related cleanups for parisc with the aim to move anything not
required by drivers out of <asm/dma-mapping.h>, by Christoph
Hellwig
- Switch to memblock_alloc(), by Mike Rapoport
- Makefile cleanups by Masahiro Yamada
- Switch to bust_spinlocks(), by Sergey Senozhatsky
- Improved initial SMP affinity selection for IRQs
- Added IPI- and rescheduling interrupts in /proc/interrupts output"
* 'parisc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (21 commits)
parisc: use memblock_alloc() instead of custom get_memblock()
parisc: Add constants for various PDC firmware calls
parisc: Add constant for PDC_PAT_COMPLEX firmware call
parisc: Show machine product number during boot
parisc: Add constants for PDC_RELOCATE PDC call
parisc: Add PDC_CRASH_PREP PDC function number
parisc: Use F_EXTEND() macro in iosapic code
parisc: remove the HBA_DATA macro
parisc/lba_pci: use container_of in LBA_DEV
parisc/dino: use container_of in DINO_DEV
parisc: properly type the return value of parisc_walk_tree
parisc: properly type the iommu field in struct pci_hba_data
parisc: turn GET_IOC into an inline function
parisc: move internal implementation details out of <asm/dma-mapping.h>
parisc: don't include <asm/cacheflush.h> in <asm/dma-mapping.h>
parisc: remove meaningless ccflags-y in arch/parisc/boot/Makefile
parisc: replace oops_in_progress manipulation with bust_spinlocks()
parisc: Improve initial IRQ to CPU assignment
parisc: Count IPI function call interrupts
parisc: Show rescheduling interrupts on SMP machines only
...
Linus Torvalds [Tue, 5 Mar 2019 19:13:10 +0000 (11:13 -0800)]
Merge tag 's390-5.1-1' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A copy of Arnd's compat wrapper generation series
- Pass information about the KVM guest to the host in the form of the
control program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels work again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the CPU
measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and store
them as event raw data
- Bug fixes and cleanups
* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390/cpum_cf: Add kernel message exaplanations"
s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
s390/suspend: fix prefix register reset in swsusp_arch_resume
s390: warn about clearing als implied facilities
s390: allow overriding facilities via command line
s390: clean up redundant facilities list setup
s390/als: remove duplicated in-place implementation of stfle
s390/cio: Use cpa range elsewhere within vfio-ccw
s390/cio: Fix vfio-ccw handling of recursive TICs
s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
s390/cpum_cf: Add kernel message exaplanations
s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
s390/cpum_cf: add ctr_stcctm() function
s390/cpum_cf: move common functions into a separate file
s390/cpum_cf: introduce kernel_cpumcf_avail() function
s390/cpu_mf: replace stcctm5() with the stcctm() function
s390/cpu_mf: add store cpu counter multiple instruction support
s390/cpum_cf: Add minimal in-kernel interface for counter measurements
s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
...
Linus Torvalds [Tue, 5 Mar 2019 19:02:12 +0000 (11:02 -0800)]
Merge tag 'm68k-for-v5.1-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- VLA removal
- gcc-8.x build fixes
- small improvements and cleanups
- defconfig updates
* tag 'm68k-for-v5.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Add -ffreestanding to CFLAGS
m68k/apollo: Fix comment in Makefile
dio: Fix buffer overflow in case of unknown board
m68k/defconfig: Update defconfigs for v5.0-rc1
m68k/atari: Avoid VLA use in atari_switches_setup()
m68k: Avoid VLA use in mangle_kernel_stack()
m68k/mac: Use '030 reset method on SE/30
m68k/mac: Remove obsolete comment
m68k/mac: Skip VIA port setup unless RTC is connected
m68k/mac: Clean up unused timer definitions
m68k/defconfig: Drop NET_VENDOR_<FOO>=n
Borislav Petkov [Tue, 5 Mar 2019 14:47:51 +0000 (15:47 +0100)]
x86: Deprecate a.out support
Linux has supported ELF binaries for ~25 years now. a.out coredumping has
bitrotten quite significantly and would need some fixing to get it into
shape again, but considering that even the toolchains cannot create a.out
executables in their default configuration, let's deprecate a.out support
and remove it a couple of releases later instead.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Jann Horn <jannh@google.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <x86@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 5 Mar 2019 18:00:35 +0000 (10:00 -0800)]
a.out: remove core dumping support
We're (finally) phasing out a.out support for good. As Borislav Petkov
points out, we've supported ELF binaries for about 25 years by now, and
coredumping in particular has bitrotted over the years.
None of the tool chains even support generating a.out binaries any more,
and the plan is to deprecate a.out support entirely for the kernel. But
I want to start with just removing the core dumping code, because I can
still imagine that somebody actually might want to support a.out as a
simpler binary format.
Particularly if you generate some random binaries on the fly, ELF is a
much more complicated format (admittedly ELF also does have a lot of
toolchain support, mitigating that complexity a lot and you really
should have moved over in the last 25 years).
So it's at least somewhat possible that somebody out there has some
workflow that still involves generating and running a.out executables.
In contrast, it's very unlikely that anybody depends on debugging any
legacy a.out core files. But regardless, I want this phase-out to be
done in two steps, so that we can resurrect a.out support (if needed)
without having to resurrect the core file dumping that is almost
certainly not needed.
Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial
cut at this had missed.
And Alan Cox points out that the a.out binary loader _could_ be done in
user space if somebody wants to, but we might keep just the loader in
the kernel if somebody really wants it, since the loader isn't that big
and has no really odd special cases like the core dumping does.
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 5 Mar 2019 17:09:55 +0000 (09:09 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Add helper for simple skcipher modes.
- Add helper to register multiple templates.
- Set CRYPTO_TFM_NEED_KEY when setkey fails.
- Require neither or both of export/import in shash.
- AEAD decryption test vectors are now generated from encryption
ones.
- New option CONFIG_CRYPTO_MANAGER_EXTRA_TESTS that includes random
fuzzing.
Algorithms:
- Conversions to skcipher and helper for many templates.
- Add more test vectors for nhpoly1305 and adiantum.
Drivers:
- Add crypto4xx prng support.
- Add xcbc/cmac/ecb support in caam.
- Add AES support for Exynos5433 in s5p.
- Remove sha384/sha512 from artpec7 as hardware cannot do partial
hash"
[ There is a merge of the Freescale SoC tree in order to pull in changes
required by patches to the caam/qi2 driver. ]
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (174 commits)
crypto: s5p - add AES support for Exynos5433
dt-bindings: crypto: document Exynos5433 SlimSSS
crypto: crypto4xx - add missing of_node_put after of_device_is_available
crypto: cavium/zip - fix collision with generic cra_driver_name
crypto: af_alg - use struct_size() in sock_kfree_s()
crypto: caam - remove redundant likely/unlikely annotation
crypto: s5p - update iv after AES-CBC op end
crypto: x86/poly1305 - Clear key material from stack in SSE2 variant
crypto: caam - generate hash keys in-place
crypto: caam - fix DMA mapping xcbc key twice
crypto: caam - fix hash context DMA unmap size
hwrng: bcm2835 - fix probe as platform device
crypto: s5p-sss - Use AES_BLOCK_SIZE define instead of number
crypto: stm32 - drop pointless static qualifier in stm32_hash_remove()
crypto: chelsio - Fixed Traffic Stall
crypto: marvell - Remove set but not used variable 'ivsize'
crypto: ccp - Update driver messages to remove some confusion
crypto: adiantum - add 1536 and 4096-byte test vectors
crypto: nhpoly1305 - add a test vector with len % 16 != 0
crypto: arm/aes-ce - update IV after partial final CTR block
...
Linus Torvalds [Tue, 5 Mar 2019 16:26:13 +0000 (08:26 -0800)]
Merge git://git./linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
"Here we go, another merge window full of networking and #ebpf changes:
1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
range without dealing with floods of ARP traffic, from Linus
Lüssing.
2) Throttle buffered multicast packet transmission in mt76, from
Felix Fietkau.
3) Support adaptive interrupt moderation in ice, from Brett Creeley.
4) A lot of struct_size conversions, from Gustavo A. R. Silva.
5) Add peek/push/pop commands to bpftool, as well as bash completion,
from Stanislav Fomichev.
6) Optimize sk_msg_clone(), from Vakul Garg.
7) Add SO_BINDTOIFINDEX, from David Herrmann.
8) Be more conservative with local resends due to local congestion,
from Yuchung Cheng.
9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.
10) Add health buffer support to devlink, from Eran Ben Elisha.
11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.
12) Add statistics to basic packet scheduler filter, from Cong Wang.
13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.
14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.
15) Add 3ad stats to bonding, from Nikolay Aleksandrov.
16) Lots of probing improvements for bpftool, from Quentin Monnet.
17) Various nfp driver #ebpf JIT improvements from Jakub Kicinski.
18) Allow #ebpf programs to access gso_segs from skb shared info, from
Eric Dumazet.
19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.
20) Support 22260 iwlwifi devices, from Luca Coelho.
21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.
22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.
23) Add spinlock support to #ebpf, from Alexei Starovoitov.
24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.
25) Add device information API to devlink, from Jakub Kicinski.
26) Add new timestamping socket options which are y2038 safe, from
Deepa Dinamani.
27) Add RX checksum offloading for various sh_eth chips, from Sergei
Shtylyov.
28) Flow offload infrastructure, from Pablo Neira Ayuso.
29) Numerous cleanups, improvements, and bug fixes to the PHY layer
and many drivers from Heiner Kallweit.
30) Lots of changes to try and make packet scheduler classifiers run
lockless as much as possible, from Vlad Buslov.
31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.
32) Add concurrency tests to tc-tests infrastructure, from Vlad
Buslov.
33) Add hwmon support to aquantia, from Heiner Kallweit.
34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.
And I would be remiss if I didn't thank the various major networking
subsystem maintainers for integrating much of this work before I even
saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
Johannes Berg, Kalle Valo, and many others. Thank you!"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
net/sched: avoid unused-label warning
net: ignore sysctl_devconf_inherit_init_net without SYSCTL
phy: mdio-mux: fix Kconfig dependencies
net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
selftest/net: Remove duplicate header
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
net/mlx5e: Update tx reporter status in case channels were successfully opened
devlink: Add support for direct reporter health state update
devlink: Update reporter state to error even if recover aborted
sctp: call iov_iter_revert() after sending ABORT
team: Free BPF filter when unregistering netdev
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
cxgb4/chtls: Prefix adapter flags with CXGB4
net-sysfs: Switch to bitmap_zalloc()
mellanox: Switch to bitmap_zalloc()
bpf: add test cases for non-pointer sanitiation logic
mlxsw: i2c: Extend initialization by querying resources data
...
Martin Schwidefsky [Mon, 4 Mar 2019 07:25:00 +0000 (08:25 +0100)]
Revert "s390/cpum_cf: Add kernel message exaplanations"
This reverts commit fb3a0b61e0d4e435016cc91575d051f841791da0.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Linus Torvalds [Tue, 5 Mar 2019 03:33:04 +0000 (19:33 -0800)]
Merge tag 'leds-for-5.1-rc1' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
- finalize previously announced support for initialization of pattern
triggers from Device Tree
- fix for null deref on firmware load failure in leds-lp55xx-common.c
* tag 'leds-for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: lp55xx: fix null deref on firmware load failure
leds: trigger: timer: Add initialization from Device Tree
leds: trigger: oneshot: Add initialization from Device Tree
leds: trigger: pattern: Add pattern initialization from Device Tree
leds: Add helper for getting default pattern from Device Tree
dt-bindings: leds: Add pattern initialization from Device Tree
Linus Torvalds [Tue, 5 Mar 2019 03:29:37 +0000 (19:29 -0800)]
Merge tag 'hwmon-for-v5.1' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- Add support for LM96000, DPS-650AB to existing drivers
- Use permission specific SENSOR[_DEVICE]_ATTR variants in several
drivers
- Replace S_<PERMS> with octal values in several drivers
- Update some license headers
- Various minor fixes and improvements in several drivers
* tag 'hwmon-for-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (89 commits)
dt-bindings: hwmon: Add missing documentation for lm75
hwmon: (ad7418) Add device tree probing
hwmon: (ad741x) Add DT bindings for Analog Devices AD741x
hwmon: (ntc_thermistor) Convert to new hwmon API
hwmon: (pwm-fan) Add optional regulator support
dt-bindings: hwmon: Add optional regulator support to pwm-fan
hwmon: (f71882fg) Mark expected switch fall-through
hwmon: (ad7418) Catch I2C errors
hwmon: (lm85) add support for LM96000 high frequencies
hwmon: (lm85) support the LM96000
dt-bindings: Add LM96000 as a trivial device
hwmon: (lm85) remove freq_map size hardcodes
hwmon: (occ) Fix license headers
hwmon: (via-cputemp) Use permission specific SENSOR[_DEVICE]_ATTR variants
hwmon: (vexpress-hwmon) Use permission specific SENSOR[_DEVICE]_ATTR variants
hwmon: (tmp421) Replace S_<PERMS> with octal values
hwmon: (tmp103) Use permission specific SENSOR[_DEVICE]_ATTR variants
hwmon: (tmp102) Replace S_<PERMS> with octal values
hwmon: (tc74) Use permission specific SENSOR[_DEVICE]_ATTR variants
hwmon: (tc654) Use permission specific SENSOR[_DEVICE]_ATTR variants
...
Linus Torvalds [Tue, 5 Mar 2019 03:23:56 +0000 (19:23 -0800)]
Merge tag 'spi-v5.1' of git://git./linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"A fairly quiet release for SPI, the biggest thing is the conversion to
use GPIO descriptors which is now 90% done but still needs some
stragglers converting.
Summary:
- Support for inter-word delays
- Conversion of the core and most drivers to use GPIO descriptors for
GPIO controlled chip selects
- New drivers for NXP FlexSPI and QuadSPI, SiFive and Spreadtrum"
* tag 'spi-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (104 commits)
spi: sh-msiof: Restrict bits per word to 8/16/24/32 on R-Car Gen2/3
spi: sifive: Remove redundant dev_err call in sifive_spi_probe()
spi: sifive: Remove spi_master_put in sifive_spi_remove()
spi: spi-gpio: fix SPI_CS_HIGH capability
spi: pxa2xx: Setup maximum supported DMA transfer length
spi: sifive: Add driver for the SiFive SPI controller
spi: sifive: Add DT documentation for SiFive SPI controller
spi: sprd: Add a prefix for SPI DMA channel macros
spi: sprd: spi: sprd: Add DMA mode support
dt-bindings: spi: Add the DMA properties for the SPI dma mode
spi: sprd: Add the SPI irq function for the SPI DMA mode
dt-bindings: spi: imx: Add an entry for the i.MX8QM compatible
spi: use gpio[d]_set_value_cansleep for setting chipselect GPIO
spi: gpio: Advertise support for SPI_CS_HIGH
spi: sh-msiof: Replace spi_master by spi_controller
spi: sh-hspi: Replace spi_master by spi_controller
spi: rspi: Replace spi_master by spi_controller
spi: atmel-quadspi: add support for sam9x60 qspi controller
dt-bindings: spi: atmel-quadspi: QuadSPI driver for Microchip SAM9X60
spi: atmel-quadspi: add support for named peripheral clock
...
Linus Torvalds [Tue, 5 Mar 2019 03:20:52 +0000 (19:20 -0800)]
Merge tag 'regulator-v5.1' of git://git./linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"The bulk of the standout changes in this release are cleanups, with
the core work being a combination of factoring out common code into
helpers and the completion of the conversion of the core to use GPIO
descriptors.
Summary:
- Addition of helper functions for current limits and conversion of
drivers to use them by Axel Lin.
- Lots and lots of cleanups from Axel Lin.
- Conversion of the core to use GPIO descriptors rather than numbers
by Linus Walleij.
- New drivers for Maxim MAX77650 and ROHM BD70528"
* tag 'regulator-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (131 commits)
regulator: mc13xxx: Constify regulator_ops variables
regulator: palmas: Constify palmas_smps_ramp_delay array
regulator: wm831x-dcdc: Convert to use regulator_set/get_current_limit_regmap
regulator: pv88090: Convert to use regulator_set/get_current_limit_regmap
regulator: pv88080: Convert to use regulator_set/get_current_limit_regmap
regulator: pv88060: Convert to use regulator_set/get_current_limit_regmap
regulator: max77650: Convert to use regulator_set/get_current_limit_regmap
regulator: lp873x: Convert to use regulator_set/get_current_limit_regmap
regulator: lp872x: Convert to use regulator_set/get_current_limit_regmap
regulator: da9210: Convert to use regulator_set/get_current_limit_regmap
regulator: da9055: Convert to use regulator_set/get_current_limit_regmap
regulator: core: Add set/get_current_limit helpers for regmap users
regulator: Fix comment for csel_reg and csel_mask
regulator: stm32-vrefbuf: add power management support
regulator: 88pm8607: Remove unused fields from struct pm8607_regulator_info
regulator: 88pm8607: Simplify pm8607_list_voltage implementation
regulator: cpcap: Constify omap4_regulators and xoom_regulators
regulator: cpcap: Remove unused vsel_shift from struct cpcap_regulator
dt-bindings: regulator: tps65218: rectify units of LS3
dt-bindings: regulator: add LS2 load switch documentation
...
Linus Torvalds [Tue, 5 Mar 2019 03:16:09 +0000 (19:16 -0800)]
Merge tag 'regmap-v5.1' of git://git./linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"There are only two changes here:
- fix for conflicting attributes on the rbtree node structure
- implementation of main status register support in the interrupt
code which supports chips that have a register to cut down on the
number of per-interrupt status registers that need to be checked
when handling interrupts"
* tag 'regmap-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: Remove attribute packed from struct 'regcache_rbtree_node'
regmap: regmap-irq: Add main status register support
Linus Torvalds [Tue, 5 Mar 2019 03:07:02 +0000 (19:07 -0800)]
Merge tag 'mmc-v5.1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Fixup max_discard/trim calculations
- Announce SD specs greater than 4.0
- Add discard support for SD cards
- Don't do retries for CMD6 (SWITCH command)
- Various cleanups and re-structuring
MMC host:
- cqhci:
* Add maintainers for eMMC CQHCI driver
- sdhci:
* Consolidate WP GPIO code
* Add ADMA3 DMA support for V4 enabled host
* Fixup card detect support in pci-o2micro driver
* Add support for CMDQ and SDMMC pads auto-calibration in tegra
driver
* Add DCMD support and CMDQ support, support for i.MX6ULL variant,
fixup HS400 timing issue and add HS400_ES support for i.MX8QXP
to esdhc-imx driver
* Avoid CRC errors by adjusting settings to speed mode and fixup
card initialization for high speed mode in renesas_sdhi
* Fixup timeout settings for omap
* Enable 8 bits bus-width support in atmel-mci
* Convert some legacy code in jz4740 driver to use modern APIs
* Send a CMD12 to clear DPSM at errors for STM32 sdmmc mmci
driver"
* tag 'mmc-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (69 commits)
mmc:fix a bug when max_discard is 0
mmc: core: Add a debug print when the card may have been replaced
mmc: core: Add sd discard timeout
mmc: core: Add discard support to sd
mmc: sdhci-esdhc-imx: clear the HALT bit when enable CQE
mmc: core: do not retry CMD6 in __mmc_switch()
mmc: core: Convert mmc_align_data_size() into an SDIO specific function
mmc: core: Move mmc_of_parse_voltage() to host.c
mmc: core: Convert mmc_regulator_get_ocrmask() to static
mmc: core: Move regulator helpers to separate file
mmc: of_mmc_spi: Convert to mmc_of_parse_voltage()
mmc: core: Drop retries as in-parameter to mmc_wait_for_app_cmd()
mmc: core: Convert mmc_wait_for_app_cmd() to static
mmc: renesas_sdhi: Change HW adjustment register according to speed mode
mmc: mmci: Send a CMD12 to clear the DPSM at errors
mmc: sdhci-xenon: Fixup already marked switch fall-through
mmc: sdhci-tegra: drop ->get_ro() implementation
mmc: sdhci-omap: drop ->get_ro() implementation
mmc: sdhci: use WP GPIO in sdhci_check_ro()
mmc: wmt-sdmmc: Drop unused include
...
Linus Torvalds [Tue, 5 Mar 2019 03:05:02 +0000 (19:05 -0800)]
Merge tag 'i3c/for-5.1' of git://git./linux/kernel/git/i3c/linux
Pull i3c updates from Boris Brezillon:
- Add a /* fall-through */ comment in the dw-i3c-master driver
- Update the I3C entries in MAINTAINERS to add an IRC chan
* tag 'i3c/for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: dw-i3c-master: mark expected switch fall-through
MAINTAINERS: Add an IRC channel for the I3C subsystem
Linus Torvalds [Tue, 5 Mar 2019 02:59:37 +0000 (18:59 -0800)]
Merge tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtd
Pull MTD updates from Boris Brezillon:
"Core MTD changes:
- Use struct_size() where appropriate
- mtd_{read,write}() as wrappers around mtd_{read,write}_oob()
- Fix misuse of PTR_ERR() in docg3
- Coding style improvements in mtdcore.c
SPI NOR changes:
Core changes:
- Add support of octal mode I/O transfer
- Add a bunch of SPI NOR entries to the flash_info table
SPI NOR controller driver changes:
- cadence-quadspi:
* Add support for Octal SPI controller
* write upto 8-bytes data in STIG mode
- mtk-quadspi:
* rename config to a common one
* add SNOR_HWCAPS_READ to spi_nor_hwcaps mask
- Add Tudor as SPI-NOR co-maintainer
NAND changes:
NAND core changes:
- Fourth batch of fixes/cleanup to the raw NAND core impacting
various controller drivers (Sunxi, Marvell, MTK, TMIO, OMAP2).
- Check the return code of nand_reset() and nand_readid_op().
- Remove ->legacy.erase and single_erase().
- Simplify the locking.
- Several implicit fall through annotations.
Raw NAND controllers drivers changes:
- Fix various possible object reference leaks (MTK, JZ4780, Atmel)
- ST:
* Add support for STM32 FMC2 NAND flash controller
- Meson:
* Add support for Amlogic NAND flash controller
- Denali:
* Several cleanup patches
- Sunxi:
* Several cleanup patches
- FSMC:
* Disable NAND on remove()
* Reset NAND timings on resume()
SPI-NAND drivers changes:
- Toshiba:
* Add support for all Toshiba products.
- Macronix:
* Fix ECC status read.
- Gigadevice:
* Add support for GD5F1GQ4UExxG"
* tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtd: (64 commits)
mtd: spi-nor: Fix wrong abbreviation HWCPAS
mtd: spi-nor: cadence-quadspi: fix spelling mistake: "Couldnt't" -> "Couldn't"
mtd: spi-nor: Add support for en25qh64
mtd: spi-nor: Add support for MX25V8035F
mtd: spi-nor: Add support for EN25Q80A
mtd: spi-nor: cadence-quadspi: Add support for Octal SPI controller
dt-bindings: cadence-quadspi: Add new compatible for AM654 SoC
mtd: spi-nor: split s25fl128s into s25fl128s0 and s25fl128s1
mtd: spi-nor: cadence-quadspi: write upto 8-bytes data in STIG mode
mtd: spi-nor: Add support for mx25u3235f
mtd: rawnand: denali_dt: remove single anonymous clock support
mtd: rawnand: mtk: fix possible object reference leak
mtd: rawnand: jz4780: fix possible object reference leak
mtd: rawnand: atmel: fix possible object reference leak
mtd: rawnand: fsmc: Disable NAND on remove()
mtd: rawnand: fsmc: Reset NAND timings on resume()
mtd: spinand: Add support for GigaDevice GD5F1GQ4UExxG
mtd: rawnand: denali: remove unused dma_addr field from denali_nand_info
mtd: rawnand: denali: remove unused function argument 'raw'
mtd: rawnand: denali: remove unneeded denali_reset_irq() call
...
Linus Torvalds [Tue, 5 Mar 2019 02:56:36 +0000 (18:56 -0800)]
Merge tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Switch mdev to generic UUID API (Andy Shevchenko)
- Fixup platform reset include paths (Masahiro Yamada)
- Fix usage of MINORMASK (Chengguang Xu)
- Remove noise from duplicate spapr table unsets (Alexey Kardashevskiy)
- Restore device state after PM reset (Alex Williamson)
- Ensure memory translation enabled for PCI ROM access (Eric Auger)
* tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio:
vfio_pci: Enable memory accesses before calling pci_map_rom
vfio/pci: Restore device state on PM transition
vfio/spapr_tce: Skip unsetting already unset table
samples/vfio-mdev/mtty: expand minor range when registering chrdev region
samples/vfio-mdev/mdpy: expand minor range when registering chrdev region
samples/vfio-mdev/mbochs: expand minor range when registering chrdev region
vfio: expand minor range when registering chrdev region
vfio: platform: reset: fix up include directives to remove ccflags-y
vfio-mdev: Switch to use new generic UUID API
Slavomir Kaslev [Thu, 7 Feb 2019 15:45:19 +0000 (17:45 +0200)]
fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes
The current implementation of splice() and tee() ignores O_NONBLOCK set
on pipe file descriptors and checks only the SPLICE_F_NONBLOCK flag for
blocking on pipe arguments. This is inconsistent since splice()-ing
from/to non-pipe file descriptors does take O_NONBLOCK into
consideration.
Fix this by promoting O_NONBLOCK, when set on a pipe, to
SPLICE_F_NONBLOCK.
Some context for how the current implementation of splice() leads to
inconsistent behavior. In the ongoing work[1] to add VM tracing
capability to trace-cmd we stream tracing data over named FIFOs or
vsockets from guests back to the host.
When we receive SIGINT from user to stop tracing, we set O_NONBLOCK on
the input file descriptor and set SPLICE_F_NONBLOCK for the next call to
splice(). If splice() was blocked waiting on data from the input FIFO,
after SIGINT splice() restarts with the same arguments (no
SPLICE_F_NONBLOCK) and blocks again instead of returning -EAGAIN when no
data is available.
This differs from the splice() behavior when reading from a vsocket or
when we're doing a traditional read()/write() loop (trace-cmd's
--nosplice argument).
With this patch applied we get the same behavior in all situations after
setting O_NONBLOCK which also matches the behavior of doing a
read()/write() loop instead of splice().
This change does have potential of breaking users who don't expect
EAGAIN from splice() when SPLICE_F_NONBLOCK is not set. OTOH programs
that set O_NONBLOCK and don't anticipate EAGAIN are arguably buggy[2].
[1] https://github.com/skaslev/trace-cmd/tree/vsock
[2] https://github.com/torvalds/linux/blob/d47e3da1759230e394096fd742aad423c291ba48/fs/read_write.c#L1425
Signed-off-by: Slavomir Kaslev <kaslevs@vmware.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
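The flag promotion described in the commit above can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: the helper name effective_splice_flags is hypothetical, and the value 0x02 for SPLICE_F_NONBLOCK is the one defined in the Linux UAPI header <linux/splice.h>.

```python
import os

# Value of SPLICE_F_NONBLOCK from the Linux UAPI header <linux/splice.h>.
SPLICE_F_NONBLOCK = 0x02


def effective_splice_flags(pipe_file_flags: int, splice_flags: int) -> int:
    """Return the splice flags after promoting O_NONBLOCK set on the pipe.

    Hypothetical helper: it models the rule "O_NONBLOCK on a pipe implies
    SPLICE_F_NONBLOCK" stated in the commit message, nothing more.
    """
    if pipe_file_flags & os.O_NONBLOCK:
        splice_flags |= SPLICE_F_NONBLOCK
    return splice_flags
```

Under this model, a caller that sets O_NONBLOCK on the pipe gets non-blocking behavior even when it passes no splice flags, which corresponds to the -EAGAIN case the commit message describes.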
David S. Miller [Mon, 4 Mar 2019 21:26:15 +0000 (13:26 -0800)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Mon, 4 Mar 2019 21:24:27 +0000 (13:24 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Assorted fixes that sat in -next for a while, all over the place"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: Fix locking in aio_poll()
exec: Fix mem leak in kernel_read_file
copy_mount_string: Limit string length to PATH_MAX
cgroup: saner refcounting for cgroup_root
fix cgroup_do_mount() handling of failure exits
Arnd Bergmann [Mon, 4 Mar 2019 20:40:32 +0000 (21:40 +0100)]
net/sched: avoid unused-label warning
The label is only used from inside the #ifdef and should be
hidden the same way, to avoid this warning:
net/sched/act_tunnel_key.c: In function 'tunnel_key_init':
net/sched/act_tunnel_key.c:389:1: error: label 'release_tun_meta' defined but not used [-Werror=unused-label]
release_tun_meta:
Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 4 Mar 2019 20:38:03 +0000 (21:38 +0100)]
net: ignore sysctl_devconf_inherit_init_net without SYSCTL
When CONFIG_SYSCTL is turned off, we get a link failure for
the newly introduced tuning knob.
net/ipv6/addrconf.o: In function `addrconf_init_net':
addrconf.c:(.text+0x31dc): undefined reference to `sysctl_devconf_inherit_init_net'
Add an IS_ENABLED() check to fall back to the default behavior
(sysctl_devconf_inherit_init_net=0) here.
Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 4 Mar 2019 20:35:10 +0000 (21:35 +0100)]
phy: mdio-mux: fix Kconfig dependencies
MDIO_BUS_MUX can only be selected if OF_MDIO is already turned on:
WARNING: unmet direct dependencies detected for MDIO_BUS_MUX
Depends on [n]: NETDEVICES [=y] && MDIO_BUS [=m] && OF_MDIO [=n]
Selected by [m]:
- MDIO_BUS_MUX_MULTIPLEXER [=m] && NETDEVICES [=y] && MDIO_BUS [=m] && OF [=y]
Fixes: 7865ad6551c9 ("drivers: net: phy: mdio-mux: Add support for Generic Mux controls")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 4 Mar 2019 18:50:40 +0000 (19:50 +0100)]
net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
As can be seen from the usage of the return value, we should use
phy_modify_mmd_changed() here.
Fixes: 9a5dc8af4416 ("net: phy: add genphy_c45_an_config_aneg")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
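Why the return value matters in the commit above can be shown with a toy register model. The sketch below is Python, not the kernel API: the names modify, modify_changed and config_aneg are hypothetical stand-ins. The only assumption carried over from the kernel is the convention that the "_changed" variant reports 1 when the register value changed and 0 when it did not, while the plain variant reports only success.

```python
def modify_changed(regs, addr, mask, set_bits):
    """Read-modify-write; return 1 if the register value changed, else 0."""
    old = regs[addr]
    new = (old & ~mask) | set_bits
    if new == old:
        return 0
    regs[addr] = new
    return 1


def modify(regs, addr, mask, set_bits):
    """Same update, but report only success (0), hiding whether it changed."""
    modify_changed(regs, addr, mask, set_bits)
    return 0


def config_aneg(regs, addr, mask, advertise):
    """Return True when the advertisement changed and a restart is needed."""
    changed = modify_changed(regs, addr, mask, advertise)
    return changed > 0
```

A caller that tests for "changed > 0" would never see a change if it used the plain modify() variant, which is the kind of mismatch the fix addresses.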
Heiner Kallweit [Mon, 4 Mar 2019 18:39:03 +0000 (19:39 +0100)]
net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
In the original patch I forgot to add mv88e6xxx_ports_cmode_init()
to the second probe function, the one for the new DSA framework.
Fixes: ed8fe20205ac ("net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode")
Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Souptick Joarder [Mon, 4 Mar 2019 18:20:48 +0000 (23:50 +0530)]
selftest/net: Remove duplicate header
Remove duplicate header which is included twice.
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kai-Heng Feng [Mon, 4 Mar 2019 07:00:03 +0000 (15:00 +0800)]
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
Some sky2 chips fire IRQ after S3, before the driver is fully resumed:
[ 686.804877] do_IRQ: 1.37 No irq handler for vector
This is likely a platform bug: the device isn't fully quiesced during
S3. Using MSI-X, maskable MSI or INTx can prevent this issue from
happening.
Since MSI-X and maskable MSI are not supported by this device, fall back
to using INTx on affected platforms.
BugLink: https://bugs.launchpad.net/bugs/1807259
BugLink: https://bugs.launchpad.net/bugs/1809843
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Mar 2019 19:00:43 +0000 (11:00 -0800)]
Merge branch 'Devlink-health-updates'
Eran Ben Elisha says:
====================
Devlink health updates
This patchset includes a fix [patch 01] to the devlink health state update, in
case recover was aborted.
In addition, it includes a small enhancement to the infrastructure in order to
allow direct state update in run-time, and use it from mlx5e tx reporter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Sun, 3 Mar 2019 08:57:31 +0000 (10:57 +0200)]
net/mlx5e: Update tx reporter status in case channels were successfully opened
Once channels were successfully opened, update tx reporter health state to
healthy. This is needed for the following scenario:
- SQ has an un-recovered error reported to the devlink health, leaving the tx
reporter state as error.
- Current channels (including this SQ) are closed
- New channels are opened
After that flow, the original error was "solved", and tx reporter state
should be healthy. However, as it was resolved as a side effect, and not
via tx reporter recover method, driver needs to inform devlink health
about it.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Sun, 3 Mar 2019 08:57:30 +0000 (10:57 +0200)]
devlink: Add support for direct reporter health state update
It is possible that a reporter state will be updated due to a recover flow
which is not triggered by a devlink health related operation, but as a side
effect of some other operation in the system.
Expose devlink health API for a direct update of a reporter status.
Move devlink_health_reporter_state enum definition to devlink.h so it could
be used from drivers as a parameter of devlink_health_reporter_state_update.
In addition, add trace_devlink_health_reporter_state_update to provide user
notification for reporter state change.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Sun, 3 Mar 2019 08:57:29 +0000 (10:57 +0200)]
devlink: Update reporter state to error even if recover aborted
If devlink_health_report() aborted the recover flow due to grace period checker,
it left the reporter status as DEVLINK_HEALTH_REPORTER_STATE_HEALTHY, which is
a bug. Fix that by always setting the reporter state to
DEVLINK_HEALTH_REPORTER_STATE_ERROR prior to running the checker mentioned above.
In addition, save the previous health_state in a temporary variable, then use
it in the abort check comparison instead of using reporter->health_state which
might be already changed.
Fixes: c8e1da0bf923 ("devlink: Add health report functionality")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
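The ordering fix described in the commit above (save the previous state, mark the reporter as errored, then run the abort check against the saved state) can be sketched as a small state model. This is a simplified Python illustration, not the devlink code: the class and function names are hypothetical, and the exact abort condition (previous state not healthy, or still inside the grace period) is an assumption based on the commit text.

```python
from enum import Enum


class ReporterState(Enum):
    HEALTHY = 0
    ERROR = 1


class Reporter:
    """Hypothetical stand-in for a devlink health reporter."""

    def __init__(self, graceful_period, last_recovery_ts=0):
        self.health_state = ReporterState.HEALTHY
        self.graceful_period = graceful_period
        self.last_recovery_ts = last_recovery_ts
        self.recoveries = 0


def report(reporter, now):
    """Handle an error report; return True if recovery ran, False if aborted."""
    prev_state = reporter.health_state           # remember the state first
    reporter.health_state = ReporterState.ERROR  # always record the error

    # Abort check uses the saved state, not the field that was just changed.
    in_grace = now - reporter.last_recovery_ts < reporter.graceful_period
    if prev_state is not ReporterState.HEALTHY or in_grace:
        return False  # recovery aborted; the state stays ERROR

    # Recovery is assumed to succeed in this model.
    reporter.health_state = ReporterState.HEALTHY
    reporter.last_recovery_ts = now
    reporter.recoveries += 1
    return True
```

In this model an aborted recovery leaves the reporter in the ERROR state, which is the behavior the fix establishes.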
Xin Long [Sun, 3 Mar 2019 08:50:26 +0000 (16:50 +0800)]
sctp: call iov_iter_revert() after sending ABORT
The user msg is also copied to the abort packet when doing SCTP_ABORT in
sctp_sendmsg_check_sflags(). When SCTP_SENDALL is set, iov_iter_revert()
should be called after sending each abort, so that the abort for the next
asoc can copy this msg again. Otherwise, memcpy_from_msg() in
sctp_make_abort_user() will fail and return an error.
Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
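Why the revert matters in the commit above can be shown with a toy iterator. This is a userspace sketch, not the kernel's iov_iter: model_iter, model_copy(), model_revert() and model_abort_all() are hypothetical names, and the "message" is a single flat buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an iov_iter over one flat user buffer. */
struct model_iter {
    const char *base;
    size_t len;
    size_t off;  /* bytes already consumed */
};

/* Copy n bytes and advance; fails if fewer than n bytes remain,
 * analogous to memcpy_from_msg() returning an error. */
int model_copy(void *dst, size_t n, struct model_iter *it)
{
    if (it->len - it->off < n)
        return -1;
    memcpy(dst, it->base + it->off, n);
    it->off += n;
    return 0;
}

/* Rewind by n bytes, analogous to iov_iter_revert(). */
void model_revert(struct model_iter *it, size_t n)
{
    assert(n <= it->off);
    it->off -= n;
}

/* Build one abort payload per association.
 * Returns how many copies succeeded. */
int model_abort_all(struct model_iter *it, size_t msg_len,
                    int nr_assocs, int revert)
{
    char payload[64];
    int ok = 0;

    assert(msg_len <= sizeof(payload));
    for (int i = 0; i < nr_assocs; i++) {
        if (model_copy(payload, msg_len, it) < 0)
            break;  /* iterator exhausted: nothing left to copy */
        ok++;
        if (revert)
            model_revert(it, msg_len);
    }
    return ok;
}
```

Without the revert only the first association gets the message and every later copy fails; with it, all associations do.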
Ido Schimmel [Sun, 3 Mar 2019 07:35:51 +0000 (07:35 +0000)]
team: Free BPF filter when unregistering netdev
When team is used in loadbalance mode a BPF filter can be used to
provide a hash which will determine the Tx port.
When the netdev is later unregistered the filter is not freed which
results in memory leaks [1].
Fix by freeing the program and the corresponding filter when
unregistering the netdev.
[1]
unreferenced object 0xffff8881dbc47cc8 (size 16):
comm "teamd", pid 3068, jiffies 4294997779 (age 438.247s)
hex dump (first 16 bytes):
a3 00 6b 6b 6b 6b 6b 6b 88 a5 82 e1 81 88 ff ff ..kkkkkk........
backtrace:
[<000000008a3b47e3>] team_nl_cmd_options_set+0x88f/0x11b0
[<00000000c4f4f27e>] genl_family_rcv_msg+0x78f/0x1080
[<00000000610ef838>] genl_rcv_msg+0xca/0x170
[<00000000a281df93>] netlink_rcv_skb+0x132/0x380
[<000000004d9448a2>] genl_rcv+0x29/0x40
[<000000000321b2f4>] netlink_unicast+0x4c0/0x690
[<000000008c25dffb>] netlink_sendmsg+0x929/0xe10
[<00000000068298c5>] sock_sendmsg+0xc8/0x110
[<0000000082a61ff0>] ___sys_sendmsg+0x77a/0x8f0
[<00000000663ae29d>] __sys_sendmsg+0xf7/0x250
[<0000000027c5f11a>] do_syscall_64+0x14d/0x610
[<000000006cfbc8d3>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<00000000e23197e2>] 0xffffffffffffffff
unreferenced object 0xffff8881e182a588 (size 2048):
comm "teamd", pid 3068, jiffies 4294997780 (age 438.247s)
hex dump (first 32 bytes):
20 00 00 00 02 00 00 00 30 00 00 00 28 f0 ff ff .......0...(...
07 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 ........(.......
backtrace:
[<000000002daf01fb>] lb_bpf_func_set+0x45c/0x6d0
[<000000008a3b47e3>] team_nl_cmd_options_set+0x88f/0x11b0
[<00000000c4f4f27e>] genl_family_rcv_msg+0x78f/0x1080
[<00000000610ef838>] genl_rcv_msg+0xca/0x170
[<00000000a281df93>] netlink_rcv_skb+0x132/0x380
[<000000004d9448a2>] genl_rcv+0x29/0x40
[<000000000321b2f4>] netlink_unicast+0x4c0/0x690
[<000000008c25dffb>] netlink_sendmsg+0x929/0xe10
[<00000000068298c5>] sock_sendmsg+0xc8/0x110
[<0000000082a61ff0>] ___sys_sendmsg+0x77a/0x8f0
[<00000000663ae29d>] __sys_sendmsg+0xf7/0x250
[<0000000027c5f11a>] do_syscall_64+0x14d/0x610
[<000000006cfbc8d3>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<00000000e23197e2>] 0xffffffffffffffff
Fixes: 01d7f30a9f96 ("team: add loadbalance mode")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 3 Mar 2019 07:34:57 +0000 (07:34 +0000)]
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
Similar to commit 44f49dd8b5a6 ("ipmr: fix possible race resulting from
improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
assume preemption is disabled when incrementing the counter and
accessing a per-CPU variable.
Preemption can be enabled when we add a route in process context that
corresponds to packets stored in the unresolved queue, which are then
forwarded using this route [1].
Fix this by using IP6_INC_STATS() which takes care of disabling
preemption on architectures where it is needed.
[1]
[ 157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
[ 157.460409] caller is ip6mr_forward2+0x73e/0x10e0
[ 157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
[ 157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[ 157.460461] Call Trace:
[ 157.460486] dump_stack+0xf9/0x1be
[ 157.460553] check_preemption_disabled+0x1d6/0x200
[ 157.460576] ip6mr_forward2+0x73e/0x10e0
[ 157.460705] ip6_mr_forward+0x9a0/0x1510
[ 157.460771] ip6mr_mfc_add+0x16b3/0x1e00
[ 157.461155] ip6_mroute_setsockopt+0x3cb/0x13c0
[ 157.461384] do_ipv6_setsockopt.isra.8+0x348/0x4060
[ 157.462013] ipv6_setsockopt+0x90/0x110
[ 157.462036] rawv6_setsockopt+0x4a/0x120
[ 157.462058] __sys_setsockopt+0x16b/0x340
[ 157.462198] __x64_sys_setsockopt+0xbf/0x160
[ 157.462220] do_syscall_64+0x14d/0x610
[ 157.462349] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 0912ea38de61 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aditya Pakki [Sat, 2 Mar 2019 21:20:43 +0000 (15:20 -0600)]
isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
Allocating memory via kzalloc for phi may fail and cause a
NULL pointer dereference. This patch avoids such a scenario.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 1 Mar 2019 19:41:00 +0000 (20:41 +0100)]
net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
If an external PHY is connected via SGMII and uses in-band signalling
then the auto-negotiated values aren't propagated to the port,
resulting in a broken link. See discussion in [0]. This patch adds
this propagation. We need to call mv88e6xxx_port_setup_mac(),
therefore export it from chip.c.
Successfully tested on a ZII DTU with 88E6390 switch and an
Aquantia AQCS109 PHY connected via SGMII to port 9.
[0] https://marc.info/?t=155130287200001&r=1&w=2
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 4 Mar 2019 18:39:05 +0000 (10:39 -0800)]
get rid of legacy 'get_ds()' function
Every in-kernel use of this function defined it to KERNEL_DS (either as
an actual define, or as an inline function). It's an entirely
historical artifact, and long long long ago used to actually read the
segment selector value of '%ds' on x86.
Which in the kernel is always KERNEL_DS.
Inspired by a patch from Jann Horn that just did this for a very small
subset of users (the ones in fs/), along with Al who suggested a script.
I then just took it to the logical extreme and removed all the remaining
gunk.
Roughly scripted with
git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'
plus manual fixups to remove a few unusual usage patterns, the couple of
inline function cases and to fix up a comment that had become stale.
The 'get_ds()' function remains in an x86 kvm selftest, since in user
space it actually does something relevant.
Inspired-by: Jann Horn <jannh@google.com>
Inspired-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Mar 2019 22:23:33 +0000 (14:23 -0800)]
aio: simplify - and fix - fget/fput for io_submit()
Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:
"In more details - normally IOCB_CMD_POLL handling looks so:
1) io_submit(2) allocates aio_kiocb instance and passes it to
aio_poll()
2) aio_poll() resolves the descriptor to struct file by req->file =
fget(iocb->aio_fildes)
3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
aio_kiocb to 2 (bumps by 1, that is).
4) aio_poll() calls vfs_poll(). After sanity checks (basically,
"poll_wait() had been called and only once") it locks the queue.
That's what the extra reference to iocb had been for - we know we
can safely access it.
5) With queue locked, we check if ->woken has already been set to
true (by aio_poll_wake()) and, if it had been, we unlock the
queue, drop a reference to aio_kiocb and bugger off - at that
point it's a responsibility to aio_poll_wake() and the stuff
called/scheduled by it. That code will drop the reference to file
in req->file, along with the other reference to our aio_kiocb.
6) otherwise, we see whether we need to wait. If we do, we unlock the
queue, drop one reference to aio_kiocb and go away - eventual
wakeup (or cancel) will deal with the reference to file and with
the other reference to aio_kiocb
7) otherwise we remove ourselves from waitqueue (still under the
queue lock), so that wakeup won't get us. No async activity will
be happening, so we can safely drop req->file and iocb ourselves.
If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
won't get freed under us, so we can do all the checks and locking
safely. And we don't touch ->file if we detect that case.
However, vfs_poll() most certainly *does* touch the file it had been
given. So wakeup coming while we are still in ->poll() might end up
doing fput() on that file. That case is not too rare, and usually we
are saved by the still present reference from descriptor table - that
fput() is not the final one.
But if another thread closes that descriptor right after our fget()
and wakeup does happen before ->poll() returns, we are in trouble -
final fput() done while we are in the middle of a method:
Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by io_submit() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arjun Vynipadath [Mon, 4 Mar 2019 12:13:02 +0000 (17:43 +0530)]
cxgb4/chtls: Prefix adapter flags with CXGB4
Some of these macros were conflicting with global namespace,
hence prefixing them with CXGB4.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Mon, 4 Mar 2019 09:48:56 +0000 (11:48 +0200)]
net-sysfs: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that, it returns a pointer of bitmap type instead of an opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
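For readers unfamiliar with the helper used in the commit above, here is a rough userspace approximation of what a zeroed bitmap allocation amounts to. It is a sketch only: the model_* names are hypothetical, calloc() stands in for the kernel allocator, and the in-kernel bitmap_zalloc() additionally takes GFP flags.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of longs needed to hold nbits bits, rounded up. */
#define MODEL_BITS_TO_LONGS(nbits) \
    (((nbits) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* Userspace stand-in: the size is given in bits and the result is a
 * typed unsigned long pointer rather than an opaque void pointer. */
unsigned long *model_bitmap_zalloc(unsigned int nbits)
{
    return calloc(MODEL_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

void model_bitmap_free(unsigned long *bitmap)
{
    free(bitmap);
}

void model_set_bit(unsigned long *bitmap, unsigned int bit)
{
    assert(bitmap != NULL);
    bitmap[bit / MODEL_BITS_PER_LONG] |= 1UL << (bit % MODEL_BITS_PER_LONG);
}

int model_test_bit(const unsigned long *bitmap, unsigned int bit)
{
    assert(bitmap != NULL);
    return (int)((bitmap[bit / MODEL_BITS_PER_LONG] >>
                  (bit % MODEL_BITS_PER_LONG)) & 1UL);
}
```

The point of the kernel helper is the same as here: the call site states the size in bits, so the rounding to whole longs is not repeated by hand at every allocation.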
Andy Shevchenko [Mon, 4 Mar 2019 08:57:00 +0000 (10:57 +0200)]
mellanox: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that, it returns a pointer of bitmap type instead of an opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Mar 2019 18:14:31 +0000 (10:14 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-03-04
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add AF_XDP support to libbpf. Rationale is to facilitate writing
AF_XDP applications by offering higher-level APIs that hide many
of the details of the AF_XDP uapi. Sample programs are converted
over to this new interface as well, from Magnus.
2) Introduce a new cant_sleep() macro for annotation of functions
that cannot sleep and use it in BPF_PROG_RUN() to assert that
BPF programs run under preemption disabled context, from Peter.
3) Introduce per BPF prog stats in order to monitor the usage
of BPF; this is controlled by kernel.bpf_stats_enabled sysctl
knob where monitoring tools can make use of this to efficiently
determine the average cost of programs, from Alexei.
4) Split up BPF selftest's test_progs similarly to what we already
did with test_verifier. This further reduces merge
conflicts in future and gets more structure into our
quickly growing BPF selftest suite, from Stanislav.
5) Fix a bug in BTF's dedup algorithm which can cause an infinite
loop in some circumstances; also various BPF doc fixes and
improvements, from Andrii.
6) Various BPF sample cleanups and migration to libbpf in order
to further isolate the old sample loader code (so we can get
rid of it at some point), from Jakub.
7) Add a new BPF helper for BPF cgroup skb progs that allows
setting the ECN CE code point, and a Host Bandwidth Manager (HBM)
sample program for limiting the bandwidth used by v2 cgroups,
from Lawrence.
8) Enable write access to skb->queue_mapping from tc BPF egress
programs in order to let BPF pick TX queue, from Jesper.
9) Fix a bug in BPF spinlock handling for map-in-map which did
not propagate spin_lock_off to the meta map, from Yonghong.
10) Fix a bug in the new per-CPU BPF prog counters to properly
initialize stats for each CPU, from Eric.
11) Add various BPF helper prototypes to selftest's bpf_helpers.h,
from Willem.
12) Fix various BPF samples bugs in XDP and tracing progs,
from Toke, Daniel and Yonghong.
13) Silence preemption splat in test_bpf after BPF_PROG_RUN()
enforces it now everywhere, from Anders.
14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to
get error handling working, from Dan.
15) Fix bpftool documentation and auto-completion with regards
to stream_{verdict,parser} attach types, from Alban.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 26 Feb 2019 17:16:04 +0000 (09:16 -0800)]
x86-64: add warning for non-canonical user access address dereferences
This adds a warning (once) for any kernel dereference that has a user
exception handler, but accesses a non-canonical address. It basically
is a simpler - and more limited - version of commit 9da3f2b74054
("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that
got reverted.
Note that unlike that original commit, this only causes a warning,
because there are real situations where we currently can do this
(notably speculative argument fetching for uprobes etc). Also, unlike
that original commit, this _only_ triggers for #GP accesses, so the
cases of valid kernel pointers that cross into a non-mapped page aren't
affected.
The intent of this is two-fold:
- the uprobe/tracing accesses really do need to be more careful. In
particular, from a portability standpoint it's just wrong to think
that "a pointer is a pointer", and use the same logic for any random
pointer value you find on the stack. It may _work_ on x86-64, but it
doesn't necessarily work on other architectures (where the same
pointer value can be either a kernel pointer _or_ a user pointer, and
you really need to be much more careful in how you try to access it)
The warning can hopefully end up being a reminder that just any
random pointer access won't do.
- Kees in particular wanted a way to actually report invalid uses of
wild pointers to user space accessors, instead of just silently
failing them. Automated fuzzers want a way to get reports if the
kernel ever uses invalid values that the fuzzer fed it.
The non-canonical address range is a fair chunk of the address space,
and with this you can teach syzkaller to feed in invalid pointer
values and find cases where we do not properly validate user
addresses (possibly due to bad uses of "set_fs()").
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
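What "non-canonical" means in the commit above can be illustrated with a short check. This is a sketch under an assumption the commit does not state: 48-bit virtual addresses (4-level paging), where bits 47 to 63 must all be equal. model_is_canonical() and MODEL_VADDR_BITS are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual address bits (4-level paging). */
#define MODEL_VADDR_BITS 48

static_assert(MODEL_VADDR_BITS >= 1 && MODEL_VADDR_BITS <= 64,
              "MODEL_VADDR_BITS must be between 1 and 64");

/* An address is canonical when bit (MODEL_VADDR_BITS - 1) and every
 * bit above it are either all zero or all one. */
bool model_is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (MODEL_VADDR_BITS - 1);

    return upper == 0 || upper == (UINT64_MAX >> (MODEL_VADDR_BITS - 1));
}
```

Under that assumption, everything between 0x0000800000000000 and 0xffff7fffffffffff is non-canonical, which is the "fair chunk of the address space" a fuzzer can draw invalid pointers from.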
Mark Brown [Mon, 4 Mar 2019 15:32:51 +0000 (15:32 +0000)]
Merge branch 'spi-5.1' into spi-next
Mark Brown [Mon, 4 Mar 2019 15:32:49 +0000 (15:32 +0000)]
Merge branch 'spi-5.0' into spi-linus
Mark Brown [Mon, 4 Mar 2019 15:32:43 +0000 (15:32 +0000)]
Merge branch 'regulator-5.1' into regulator-next
Mark Brown [Mon, 4 Mar 2019 15:32:41 +0000 (15:32 +0000)]
Merge branch 'regulator-5.0' into regulator-linus
Daniel Borkmann [Fri, 1 Mar 2019 21:08:21 +0000 (22:08 +0100)]
bpf: add test cases for non-pointer sanitization logic
Add two additional tests for further asserting the
BPF_ALU_NON_POINTER logic with cases that were missed
previously.
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Arthur Fabre <afabre@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Mon, 4 Mar 2019 06:23:00 +0000 (22:23 -0800)]
Merge branch 'mlxsw-minimal-Add-ethtool-and-resource-query-support'
Ido Schimmel says:
====================
mlxsw: minimal: Add ethtool and resource query support
Vadim says:
The minimal driver is chip independent and uses I2C bus for chip access.
Its purpose is to support chassis management on systems equipped with
Mellanox switch ASICs. For example, from a BMC (Board Management
Controller) device.
Patches #1-#3 add ethtool support to the minimal driver so that QSFP/SFP
module info could be retrieved by the driver. This is done by exposing a
dummy netdev for each front panel port and implementing the required
ethtool operations.
Patches #4-#8 add resource query support. This allows the driver to
query the firmware about values of certain resources (e.g., maximum
number of ports). It is required on systems where the maximum number of
ports is larger than the hard coded default (64).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:16 +0000 (09:12 +0000)]
mlxsw: i2c: Extend initialization by querying resources data
Extend initialization flow by query requests for chip resources data in
order to obtain chip's specific capabilities, like the number of ports.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:15 +0000 (09:12 +0000)]
mlxsw: i2c: Extend input parameters list of command API
Extend input parameters list of command API in mlxsw_i2c_cmd() in order
to support initialization commands. Up until now, only access commands
were supported by I2C driver.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:14 +0000 (09:12 +0000)]
mlxsw: i2c: Modify input parameter name in initialization API
Change input parameter name "resource" to "res" in mlxsw_i2c_init() in
order to align it with mlxsw_pci_init().
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:12 +0000 (09:12 +0000)]
mlxsw: i2c: Fix comment misspelling
Fix comment for mlxsw_i2c_write_cmd().
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:11 +0000 (09:12 +0000)]
mlxsw: core: Move resource query API to common location
Move mlxsw_pci_resources_query() to a common location to allow reuse by
the different drivers and over all the supported physical buses. Rename
it to mlxsw_core_resources_query().
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:10 +0000 (09:12 +0000)]
mlxsw: minimal: Add ethtool support
The minimal driver is chip independent and uses I2C bus for chip access.
Its purpose is to support chassis management on systems equipped with
Mellanox switch ASICs. For example from BMC (Board Management
Controller) device.
Expose a dummy netdev for each front panel port and implement basic
ethtool operations to obtain QSFP/SFP module info through ethtool.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:09 +0000 (09:12 +0000)]
mlxsw: minimal: Make structures and variables names shorter
Replace "mlxsw_minimal" by "mlxsw_m" in order to improve code
readability.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sun, 3 Mar 2019 09:12:08 +0000 (09:12 +0000)]
mlxsw: core: Move ethtool module callbacks to a common location
Move the implementation of ethtool module callbacks - .get_module_info()
and .get_module_eeprom() - to a common location to allow reuse by the
different mlxsw drivers.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Mar 2019 06:10:16 +0000 (22:10 -0800)]
Merge branch 'tls-Fix-issues-in-tls_device'
Boris Pismenny says:
====================
tls: Fix issues in tls_device
This series fixes issues encountered in tls_device code paths,
which were introduced recently.
Additionally, this series includes a fix for tls software only receive flow,
which causes corruption of payload received by user space applications.
This series was tested using the OpenSSL integration of KTLS -
https://github.com/mellan
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Boris Pismenny [Wed, 27 Feb 2019 15:38:06 +0000 (17:38 +0200)]
tls: Fix tls_device receive
Currently, the receive function fails to handle records already
decrypted by the device due to the commit mentioned below.
This commit advances the TLS record sequence number and prepares the context
to handle the next record.
Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Wed, 27 Feb 2019 15:38:05 +0000 (17:38 +0200)]
tls: Fix mixing between async capable and async
Today, tls_sw_recvmsg is capable of using asynchronous mode to handle
application data TLS records. Moreover, it assumes that if the cipher
can be handled asynchronously, then all packets will be processed
asynchronously.
However, this assumption is not always true. Specifically, for AES-GCM
in TLS1.2, it causes data corruption, and breaks user applications.
This patch fixes this problem by separating the async capability from
the decryption operation result.
Fixes: c0ab4732d4c6 ("net/tls: Do not use async crypto for non-data records")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boris Pismenny [Wed, 27 Feb 2019 15:38:04 +0000 (17:38 +0200)]
tls: Fix write space handling
TLS device cannot use the sw context. This patch returns the original
tls device write space handler and moves the sw/device specific portions
to the relevant files.
Also, we remove the write_space call for the tls_sw flow, because it
handles partial records in its delayed tx work handler.
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boris Pismenny [Wed, 27 Feb 2019 15:38:03 +0000 (17:38 +0200)]
tls: Fix tls_device handling of partial records
Clean up the handling of partial records while fixing a bug where the
tls_push_pending_closed_record function is using the software tls
context instead of the hardware context.
The bug resulted in the following crash:
[ 88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 88.793271] #PF error: [normal kernel read fault]
[ 88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
[ 88.795958] Oops: 0000 [#1] SMP PTI
[ 88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
[ 88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
[ 88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
[ 88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
[ 88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
[ 88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
[ 88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
[ 88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
[ 88.814506] FS: 00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
[ 88.816337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
[ 88.819328] Call Trace:
[ 88.820123] tls_push_data+0x628/0x6a0 [tls]
[ 88.821283] ? remove_wait_queue+0x20/0x60
[ 88.822383] ? n_tty_read+0x683/0x910
[ 88.823363] tls_device_sendmsg+0x53/0xa0 [tls]
[ 88.824505] sock_sendmsg+0x36/0x50
[ 88.825492] sock_write_iter+0x87/0x100
[ 88.826521] __vfs_write+0x127/0x1b0
[ 88.827499] vfs_write+0xad/0x1b0
[ 88.828454] ksys_write+0x52/0xc0
[ 88.829378] do_syscall_64+0x5b/0x180
[ 88.830369] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 88.831603] RIP: 0033:0x7fcad1451680
[ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1248.472564] #PF error: [normal kernel read fault]
[ 1248.473790] PGD 0 P4D 0
[ 1248.474642] Oops: 0000 [#1] SMP PTI
[ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G OE 5.0.0-rc4+ #3
[ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
[ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
[ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
[ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
[ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
[ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
[ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
[ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
[ 1248.494923] FS: 0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
[ 1248.496847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
[ 1248.500136] Call Trace:
[ 1248.500998] ? tcp_check_oom+0xd0/0xd0
[ 1248.502106] tls_sk_proto_close+0x127/0x1e0 [tls]
[ 1248.503411] inet_release+0x3c/0x60
[ 1248.504530] __sock_release+0x3d/0xb0
[ 1248.505611] sock_close+0x11/0x20
[ 1248.506612] __fput+0xb4/0x220
[ 1248.507559] task_work_run+0x88/0xa0
[ 1248.508617] do_exit+0x2cb/0xbc0
[ 1248.509597] ? core_sys_select+0x17a/0x280
[ 1248.510740] do_group_exit+0x39/0xb0
[ 1248.511789] get_signal+0x1d0/0x630
[ 1248.512823] do_signal+0x36/0x620
[ 1248.513822] exit_to_usermode_loop+0x5c/0xc6
[ 1248.515003] do_syscall_64+0x157/0x180
[ 1248.516094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1248.517456] RIP: 0033:0x7fb398bd3f53
[ 1248.518537] Code: Bad RIP value.
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Mar 2019 05:48:06 +0000 (21:48 -0800)]
Merge branch 'net-phy-clean-up-the-old-gen10g-functions'
Heiner Kallweit says:
====================
net: phy: clean up the old gen10g functions
The old gen10g_ functions are mainly stubs and have been superseded
by genphy_c45_ equivalents. So let's remove / hide the old functions
as far as possible.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 2 Mar 2019 16:13:11 +0000 (17:13 +0100)]
net: phy: remove gen10g_no_soft_reset
genphy_no_soft_reset and gen10g_no_soft_reset are both the same no-ops,
one is enough.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 2 Mar 2019 16:15:56 +0000 (17:15 +0100)]
net: phy: don't export gen10g_read_status
gen10g_read_status is deprecated, therefore stop exporting it.
We don't want to encourage anybody to use it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 2 Mar 2019 16:11:40 +0000 (17:11 +0100)]
net: phy: remove gen10g_config_init
ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported
and advertising bitmap because it's part of PHY_10GBIT_FEATURES.
And all users of gen10g_config_init use PHY_10GBIT_FEATURES.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 2 Mar 2019 16:10:36 +0000 (17:10 +0100)]
net: phy: remove gen10g_suspend and gen10g_resume
phy_suspend() and phy_resume() are no-ops anyway if no callback is
defined. Therefore we don't need these stubs.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 2 Mar 2019 16:10:00 +0000 (17:10 +0100)]
net: phy: use genphy_c45_aneg_done in genphy_aneg_done
Now that we have it let's use genphy_c45_aneg_done() in phy_aneg_done().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kristian Evensen [Sat, 2 Mar 2019 12:32:26 +0000 (13:32 +0100)]
qmi_wwan: Add support for Quectel EG12/EM12
Quectel EG12 (module)/EM12 (M.2 card) is a Cat. 12 LTE modem. The modem
behaves in the same way as the EP06, so the "set DTR"-quirk must be
applied and the diagnostic-interface check performed. Since the
diagnostic-check now applies to more modems, I have renamed the function
from quectel_ep06_diag_detected() to quectel_diag_detected().
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 2 Mar 2019 09:06:05 +0000 (10:06 +0100)]
net: dsa: mv88e6xxx: fix number of internal PHYs for 88E6x90 family
Ports 9 and 10 don't have internal PHYs but are (dependent on the
version) SERDES/SGMII/XAUI/RXAUI ports.
v2:
- fix it for all 88E6x90 family members
Fixes: bc3931557d1d ("net: dsa: mv88e6xxx: Add number of internal PHYs")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sat, 2 Mar 2019 02:34:55 +0000 (10:34 +0800)]
net-sysfs: Fix mem leak in netdev_register_kobject
syzkaller reports this:
BUG: memory leak
unreferenced object 0xffff88837a71a500 (size 256):
comm "syz-executor.2", pid 9770, jiffies
4297825125 (age 17.843s)
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff ........ .......
backtrace:
[<00000000db12624b>] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
[<00000000dc49a994>] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
[<00000000e5f3fea0>] tun_set_iff drivers/net/tun.c:2649 [inline]
[<00000000e5f3fea0>] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
[<000000001b8ac127>] vfs_ioctl fs/ioctl.c:46 [inline]
[<000000001b8ac127>] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
[<0000000079b269f8>] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
[<00000000de649beb>] __do_sys_ioctl fs/ioctl.c:712 [inline]
[<00000000de649beb>] __se_sys_ioctl fs/ioctl.c:710 [inline]
[<00000000de649beb>] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
[<000000007ebded1e>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
[<00000000db315d36>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<00000000115be9bb>] 0xffffffffffffffff
The error path of register_queue_kobjects should call kset_unregister
to free 'dev->queues_kset'; otherwise the kset is leaked.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 1d24eb4815d1 ("xps: Transmit Packet Steering")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
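The error-path rule in the net-sysfs fix above can be illustrated with a small user-space model. This is only a sketch: fake_kset_create(), fake_kset_unregister() and register_queues_model() are hypothetical stand-ins, not the kernel functions, and the live-object counter exists only to make a leak observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kset; the counter makes leaks visible. */
struct fake_kset { int unused; };
static int live_ksets;

static struct fake_kset *fake_kset_create(void)
{
	struct fake_kset *k = calloc(1, sizeof(*k));

	if (k)
		live_ksets++;
	return k;
}

static void fake_kset_unregister(struct fake_kset *k)
{
	if (!k)
		return;
	live_ksets--;
	free(k);
}

/* Model of a two-step registration: step one creates the kset,
 * step two (queue registration) may fail. */
static int register_queues_model(struct fake_kset **out, int fail_second_step)
{
	struct fake_kset *k = fake_kset_create();
	int err;

	*out = NULL;
	if (!k)
		return -ENOMEM;

	err = fail_second_step ? -ENOMEM : 0;
	if (err)
		goto error;

	*out = k;
	return 0;

error:
	/* Without this call the kset created above would never be freed. */
	fake_kset_unregister(k);
	return err;
}
```

After a failing call, live_ksets is back to 0; dropping the call under the error label would leave it at 1, which is the pattern the leak report points to.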
Joe Perches [Sat, 2 Mar 2019 00:37:25 +0000 (16:37 -0800)]
fsl/fman: Use vsprintf extension %pM
Make logging of an ethernet address more consistent with
the rest of the kernel.
Miscellanea:
The %02hx use also did not quite match the u8 definition
of addr though that did not actually matter given normal
integer promotion rules.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
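For readers outside the kernel, the output that %pM produces for a 6-byte Ethernet address can be approximated in user space. The helper below is a hypothetical illustration (format_mac() is not a kernel or libc function); in kernel code the equivalent is simply passing the address pointer to a %pM conversion.

```c
#include <stdint.h>
#include <stdio.h>

/* User-space approximation of the kernel's %pM output:
 * six lowercase hex bytes separated by colons. */
static int format_mac(char *buf, size_t len, const uint8_t addr[6])
{
	/* Each uint8_t argument is promoted to int when passed through
	 * the variadic call, so a plain %02x is sufficient and an "h"
	 * length modifier adds nothing. */
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			addr[0], addr[1], addr[2],
			addr[3], addr[4], addr[5]);
}
```

The buffer needs 18 bytes: 17 characters plus the terminating NUL.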
Francesco Ruggeri [Fri, 1 Mar 2019 23:31:03 +0000 (15:31 -0800)]
net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE
By default, an IPv6 socket with the IPV6_ROUTER_ALERT socket option set
receives all IPv6 Router Alert (RA) packets from all namespaces.
The IPV6_ROUTER_ALERT_ISOLATE socket option restricts the packets
received by the socket to those from the socket's own namespace.
Signed-off-by: Maxim Martynov <maxim@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
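The delivery rule described for IPV6_ROUTER_ALERT_ISOLATE can be modelled as a single predicate. This is a sketch under stated assumptions: namespaces are reduced to integer ids, and struct ra_sock_model and ra_should_deliver() are hypothetical names, not kernel structures or functions.

```c
#include <stdbool.h>

/* Hypothetical model of a socket that has IPV6_ROUTER_ALERT set. */
struct ra_sock_model {
	int netns_id;   /* namespace the socket lives in */
	bool isolate;   /* models IPV6_ROUTER_ALERT_ISOLATE being set */
};

/* Returns true if a Router Alert packet that arrived in namespace
 * pkt_netns_id would be delivered to the socket. */
static bool ra_should_deliver(const struct ra_sock_model *sk, int pkt_netns_id)
{
	if (sk->isolate && sk->netns_id != pkt_netns_id)
		return false;   /* isolated: ignore other namespaces */
	return true;            /* default: receive from all namespaces */
}
```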
Heiner Kallweit [Fri, 1 Mar 2019 18:53:57 +0000 (19:53 +0100)]
net: dsa: mv88e6xxx: handle unknown duplex modes gracefully in mv88e6xxx_port_set_duplex
While testing another issue, I found that mv88e6xxx_port_setup_mac()
failed because DUPLEX_UNKNOWN was passed as an argument to
mv88e6xxx_port_set_duplex(). We should handle this case gracefully and
return -EOPNOTSUPP, as e.g. mv88e6xxx_port_set_speed() already does.
Fixes: 7f1ae07b51e8 ("net: dsa: mv88e6xxx: add port duplex setter")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
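The behaviour described in the duplex fix above can be sketched as a switch whose default branch reports "not supported" instead of a hard error. This is a standalone model, not the driver code: the MODEL_* register bits and both function names are hypothetical, and the caller's tolerance of -EOPNOTSUPP is an assumption inferred from the commit text.

```c
#include <errno.h>

/* Values mirror linux/ethtool.h; redefined so the sketch builds standalone. */
#define DUPLEX_HALF    0x00
#define DUPLEX_FULL    0x01
#define DUPLEX_UNKNOWN 0xff

/* Hypothetical register bits, for illustration only. */
#define MODEL_FORCE_DUPLEX 0x1
#define MODEL_FULL_DUPLEX  0x2

static int set_duplex_model(int dup, unsigned int *reg)
{
	switch (dup) {
	case DUPLEX_HALF:
		*reg = MODEL_FORCE_DUPLEX;
		return 0;
	case DUPLEX_FULL:
		*reg = MODEL_FORCE_DUPLEX | MODEL_FULL_DUPLEX;
		return 0;
	default:
		/* DUPLEX_UNKNOWN or any other value: leave reg untouched. */
		return -EOPNOTSUPP;
	}
}

/* Assumed caller behaviour: "not supported" is not treated as fatal. */
static int setup_mac_model(int dup, unsigned int *reg)
{
	int err = set_duplex_model(dup, reg);

	if (err && err != -EOPNOTSUPP)
		return err;
	return 0;
}
```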
Ben Dooks [Fri, 1 Mar 2019 18:39:44 +0000 (18:39 +0000)]
net: fixup address-space warnings in compat_mc_{get,set}sockopt()
Add __user attributes in some of the casts in this function to avoid
the following sparse warnings:
net/compat.c:592:57: warning: cast removes address space of expression
net/compat.c:592:57: warning: incorrect type in initializer (different address spaces)
net/compat.c:592:57: expected struct compat_group_req [noderef] <asn:1>*gr32
net/compat.c:592:57: got void *<noident>
net/compat.c:613:65: warning: cast removes address space of expression
net/compat.c:613:65: warning: incorrect type in initializer (different address spaces)
net/compat.c:613:65: expected struct compat_group_source_req [noderef] <asn:1>*gsr32
net/compat.c:613:65: got void *<noident>
net/compat.c:634:60: warning: cast removes address space of expression
net/compat.c:634:60: warning: incorrect type in initializer (different address spaces)
net/compat.c:634:60: expected struct compat_group_filter [noderef] <asn:1>*gf32
net/compat.c:634:60: got void *<noident>
net/compat.c:672:52: warning: cast removes address space of expression
net/compat.c:672:52: warning: incorrect type in initializer (different address spaces)
net/compat.c:672:52: expected struct compat_group_filter [noderef] <asn:1>*gf32
net/compat.c:672:52: got void *<noident>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
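The kind of cast that silences these sparse warnings can be shown in isolation. This is a minimal sketch, assuming a hypothetical struct group_req_model and helper as_group_req(); the __user definition follows the address-space form that the "<asn:1>" in the warnings refers to, and expands to nothing outside of sparse.

```c
/* Under sparse (__CHECKER__) the annotation marks a separate address
 * space; for a normal compiler it expands to nothing. */
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user
#endif

/* Hypothetical stand-in for a compat request structure. */
struct group_req_model {
	unsigned int iface;
};

/* optval arrives as a user pointer. Casting through a plain
 * "void *" or "struct ... *" would drop the address space; repeating
 * __user in the cast keeps it. */
static struct group_req_model __user *as_group_req(char __user *optval)
{
	return (struct group_req_model __user *)optval;
}
```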
Florian Fainelli [Fri, 1 Mar 2019 18:37:25 +0000 (10:37 -0800)]
net: dsa: Use prepare/commit phase in dsa_slave_vlan_rx_add_vid()
We were skipping the prepare phase which causes some problems with at
least a couple of drivers:
- mv88e6xxx chooses to skip programming VID = 0 with -EOPNOTSUPP in
the prepare phase, but we would still try to force this VID since we
would only call the commit phase and so we would get the driver to
return -EINVAL instead
- qca8k does not currently have a port_vlan_add() callback implemented,
yet we would try to call it unconditionally, leading to a NULL pointer
dereference
Fix both issues by conforming to the current model of a prepare phase
followed by a commit phase. This makes us consistent with the rest of
the code and its assumptions.
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Michal Vokáč <michal.vokac@ysoft.com>
Fixes: 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
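The prepare/commit model referred to in the DSA fix above can be sketched as follows. This is a user-space illustration only: struct vlan_ops_model, vlan_add_two_phase() and the two sample callbacks are hypothetical names, not the DSA API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver ops table; either callback may be NULL. */
struct vlan_ops_model {
	int  (*prepare)(int vid);
	void (*add)(int vid);
};

static int added_vid = -1;   /* records the last committed VID */

/* Sample driver that declines VID 0 during prepare. */
static int prepare_skip_vid0(int vid)
{
	return vid == 0 ? -EOPNOTSUPP : 0;
}

static void add_record(int vid)
{
	added_vid = vid;
}

/* Two-phase add: the commit step runs only if prepare accepted. */
static int vlan_add_two_phase(const struct vlan_ops_model *ops, int vid)
{
	int err;

	if (!ops->add)
		return -EOPNOTSUPP;   /* refuse instead of calling NULL */

	if (ops->prepare) {
		err = ops->prepare(vid);
		if (err)
			return err;   /* driver vetoed; nothing committed */
	}

	ops->add(vid);
	return 0;
}
```

With this shape, a driver that rejects VID 0 in its prepare step never sees the commit call, and a driver without an add callback gets -EOPNOTSUPP rather than a NULL function call.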
David S. Miller [Mon, 4 Mar 2019 04:41:18 +0000 (20:41 -0800)]
Merge branch 'dpaa2-eth-add-XDP_REDIRECT-support'
Ioana Ciornei says:
====================
dpaa2-eth: add XDP_REDIRECT support
The first patch adds different software annotation types for Tx frames
depending on frame type while the second one actually adds support for basic
XDP_REDIRECT.
Changes in v2:
- add missing xdp_do_flush_map() call
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Radulescu [Fri, 1 Mar 2019 17:47:24 +0000 (17:47 +0000)]
dpaa2-eth: add XDP_REDIRECT support
Implement support for the XDP_REDIRECT action.
The redirected frame is transmitted and confirmed on the regular Tx/Tx
conf queues. Frame is marked with the "XDP" type in the software
annotation, since it requires special treatment.
We don't have good hardware support for TX batching, so the
XDP_XMIT_FLUSH flag doesn't make a difference for now; ndo_xdp_xmit
performs the actual Tx operation on the spot.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Radulescu [Fri, 1 Mar 2019 17:47:23 +0000 (17:47 +0000)]
dpaa2-eth: Add software annotation types
We write different metadata information in the software annotation
area of Tx frames, depending on frame type. Make this more explicit
by introducing a type field and separate structures for single buffer
and scatter-gather frames.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
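The idea of a type field plus separate per-frame-type structures can be sketched as a tagged union. The layout below is hypothetical (names, members and the void pointers are placeholders chosen for illustration) and is not the driver's actual definition.

```c
#include <stddef.h>

/* Hypothetical frame types stored in the software annotation area. */
enum swa_type_model {
	SWA_MODEL_SINGLE,   /* single-buffer frame */
	SWA_MODEL_SG,       /* scatter-gather frame */
};

struct swa_model {
	enum swa_type_model type;   /* says which union member is valid */
	union {
		struct {
			void *skb;
		} single;
		struct {
			void *skb;
			void *sg_table;
			int num_sg;
		} sg;
	};
};

/* The Tx-confirmation side dispatches on the type instead of guessing. */
static void *swa_model_skb(const struct swa_model *swa)
{
	switch (swa->type) {
	case SWA_MODEL_SINGLE:
		return swa->single.skb;
	case SWA_MODEL_SG:
		return swa->sg.skb;
	}
	return NULL;
}
```

Adding a further frame type then means adding one enum value and one union member, without changing how the existing types are interpreted.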
David S. Miller [Mon, 4 Mar 2019 04:14:28 +0000 (20:14 -0800)]
Merge branch 'sched-Patches-from-out-of-tree-version-of-sch_cake'
Toke Høiland-Jørgensen says:
====================
sched: Patches from out-of-tree version of sch_cake
This series includes a couple of patches with updates from the out-of-tree
version of sch_cake. The first one is a fix to the fairness scheduling when
dual-mode fairness is enabled. The second patch is an additional feature flag
that allows using fwmark as a tin selector, as a convenience for people who want
to customise tin selection. The third patch is just a cleanup to the tin
selection logic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Fri, 1 Mar 2019 15:04:05 +0000 (16:04 +0100)]
sch_cake: Simplify logic in cake_select_tin()
With more modes added the logic in cake_select_tin() was getting a bit
hairy, and it turns out we can actually simplify it quite a bit. This also
allows us to get rid of one of the two diffserv parsing functions, which
has the added benefit that already-zeroed DSCP fields won't get re-written.
Suggested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Darbyshire-Bryant [Fri, 1 Mar 2019 15:04:05 +0000 (16:04 +0100)]
sch_cake: Permit use of connmarks as tin classifiers
Add flag 'FWMARK' to enable use of firewall connmarks as tin selector.
The connmark (skb->mark) needs to be in the range 1 to tin_cnt, i.e.
for diffserv3 the mark needs to be 1 to 3.
Background
Typically CAKE uses DSCP as the basis for tin selection. DSCP values
are relatively easily changed as part of the egress path, usually with
iptables & the mangle table; ingress is more challenging. CAKE is often
used on the WAN interface of a residential gateway, where passthrough of
DSCP from the ISP is either missing or set to unhelpful values, so
ingress DSCP values are of little use for tin selection in that
environment.
An approach to solving the ingress tin selection problem is to use
CAKE's understanding of tc filters. Naive tc filters could match on
source/destination port numbers and force tin selection that way, but
multiple filters don't scale particularly well, as each filter must be
traversed whether it matches or not. For example, to map 3 firewall
marks to tins:
MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1
tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2
tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3
Another option is to use eBPF cls_act with tc filters, e.g.:
MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' )
tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o
This has the disadvantages of a) needing someone to write & maintain
the bpf program, b) a bpf toolchain to compile it and c) needing to
hardcode the major number in the bpf program so it matches the cake
instance (or forcing the cake instance to a particular major number)
since the major number cannot be passed to the bpf program via tc
command line.
As already hinted at by the previous examples, it would be helpful
to associate tins with something that survives the Internet path and
ideally allows tin selection on both egress and ingress. Netfilter's
conntrack permits setting an identifying mark on a connection which
can also be restored to an ingress packet with tc action connmark e.g.
tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1
Since tc's connmark action has restored any connmark into skb->mark,
all of the previous solutions build on it and, in one form or another,
copy that mark to the skb->priority field, where CAKE picks it up
again.
This change cuts out at least one of the (less intuitive &
non-scalable) middlemen and permits direct access to skb->mark.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
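The range rule for the mark in the connmark commit above can be written as a small selection function. This is a simplified model, not the qdisc code: select_tin_model() and its parameters are hypothetical, tins are assumed to be indexed from zero, and the fallback stands in for whatever tin the existing DSCP-based logic would pick.

```c
/* Returns a zero-based tin index. A mark is used only when the FWMARK
 * behaviour is enabled and the mark lies in 1..tin_cnt; otherwise the
 * caller-supplied fallback tin is returned unchanged. */
static unsigned int select_tin_model(int fwmark_enabled, unsigned int mark,
				     unsigned int tin_cnt,
				     unsigned int fallback_tin)
{
	if (fwmark_enabled && mark >= 1 && mark <= tin_cnt)
		return mark - 1;
	return fallback_tin;
}
```

For diffserv3 (tin_cnt of 3), marks 1 to 3 select tins 0 to 2 in this model, while a mark of 0 or 4 falls through to the fallback.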