git.openwrt.org Git - openwrt/staging/blogic.git/log

netfilter: nf_conncount: remove wrong condition check routine

All lists that reach the tree_nodes_free() function have both zero
counter and true dead flag. The reason for this is that lists to be
release are selected by nf_conncount_gc_list() which already decrements
the list counter and sets on the dead flag. Therefore, this if statement
in tree_nodes_free() is unnecessary and wrong.

Fixes: 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nat: fix double register in masquerade modules

There is a reference counter to ensure that masquerade modules register
notifiers only once. However, the existing reference counter approach is
not safe, test commands are:

   while :
   do
       modprobe ip6t_MASQUERADE &
   modprobe nft_masq_ipv6 &
   modprobe -rv ip6t_MASQUERADE &
   modprobe -rv nft_masq_ipv6 &
   done

numbers below represent the reference counter.
--------------------------------------------------------
CPU0        CPU1        CPU2        CPU3        CPU4
[insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
--------------------------------------------------------
0->1
register    1->2
            returns     2->1
returns     1->0
                                                0->1
                                                register <--
                                    unregister
--------------------------------------------------------

The unregistation of CPU3 should be processed before the
registration of CPU4.

In order to fix this, use a mutex instead of reference counter.

splat looks like:
[  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
[  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
[  323.869574] irq event stamp: 194074
[  323.898930] hardirqs last  enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
[  323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
[  323.898930] softirqs last  enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
[  323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
[  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
[  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
[  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
[  323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[  323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
[  323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
[  323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
[  323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
[  323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
[  323.898930] FS:  00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
[  323.898930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
[  323.898930] Call Trace:
[  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
[  323.898930]  ? down_read+0x150/0x150
[  323.898930]  ? sched_clock_cpu+0x126/0x170
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  register_netdevice_notifier+0xbb/0x790
[  323.898930]  ? __dev_close_many+0x2d0/0x2d0
[  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
[  323.898930]  ? wait_for_completion+0x710/0x710
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  323.898930]  ? up_write+0x6c/0x210
[  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
[  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
[  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
[ ... ]

Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: add missing error handling code for register functions

register_{netdevice/inetaddr/inet6addr}_notifier may return an error
value, this patch adds the code to handle these error paths.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipv6: Preserve link scope traffic original oif

When ip6_route_me_harder is invoked, it resets outgoing interface of:
  - link-local scoped packets sent by neighbor discovery
  - multicast packets sent by MLD host
  - multicast packets send by MLD proxy daemon that sets outgoing
    interface through IPV6_PKTINFO ipi6_ifindex

Link-local and multicast packets must keep their original oif after
ip6_route_me_harder is called.

Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too

syzbot was able to trigger the WARN in cttimeout_default_get() by
passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use
same timeout values.

Furthermore, also fetch GRE timeouts. GRE is a bit more complicated,
as it still can be a module and its netns_proto_gre struct layout isn't
visible outside of the gre module. Can't move timeouts around, it
appears conntrack sysctl unregister assumes net_generic() returns
nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.

A followup nf-next patch could make gre tracker be built-in as well
if needed, its not that large.

Last, make the WARN() mention the missing protocol value in case
anything else is missing.

Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf

ip_vs_dst_event is supposed to clean up all dst used in ipvs'
destinations when a net dev is going down. But it works only
when the dst's dev is the same as the dev from the event.

Now with the same priority but late registration,
ip_vs_dst_notifier is always called later than ipv6_dev_notf
where the dst's dev is set to lo for NETDEV_DOWN event.

As the dst's dev lo is not the same as the dev from the event
in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work.
Also as these dst have to wait for dest_trash_timer to clean
them up. It would cause some non-permanent kernel warnings:

unregister_netdevice: waiting for br0 to become free. Usage count = 3

To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf
by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5.

Note that for ipv4 route fib_netdev_notifier doesn't set dst's
dev to lo in NETDEV_DOWN event, so this fix is only needed when
IP_VS_IPV6 is defined.

Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_hashlimit: fix a possible memory leak in htable_create()

In the htable_create(), hinfo is allocated by vmalloc()
So that if error occurred, hinfo should be freed.

Fixes: 11d5f15723c9 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix use-after-free when deleting compat expressions

nft_compat ops do not have static storage duration, unlike all other
expressions.

When nf_tables_expr_destroy() returns, expr->ops might have been
free'd already, so we need to store next address before calling
expression destructor.

For same reason, we can't deref match pointer after nft_xt_put().

This can be easily reproduced by adding msleep() before
nft_match_destroy() returns.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_RATEEST: remove netns exit routine

xt_rateest_net_exit() was added to check whether rules are flushed
successfully. but ->net_exit() callback is called earlier than
->destroy() callback.
So that ->net_exit() callback can't check that.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \
   --dport 1111 -j RATEEST --rateest-name ap \
   --rateest-interval 250ms --rateest-ewma 0.5s
   %ip netns del vm1

splat looks like:
[  668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[  668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables
[  668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21
[  668.813518] Workqueue: netns cleanup_net
[  668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST]
[  668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49
[  668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282
[  668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000
[  668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74
[  668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be
[  668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08
[  668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000
[  668.813518] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[  668.813518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0
[  668.813518] Call Trace:
[  668.813518]  ? unregister_netdevice_many+0xe0/0xe0
[  668.813518]  ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST]
[  668.813518]  ? default_device_exit+0x1ca/0x270
[  668.813518]  ? remove_proc_entry+0x1cd/0x390
[  668.813518]  ? dev_change_net_namespace+0xd00/0xd00
[  668.813518]  ? __init_waitqueue_head+0x130/0x130
[  668.813518]  ops_exit_list.isra.10+0x94/0x140
[  668.813518]  cleanup_net+0x45b/0x900
[  668.813518]  ? net_drop_ns+0x110/0x110
[  668.813518]  ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80
[  668.813518]  ? save_trace+0x300/0x300
[  668.813518]  ? lock_acquire+0x196/0x470
[  668.813518]  ? lock_acquire+0x196/0x470
[  668.813518]  ? process_one_work+0xb60/0x1de0
[  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[  668.813518]  ? _raw_spin_unlock_irq+0x29/0x40
[  668.813518]  ? __lock_acquire+0x4500/0x4500
[  668.813518]  ? __lock_is_held+0xb4/0x140
[  668.813518]  process_one_work+0xc13/0x1de0
[  668.813518]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[  668.813518]  ? set_load_weight+0x270/0x270
[ ... ]

Fixes: 3427b2ab63fa ("netfilter: make xt_rateest hash table per net")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: don't use position attribute on rule replacement

Its possible to set both HANDLE and POSITION when replacing a rule.
In this case, the rule at POSITION gets replaced using the
userspace-provided handle. Rule handles are supposed to be generated
by the kernel only.

Duplicate handles should be harmless, however better disable this "feature"
by only checking for the POSITION attribute on insert operations.

Fixes: 5e94846686d0 ("netfilter: nf_tables: add insert operation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

selftests: add script to stress-test nft packet path vs. control plane

Start flood ping for each cpu while loading/flushing rulesets to make
sure we do not access already-free'd rules from nf_tables evaluation loop.

Also add this to TARGETS so 'make run_tests' in selftest dir runs it
automatically.

This would have caught the bug fixed in previous change
("netfilter: nf_tables: do not skip inactive chains during generation update")
sooner.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: don't skip inactive chains during update

There is no synchronization between packet path and the configuration plane.

The packet path uses two arrays with rules, one contains the current (active)
generation.  The other either contains the last (obsolete) generation or
the future one.

Consider:
cpu1               cpu2
                   nft_do_chain(c);
delete c
net->gen++;
                   genbit = !!net->gen;
                   rules = c->rg[genbit];

cpu1 ignores c when updating if c is not active anymore in the new
generation.

On cpu2, we now use rules from wrong generation, as c->rg[old]
contains the rules matching 'c' whereas c->rg[new] was not updated and
can even point to rules that have been free'd already, causing a crash.

To fix this, make sure that 'current' to the 'next' generation are
identical for chains that are going away so that c->rg[new] will just
use the matching rules even if genbit was incremented already.

Fixes: 0cbc06b3faba7 ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: fix unexpected permanent node of list.

When list->count is 0, the list is deleted by GC. But list->count is
never reached 0 because initial count value is 1 and it is increased
when node is inserted. So that initial value of list->count should be 0.

Originally GC always finds zero count list through deleting node and
decreasing count. However, list may be left empty since node insertion
may fail eg. allocaton problem. In order to solve this problem, GC
routine also finds zero count list without deleting node.

Fixes: cb2b36f5a97d ("netfilter: nf_conncount: Switch to plain list")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: fix list_del corruption in conn_free

nf_conncount_tuple is an element of nft_connlimit and that is deleted by
conn_free(). Elements can be deleted by both GC routine and data path
functions (nf_conncount_lookup, nf_conncount_add) and they call
conn_free() to free elements. But conn_free() only protects lists, not
each element. So that list_del corruption could occurred.

The conn_free() doesn't check whether element is already deleted. In
order to protect elements, dead flag is added. If an element is deleted,
dead flag is set. The only conn_free() can delete elements so that both
list lock and dead flag are enough to protect it.

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip id ct count over 2 } counter

splat looks like:
[ 1779.495778] list_del corruption, ffff8800b6e12008->prev is LIST_POISON2 (dead000000000200)
[ 1779.505453] ------------[ cut here ]------------
[ 1779.506260] kernel BUG at lib/list_debug.c:50!
[ 1779.515831] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22
[ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set]
[ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150
[ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b
[ 1779.516772] RSP: 0018:ffff880119127420 EFLAGS: 00010286
[ 1779.516772] RAX: 000000000000004e RBX: dead000000000200 RCX: 0000000000000000
[ 1779.516772] RDX: 000000000000004e RSI: 0000000000000008 RDI: ffffed0023224e7a
[ 1779.516772] RBP: ffff88011934bc10 R08: ffffed002367cea9 R09: ffffed002367cea9
[ 1779.516772] R10: 0000000000000001 R11: ffffed002367cea8 R12: ffff8800b6e12008
[ 1779.516772] R13: ffff8800b6e12010 R14: ffff88011934bc20 R15: ffff8800b6e12008
[ 1779.516772] FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[ 1779.516772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1779.516772] CR2: 00007fc876534010 CR3: 000000010da16000 CR4: 00000000001006f0
[ 1779.516772] Call Trace:
[ 1779.516772]  conn_free+0x9f/0x2b0 [nf_conncount]
[ 1779.516772]  ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack]
[ 1779.516772]  ? nf_conncount_add+0x520/0x520 [nf_conncount]
[ 1779.516772]  ? do_raw_spin_trylock+0x1a0/0x1a0
[ 1779.516772]  ? do_raw_spin_trylock+0x10/0x1a0
[ 1779.516772]  find_or_evict+0xe5/0x150 [nf_conncount]
[ 1779.516772]  nf_conncount_gc_list+0x162/0x360 [nf_conncount]
[ 1779.516772]  ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount]
[ 1779.516772]  ? _raw_spin_unlock_irqrestore+0x45/0x50
[ 1779.516772]  ? trace_hardirqs_off+0x6b/0x220
[ 1779.516772]  ? trace_hardirqs_on_caller+0x220/0x220
[ 1779.516772]  nft_rhash_gc+0x16b/0x540 [nf_tables_set]
[ ... ]

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: use spin_lock_bh instead of spin_lock

conn_free() holds lock with spin_lock() and it is called by both
nf_conncount_lookup() and nf_conncount_gc_list(). nf_conncount_lookup()
is called from bottom-half context and nf_conncount_gc_list() from
process context. So that spin_lock() call is not safe. Hence
conn_free() should use spin_lock_bh() instead of spin_lock().

test commands:
   %nft add table ip filter
   %nft add chain ip filter input { type filter hook input priority 0\; }
   %nft add rule filter input meter test { ip saddr ct count over 2 } \
   counter

splat looks like:
[  461.996507] ================================
[  461.998999] WARNING: inconsistent lock state
[  461.998999] 4.19.0-rc6+ #22 Not tainted
[  461.998999] --------------------------------
[  461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  461.998999] 00000000a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount]
[  461.998999] {IN-SOFTIRQ-W} state was registered at:
[  461.998999]   _raw_spin_lock+0x30/0x70
[  461.998999]   nf_conncount_add+0x28a/0x520 [nf_conncount]
[  461.998999]   nft_connlimit_eval+0x401/0x580 [nft_connlimit]
[  461.998999]   nft_dynset_eval+0x32b/0x590 [nf_tables]
[  461.998999]   nft_do_chain+0x497/0x1430 [nf_tables]
[  461.998999]   nft_do_chain_ipv4+0x255/0x330 [nf_tables]
[  461.998999]   nf_hook_slow+0xb1/0x160
[ ... ]
[  461.998999] other info that might help us debug this:
[  461.998999]  Possible unsafe locking scenario:
[  461.998999]
[  461.998999]        CPU0
[  461.998999]        ----
[  461.998999]   lock(&(&list->list_lock)->rlock);
[  461.998999]   <Interrupt>
[  461.998999]     lock(&(&list->list_lock)->rlock);
[  461.998999]
[  461.998999]  *** DEADLOCK ***
[  461.998999]
[ ... ]

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Linux 4.20-rc2

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"One last pull request before heading to Vancouver for LPC, here we have:

   1) Don't forget to free VSI contexts during ice driver unload, from
      Victor Raj.

   2) Don't forget napi delete calls during device remove in ice driver,
      from Dave Ertman.

   3) Don't request VLAN tag insertion of ibmvnic device when SKB
      doesn't have VLAN tags at all.

   4) IPV4 frag handling code has to accomodate the situation where two
      threads try to insert the same fragment into the hash table at the
      same time. From Eric Dumazet.

   5) Relatedly, don't flow separate on protocol ports for fragmented
      frames, also from Eric Dumazet.

   6) Memory leaks in qed driver, from Denis Bolotin.

   7) Correct valid MTU range in smsc95xx driver, from Stefan Wahren.

   8) Validate cls_flower nested policies properly, from Jakub Kicinski.

   9) Clearing of stats counters in mc88e6xxx driver doesn't retain
      important bits in the G1_STATS_OP register causing the chip to
      hang. Fix from Andrew Lunn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  act_mirred: clear skb->tstamp on redirect
  net: dsa: mv88e6xxx: Fix clearing of stats counters
  tipc: fix link re-establish failure
  net: sched: cls_flower: validate nested enc_opts_policy to avoid warning
  net: mvneta: correct typo
  flow_dissector: do not dissect l4 ports for fragments
  net: qualcomm: rmnet: Fix incorrect assignment of real_dev
  net: aquantia: allow rx checksum offload configuration
  net: aquantia: invalid checksumm offload implementation
  net: aquantia: fixed enable unicast on 32 macvlan
  net: aquantia: fix potential IOMMU fault after driver unbind
  net: aquantia: synchronized flow control between mac/phy
  net: smsc95xx: Fix MTU range
  net: stmmac: Fix RX packet size > 8191
  qed: Fix potential memory corruption
  qed: Fix SPQ entries not returned to pool in error flows
  qed: Fix blocking/unlimited SPQ entries leak
  qed: Fix memory/entry leak in qed_init_sp_request()
  inet: frags: better deal with smp races
  net: hns3: bugfix for not checking return value
  ...

Merge tag 'kbuild-fixes-v4.20' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- fix build errors in binrpm-pkg and bindeb-pkg targets

- fix false positive matches in merge_config.sh

- fix build version mismatch in deb-pkg target

- fix dtbs_install handling in (bin)deb-pkg target

- revert a commit that allows setlocalversion to write to source tree

* tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  builddeb: Fix inclusion of dtbs in debian package
  Revert "scripts/setlocalversion: git: Make -dirty check more robust"
  kbuild: deb-pkg: fix too low build version number
  kconfig: merge_config: avoid false positive matches from comment lines
  kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used
  kbuild: rpm-pkg: fix binrpm-pkg breakage when O= is used

Merge tag 'for-4.20-rc1-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Several fixes to recent release (4.19, fixes tagged for stable) and
  other fixes"

* tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix missing delayed iputs on unmount
  Btrfs: fix data corruption due to cloning of eof block
  Btrfs: fix infinite loop on inode eviction after deduplication of eof block
  Btrfs: fix deadlock on tree root leaf when finding free extent
  btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
  btrfs: tree-checker: Fix misleading group system information
  Btrfs: fix missing data checksums after a ranged fsync (msync)
  btrfs: fix pinned underflow after transaction aborted
  Btrfs: fix cur_offset in the error case for nocow

Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"A large number of ext4 bug fixes, mostly buffer and memory leaks on
  error return cleanup paths"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: missing !bh check in ext4_xattr_inode_write()
  ext4: fix buffer leak in __ext4_read_dirblock() on error path
  ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path
  ext4: fix buffer leak in ext4_xattr_move_to_block() on error path
  ext4: release bs.bh before re-using in ext4_xattr_block_find()
  ext4: fix buffer leak in ext4_xattr_get_block() on error path
  ext4: fix possible leak of s_journal_flag_rwsem in error path
  ext4: fix possible leak of sbi->s_group_desc_leak in error path
  ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref()
  ext4: avoid possible double brelse() in add_new_gdb() on error path
  ext4: avoid buffer leak in ext4_orphan_add() after prior errors
  ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty()
  ext4: fix possible inode leak in the retry loop of ext4_resize_fs()
  ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing
  ext4: add missing brelse() update_backups()'s error path
  ext4: add missing brelse() add_new_gdb_meta_bg()'s error path
  ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path
  ext4: avoid potential extra brelse in setup_new_flex_group_blocks()

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:

   - Cure the LDT remapping to user space on 5 level paging which ended
     up in the KASLR space

   - Remove LDT mapping before freeing the LDT pages

   - Make NFIT MCE handling more robust

   - Unbreak the VSMP build by removing the dependency on paravirt ops

   - Support broken PIT emulation on Microsoft hyperV

   - Don't trace vmware_sched_clock() to avoid tracer recursion

   - Remove -pipe from KBUILD CFLAGS which breaks clang and is also
     slower on GCC

   - Trivial coding style and typo fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/vmware: Do not trace vmware_sched_clock()
  x86/vsmp: Remove dependency on pv_irq_ops
  x86/ldt: Remove unused variable in map_ldt_struct()
  x86/ldt: Unmap PTEs for the slot before freeing LDT pages
  x86/mm: Move LDT remap out of KASLR region on 5-level paging
  acpi/nfit, x86/mce: Validate a MCE's address before using it
  acpi/nfit, x86/mce: Handle only uncorrectable machine checks
  x86/build: Remove -pipe from KBUILD_CFLAGS
  x86/hyper-v: Fix indentation in hv_do_fast_hypercall16()
  Documentation/x86: Fix typo in zero-page.txt
  x86/hyper-v: Enable PIT shutdown quirk
  clockevents/drivers/i8253: Add support for PIT shutdown quirk

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
"A bunch of perf tooling fixes:

   - Make the Intel PT SQL viewer more robust

   - Make the Intel PT debug log more useful

   - Support weak groups in perf record so it's behaving the same way as
     perf stat

   - Display the LBR stats in callchain entries properly in perf top

   - Handle different PMu names with common prefix properlin in pert
     stat

   - Start syscall augmenting in perf trace. Preparation for
     architecture independent eBPF instrumentation of syscalls.

   - Fix build breakage in JVMTI perf lib

   - Fix arm64 tools build failure wrt smp_load_{acquire,release}"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Do not zero sample_id_all for group members
  perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so
  perf beauty: Use SRCARCH, ARCH=x86_64 must map to "x86" to find the headers
  perf intel-pt: Add MTC and CYC timestamps to debug log
  perf intel-pt: Add more event information to debug log
  perf scripts python: exported-sql-viewer.py: Fix table find when table re-ordered
  perf scripts python: exported-sql-viewer.py: Add help window
  perf scripts python: exported-sql-viewer.py: Add Selected branches report
  perf scripts python: exported-sql-viewer.py: Fall back to /usr/local/lib/libxed.so
  perf top: Display the LBR stats in callchain entry
  perf stat: Handle different PMU names with common prefix
  perf record: Support weak groups
  perf evlist: Move perf_evsel__reset_weak_group into evlist
  perf augmented_syscalls: Start collecting pathnames in the BPF program
  perf trace: Fix setting of augmented payload when using eBPF + raw_syscalls
  perf trace: When augmenting raw_syscalls plug raw_syscalls:sys_exit too
  perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit}
  tools headers barrier: Fix arm64 tools build failure wrt smp_load_{acquire,release}

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"Just the removal of a redundant call into the sched deadline overrun
check"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Remove useless call to check_dl_overrun()

Merge branch 'sched/urgent' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
"Two small scheduler fixes:

   - Take hotplug lock in sched_init_smp(). Technically not really
     required, but lockdep will complain other.

   - Trivial comment fix in sched/fair"

* 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix a comment in task_numa_fault()
  sched/core: Take the hotplug lock in sched_init_smp()

Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking build fix from Thomas Gleixner:
"A single fix for a build fail with CONFIG_PROFILE_ALL_BRANCHES=y in
the qspinlock code"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/qspinlock: Fix compile error

Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull core fixes from Thomas Gleixner:
"A couple of fixlets for the core:

   - Kernel doc function documentation fixes

   - Missing prototypes for weak watchdog functions"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  resource/docs: Complete kernel-doc style function documentation
  watchdog/core: Add missing prototypes for weak functions
  resource/docs: Fix new kernel-doc warnings

act_mirred: clear skb->tstamp on redirect

If sch_fq is used at ingress, skbs that might have been
timestamped by net_timestamp_set() if a packet capture
is requesting timestamps could be delayed by arbitrary
amount of time, since sch_fq time base is MONOTONIC.

Fix this problem by moving code from sch_netem.c to act_mirred.c.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix clearing of stats counters

The mv88e6161 would sometime fail to probe with a timeout waiting for
the switch to complete an operation. This operation is supposed to
clear the statistics counters. However, due to a read/modify/write,
without the needed mask, the operation actually carried out was more
random, with invalid parameters, resulting in the switch not
responding. We need to preserve the histogram mode bits, so apply a
mask to keep them.

Reported-by: Chris Healy <Chris.Healy@zii.aero>
Fixes: 40cff8fca9e3 ("net: dsa: mv88e6xxx: Fix stats histogram mode")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix link re-establish failure

When a link failure is detected locally, the link is reset, the flag
link->in_session is set to false, and a RESET_MSG with the 'stopping'
bit set is sent to the peer.

The purpose of this bit is to inform the peer that this endpoint just
is going down, and that the peer should handle the reception of this
particular RESET message as a local failure. This forces the peer to
accept another RESET or ACTIVATE message from this endpoint before it
can re-establish the link. This again is necessary to ensure that
link session numbers are properly exchanged before the link comes up
again.

If a failure is detected locally at the same time at the peer endpoint
this will do the same, which is also a correct behavior.

However, when receiving such messages, the endpoints will not
distinguish between 'stopping' RESETs and ordinary ones when it comes
to updating session numbers. Both endpoints will copy the received
session number and set their 'in_session' flags to true at the
reception, while they are still expecting another RESET from the
peer before they can go ahead and re-establish. This is contradictory,
since, after applying the validation check referred to below, the
'in_session' flag will cause rejection of all such messages, and the
link will never come up again.

We now fix this by not only handling received RESET/STOPPING messages
as a local failure, but also by omitting to set a new session number
and the 'in_session' flag in such cases.

Fixes: 7ea817f4e832 ("tipc: check session number before accepting link protocol messages")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

builddeb: Fix inclusion of dtbs in debian package

Commit 37c8a5fafa3b ("kbuild: consolidate Devicetree dtb build rules")
moved the location of 'dtbs_install' target which caused dtbs to not be
installed when building debian package with 'bindeb-pkg' target. Update
the builddeb script to use the same logic that determines if there's a
'dtbs_install' target which is presence of the arch dts directory. Also,
use CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF as that's a better
indication of whether we are building dtbs.

This commit will also have the side effect of installing dtbs on any
arch that has dts files. Previously, it was dependent on whether the
arch defined 'dtbs_install'.

Fixes: 37c8a5fafa3b ("kbuild: consolidate Devicetree dtb build rules")
Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

Revert "scripts/setlocalversion: git: Make -dirty check more robust"

This reverts commit 6147b1cf19651c7de297e69108b141fb30aa2349.

The reverted patch results in attempted write access to the source
repository, even if that repository is mounted read-only.

Output from "strace git status -uno --porcelain":

getcwd("/tmp/linux-test", 129) = 16
open("/tmp/linux-test/.git/index.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) =
-1 EROFS (Read-only file system)

While git appears to be able to handle this situation, a monitored
build environment (such as the one used for Chrome OS kernel builds)
may detect it and bail out with an access violation error. On top of
that, the attempted write access suggests that git _will_ write to the
file even if a build output directory is specified. Users may have the
reasonable expectation that the source repository remains untouched in
that situation.

Fixes: 6147b1cf19651 ("scripts/setlocalversion: git: Make -dirty check more robust"
Cc: Genki Sky <sky@genki.is>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

kbuild: deb-pkg: fix too low build version number

Since commit b41d920acff8 ("kbuild: deb-pkg: split generating packaging
and build"), the build version of the kernel contained in a deb package
is too low by 1.

Prior to the bad commit, the kernel was built first, then the number
in .version file was read out, and written into the debian control file.

Now, the debian control file is created before the kernel is actually
compiled, which is causing the version number mismatch.

Let the mkdebian script pass KBUILD_BUILD_VERSION=${revision} to require
the build system to use the specified version number.

Fixes: b41d920acff8 ("kbuild: deb-pkg: split generating packaging and build")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Doug Smythies <dsmythies@telus.net>

kconfig: merge_config: avoid false positive matches from comment lines

The current SED_CONFIG_EXP could match to comment lines in config
fragment files, especially when CONFIG_PREFIX_ is empty. For example,
Buildroot uses empty prefixing; starting symbols with BR2_ is just
convention.

Make the sed expression more robust against false positives from
comment lines. The new sed expression matches to only valid patterns.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>

Merge tag 'tty-4.20-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are some small tty fixes for 4.20-rc2

  One of these missed the original 4.19-final release, I missed that I
  hadn't done a pull request for it as it was in linux-next and my
  branch for a long time, that's my fault.

  The others are small, fixing some reported issues and finally fixing
  the termios mess for alpha so that glibc has a chance to implement
  some missing functionality that has been pending for many years now.

  All of these have been in linux-next with no reported issues"

* tag 'tty-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: sh-sci: Fix could not remove dev_attr_rx_fifo_timeout
  arch/alpha, termios: implement BOTHER, IBSHIFT and termios2
  termios, tty/tty_baudrate.c: fix buffer overrun
  vt: fix broken display when running aptitude
  serial: sh-sci: Fix receive on SCIFA/SCIFB variants with DMA

Merge tag 'drm-fixes-2018-11-11' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"drm: i915, amdgpu, sun4i, exynos and etnaviv fixes:

   - amdgpu has some display fixes, KFD ioctl fixes and a Vega20 bios
     interaction fix.

   - sun4i has some NULL checks added

   - i915 has a 32-bit system fix, LPE audio oops, and HDMI2.0 clock
     fixes.

   - Exynos has a 3 regression fixes (one frame counter, fbdev missing,
     dsi->panel check)

   - Etnaviv has a single fencing fix for GPU recovery"

* tag 'drm-fixes-2018-11-11' of git://anongit.freedesktop.org/drm/drm: (39 commits)
  drm/amd/amdgpu/dm: Fix dm_dp_create_fake_mst_encoder()
  drm/amd/display: Drop reusing drm connector for MST
  drm/amd/display: Cleanup MST non-atomic code workaround
  drm/amd/powerplay: always use fast UCLK switching when UCLK DPM enabled
  drm/amd/powerplay: set a default fclk/gfxclk ratio
  drm/amdgpu/display/dce11: only enable FBC when selected
  drm/amdgpu/display/dm: handle FBC dc feature parameter
  drm/amdgpu/display/dc: add FBC to dc_config
  drm/amdgpu: add DC feature mask module parameter
  drm/amdgpu/display: check if fbc is available in set_static_screen_control (v2)
  drm/amdgpu/vega20: add CLK base offset
  drm/amd/display: Stop leaking planes
  drm/amd/display: Fix misleading buffer information
  Revert "drm/amd/display: set backlight level limit to 1"
  drm/amd: Update atom_smu_info_v3_3 structure
  drm/i915: Fix ilk+ watermarks when disabling pipes
  drm/sun4i: tcon: prevent tcon->panel dereference if NULL
  drm/sun4i: tcon: fix check of tcon->panel null pointer
  drm/i915: Don't oops during modeset shutdown after lpe audio deinit
  drm/i915: Mark pin flags as u64
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull namespace fixes from Eric Biederman:
"I believe all of these are simple obviously correct bug fixes. These
  fall into two groups:

   - Fixing the implementation of MNT_LOCKED which prevents lesser
     privileged users from seeing unders mounts created by more
     privileged users.

   - Fixing the extended uid and group mapping in user namespaces.

  As well as ensuring the code looks correct I have spot tested these
  changes as well and in my testing the fixes are working"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mount: Prevent MNT_DETACH from disconnecting locked mounts
  mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts
  mount: Retest MNT_LOCKED in do_umount
  userns: also map extents in the reverse map to kernel IDs

Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A small set of fixes for clk drivers.

  One to fix a DT refcount imbalance, two to mark some Amlogic clks as
  critical, and one final one that fixes a clk name for the Qualcomm
  driver merged this cycle"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: gcc: Fix board clock node name
  clk: meson: axg: mark fdiv2 and fdiv3 as critical
  clk: meson-gxbb: set fclk_div3 as CLK_IS_CRITICAL
  clk: fixed-factor: fix of_node_get-put imbalance

Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Fixes for 4.20:
- DC MST fixes
- DC FBC fix
- Vega20 updates to support the latest vbios
- KFD type fixes for ioctl headers

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181108035551.2904-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2018-11-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

- sun4i: tcon->panel NULL deref protections (Giulio)

Cc: Giulio Benetti <giulio.benetti@micronovasrl.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20181107205051.GA27823@art_vandelay

Merge tag 'drm-intel-fixes-2018-11-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Bugzilla #108282 fixed: Avoid graphics corruption on 32-bit systems for Mesa 18.2.x
Avoid OOPS on LPE audio deinit. Remove two unused W/As.
Fix to correct HDMI 2.0 audio clock modes to spec.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181108134508.GA28466@jlahtine-desk.ger.corp.intel.com

net: sched: cls_flower: validate nested enc_opts_policy to avoid warning

TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
currently contain further nested attributes, which are parsed by
hand, so the policy is never actually used resulting in a W=1
build warning:

net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {

Add the validation anyway to avoid potential bugs when other
attributes are added and to make the attribute structure slightly
more clear. Validation will also set extact to point to bad
attribute on error.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'for-linus-4.20a-rc2-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"Several fixes, mostly for rather recent regressions when running under
  Xen"

* tag 'for-linus-4.20a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: remove size limit of privcmd-buf mapping interface
  xen: fix xen_qlock_wait()
  x86/xen: fix pv boot
  xen-blkfront: fix kernel panic with negotiate_mq error path
  xen/grant-table: Fix incorrect gnttab_dma_free_pages() pr_debug message
  CONFIG_XEN_PV breaks xen_create_contiguous_region on ARM

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Fix occasional page fault during boot due to memblock resizing before
   the linear map is up.

- Define NET_IP_ALIGN to 0 to improve the DMA performance on some
   platforms.

- lib/raid6 test build fix.

- .mailmap update for Punit Agrawal

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: memblock: don't permit memblock resizing until linear mapping is up
  arm64: mm: define NET_IP_ALIGN to 0
  lib/raid6: Fix arm64 test build
  mailmap: Update email for Punit Agrawal

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
"I2C has one bugfix (qcom-geni driver), one arch enablement (i2c-omap
  driver, no code change), and a new driver (nvidia-gpu) this time"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  usb: typec: ucsi: add support for Cypress CCGx
  i2c: nvidia-gpu: make pm_ops static
  i2c: add i2c bus driver for NVIDIA GPU
  i2c: qcom-geni: Fix runtime PM mismatch with child devices
  MAINTAINERS: Add entry for i2c-omap driver
  i2c: omap: Enable for ARCH_K3
  dt-bindings: i2c: omap: Add new compatible for AM654 SoCs

net: mvneta: correct typo

The reserved variable should be named reserved1.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: do not dissect l4 ports for fragments

Only first fragment has the sport/dport information,
not the following ones.

If we want consistent hash for all fragments, we need to
ignore ports even for first fragment.

This bug is visible for IPv6 traffic, if incoming fragments
do not have a flow label, since skb_get_hash() will give
different results for first fragment and following ones.

It is also visible if any routing rule wants dissection
and sport or dport.

See commit 5e5d6fed3741 ("ipv6: route: dissect flow
in input path if fib rules need it") for details.

[edumazet] rewrote the changelog completely.

Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qualcomm: rmnet: Fix incorrect assignment of real_dev

A null dereference was observed when a sysctl was being set
from userspace and rmnet was stuck trying to complete some actions
in the NETDEV_REGISTER callback. This is because the real_dev is set
only after the device registration handler completes.

sysctl call stack -

<6> Unable to handle kernel NULL pointer dereference at
    virtual address 00000108
<2> pc : rmnet_vnd_get_iflink+0x1c/0x28
<2> lr : dev_get_iflink+0x2c/0x40
<2>  rmnet_vnd_get_iflink+0x1c/0x28
<2>  inet6_fill_ifinfo+0x15c/0x234
<2>  inet6_ifinfo_notify+0x68/0xd4
<2>  ndisc_ifinfo_sysctl_change+0x1b8/0x234
<2>  proc_sys_call_handler+0xac/0x100
<2>  proc_sys_write+0x3c/0x4c
<2>  __vfs_write+0x54/0x14c
<2>  vfs_write+0xcc/0x188
<2>  SyS_write+0x60/0xc0
<2>  el0_svc_naked+0x34/0x38

device register call stack -

<2>  notifier_call_chain+0x84/0xbc
<2>  raw_notifier_call_chain+0x38/0x48
<2>  call_netdevice_notifiers_info+0x40/0x70
<2>  call_netdevice_notifiers+0x38/0x60
<2>  register_netdevice+0x29c/0x3d8
<2>  rmnet_vnd_newlink+0x68/0xe8
<2>  rmnet_newlink+0xa0/0x160
<2>  rtnl_newlink+0x57c/0x6c8
<2>  rtnetlink_rcv_msg+0x1dc/0x328
<2>  netlink_rcv_skb+0xac/0x118
<2>  rtnetlink_rcv+0x24/0x30
<2>  netlink_unicast+0x158/0x1f0
<2>  netlink_sendmsg+0x32c/0x338
<2>  sock_sendmsg+0x44/0x60
<2>  SyS_sendto+0x150/0x1ac
<2>  el0_svc_naked+0x34/0x38

Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'aquantia-fixes'

Igor Russkikh says:

====================
net: aquantia: 2018-11 bugfixes

The patchset fixes a number of bugs found in various areas after
driver validation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: allow rx checksum offload configuration

RX Checksum offloads could not be configured and ignored netdev features
flag for checksumming.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: invalid checksumm offload implementation

Packets with marked invalid IP/UDP/TCP checksums were considered as good
by the driver. The error was in a logic, processing offload bits in
RX descriptor.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fixed enable unicast on 32 macvlan

Fixed a condition mistake due to which macvlans unicast
item number 32 was not added in the unicast filter.

The consequence is that when exactly 32 macvlans are created
on NIC, the last created macvlan receives no traffic because
its MAC was not registered in HW.

Fixes: 94b3b542303f ("net: aquantia: vlan unicast address list correct handling")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fix potential IOMMU fault after driver unbind

IOMMU fault may occurr on unbind/bind or if_down/if_up sequence.

Although driver disables the rings on down, this is not enough.
Due to internal HW design, during subsequent initialization
NIC sometimes may reuse RX descriptors cache and write to the
host memory from the descriptor cache.
That's get catched by IOMMU on host.

This patch invalidates the descriptor cache in NIC on interface down
to prevent writing to the cached descriptors and to the memory pointed
in those descriptors.

Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: synchronized flow control between mac/phy

Flow control statuses were not synchronized between blocks,
that caused packets/link drop on some corner cases, when
MAC sent PFC although Phy was not expecting these to come.

Driver should readout the negotiated FC from phy and
configure RX block accordigly.

This is done on each link change event with information from FW.

Fixes: 288551de45aa ("net: aquantia: Implement rx/tx flow control ethtools callback")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'devicetree-fixes-for-4.20-2' of git://git./linux/kernel/git/robh/linux

Pull Devicetree fixes from Rob Herring:

- Add validation of NUMA distance map to prevent crashes with bad map

- Fix setting of dma_mask

* tag 'devicetree-fixes-for-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of, numa: Validate some distance map rules
of/device: Really only set bus DMA mask when appropriate

Merge tag 'for-linus-20181109' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:

- Two fixes for an ubd regression, one for missing locking, and one for
   a missing initialization of a field. The latter was an old latent
   bug, but it's now visible and triggers (Me, Anton Ivanov)

- Set of NVMe fixes via Christoph, but applied manually due to a git
   tree mixup (Christoph, Sagi)

- Fix for a discard split regression, in three patches (Ming)

- Update libata git trees (Geert)

- SPDX identifier for sata_rcar (Kuninori Morimoto)

- Virtual boundary merge fix (Johannes)

- Preemptively clear memory we are going to pass to userspace, in case
   the driver does a short read (Keith)

* tag 'for-linus-20181109' of git://git.kernel.dk/linux-block:
  block: make sure writesame bio is aligned with logical block size
  block: cleanup __blkdev_issue_discard()
  block: make sure discard bio is aligned with logical block size
  Revert "nvmet-rdma: use a private workqueue for delete"
  nvme: make sure ns head inherits underlying device limits
  nvmet: don't try to add ns to p2p map unless it actually uses it
  sata_rcar: convert to SPDX identifiers
  ubd: fix missing initialization of io_req
  block: Clear kernel memory before copying to user
  MAINTAINERS: Fix remaining pointers to obsolete libata.git
  ubd: fix missing lock around request issue
  block: respect virtual boundary mask in bvecs

Merge tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-client

Pull Ceph fixes from Ilya Dryomov:
"Two CephFS fixes (copy_file_range and quota) and a small feature bit
  cleanup"

* tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-client:
  libceph: assume argonaut on the server side
  ceph: quota: fix null pointer dereference in quota check
  ceph: add destination file data sync before doing any remote copy

Merge tag 'mips_fixes_4.20_2' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
"A couple of small MIPS fixes for 4.20:

   - Extend an array to avoid overruns on some Octeon hardware, fixing a
     bug introduced in 4.3.

   - Fix a coherent DMA regression for systems without cache-coherent
     DMA introduced in the 4.20 merge window"

* tag 'mips_fixes_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Fix `dma_alloc_coherent' returning a non-coherent allocation
  MIPS: OCTEON: fix out of bounds array access on CN68XX

clk: qcom: gcc: Fix board clock node name

Device tree node name are not supposed to have "_" in them so fix the
node name use of xo_board to xo-board

Fixes: 652f1813c113 ("clk: qcom: gcc: Add global clock controller driver for QCS404")
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>

x86/cpu/vmware: Do not trace vmware_sched_clock()

When running function tracing on a Linux guest running on VMware
Workstation, the guest would crash. This is due to tracing of the
sched_clock internal call of the VMware vmware_sched_clock(), which
causes an infinite recursion within the tracing code (clock calls must
not be traced).

Make vmware_sched_clock() not traced by ftrace.

Fixes: 80e9a4f21fd7c ("x86/vmware: Add paravirt sched clock")
Reported-by: GwanYeong Kim <gy741.kim@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Alok Kataria <akataria@vmware.com>
CC: GwanYeong Kim <gy741.kim@gmail.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: virtualization@lists.linux-foundation.org
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181109152207.4d3e7d70@gandalf.local.home

usb: typec: ucsi: add support for Cypress CCGx

Latest NVIDIA GPU cards have a Cypress CCGx Type-C controller
over I2C interface.

This UCSI I2C driver uses I2C bus driver interface for communicating
with Type-C controller.

Signed-off-by: Ajay Gupta <ajayg@nvidia.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

serial: sh-sci: Fix could not remove dev_attr_rx_fifo_timeout

This patch fixes an issue that the sci_remove() could not remove
dev_attr_rx_fifo_timeout because uart_remove_one_port() set
the port->port.type to PORT_UNKNOWN.

Reported-by: Hiromitsu Yamasaki <hiromitsu.yamasaki.ym@renesas.com>
Fixes: 5d23188a473d ("serial: sh-sci: make RX FIFO parameters tunable via sysfs")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: nvidia-gpu: make pm_ops static

sparse rightfully says:

warning: symbol 'gpu_i2c_driver_pm' was not declared. Should it be static?

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

i2c: add i2c bus driver for NVIDIA GPU

Latest NVIDIA GPU card has USB Type-C interface. There is a
Type-C controller which can be accessed over I2C.

This driver adds I2C bus driver to communicate with Type-C controller.
I2C client driver will be part of USB Type-C UCSI driver.

Signed-off-by: Ajay Gupta <ajayg@nvidia.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[wsa: kept Makefile sorting]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

ext4: missing !bh check in ext4_xattr_inode_write()

According to Ted Ts'o ext4_getblk() called in ext4_xattr_inode_write()
should not return bh = NULL

The only time that bh could be NULL, then, would be in the case of
something really going wrong; a programming error elsewhere (perhaps a
wild pointer dereference) or I/O error causing on-disk file system
corruption (although that would be highly unlikely given that we had
*just* allocated the blocks and so the metadata blocks in question
probably would still be in the cache).

Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 4.13

i2c: qcom-geni: Fix runtime PM mismatch with child devices

We need to enable runtime PM on this i2c controller before populating
child devices with i2c_add_adapter(). Otherwise, if a child device uses
runtime PM and stays runtime PM enabled we'll get the following warning
at boot.

Enabling runtime PM for inactive device (a98000.i2c) with active children

[...]

Call trace:
  pm_runtime_enable+0xd8/0xf8
  geni_i2c_probe+0x440/0x460
  platform_drv_probe+0x74/0xc8
[...]

Let's move the runtime PM enabling and setup to before we add the
adapter, so that this device can respond to runtime PM requests from
children.

Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

MAINTAINERS: Add entry for i2c-omap driver

Add separate entry for i2c-omap and add my name as maintainer for this
driver.

Signed-off-by: Vignesh R <vigneshr@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

i2c: omap: Enable for ARCH_K3

Allow I2C_OMAP to be built for K3 platforms.

Signed-off-by: Vignesh R <vigneshr@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

dt-bindings: i2c: omap: Add new compatible for AM654 SoCs

AM654 SoCs have same I2C IP as OMAP SoCs. Add new compatible to
handle AM654 SoCs. While at that reformat the existing compatible list
for older SoCs to list one valid compatible per line.

Signed-off-by: Vignesh R <vigneshr@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

xen: remove size limit of privcmd-buf mapping interface

Currently the size of hypercall buffers allocated via
/dev/xen/hypercall is limited to a default of 64 memory pages. For live
migration of guests this might be too small as the page dirty bitmask
needs to be sized according to the size of the guest. This means
migrating a 8GB sized guest is already exhausting the default buffer
size for the dirty bitmap.

There is no sensible way to set a sane limit, so just remove it
completely. The device node's usage is limited to root anyway, so there
is no additional DOS scenario added by allowing unlimited buffers.

While at it make the error path for the -ENOMEM case a little bit
cleaner by setting n_pages to the number of successfully allocated
pages instead of the target size.

Fixes: c51b3c639e01f2 ("xen: add new hypercall buffer mapping device")
Cc: <stable@vger.kernel.org> #4.18
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>

xen: fix xen_qlock_wait()

Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable")
introduced a regression for Xen guests running fully virtualized
(HVM or PVH mode). The Xen hypervisor wouldn't return from the poll
hypercall with interrupts disabled in case of an interrupt (for PV
guests it does).

So instead of disabling interrupts in xen_qlock_wait() use a nesting
counter to avoid calling xen_clear_irq_pending() in case
xen_qlock_wait() is nested.

Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable")
Cc: stable@vger.kernel.org
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>

block: make sure writesame bio is aligned with logical block size

Obviously the created writesame bio has to be aligned with logical block
size, and use bio_allowed_max_sectors() to retrieve this number.

Cc: stable@vger.kernel.org
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Fixes: b49a0871be31a745b2ef ("block: remove split code in blkdev_issue_{discard,write_same}")
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: cleanup __blkdev_issue_discard()

Cleanup __blkdev_issue_discard() a bit:

- remove local variable of 'end_sect'
- remove code block of 'fail'

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: make sure discard bio is aligned with logical block size

Obviously the created discard bio has to be aligned with logical block size.

This patch introduces the helper of bio_allowed_max_sectors() for
this purpose.

Cc: stable@vger.kernel.org
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()")
Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks")
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revert "nvmet-rdma: use a private workqueue for delete"

This reverts commit 2acf70ade79d26b97611a8df52eb22aa33814cd4.

The commit never really fixed the intended issue and caused all
kinds of other issues, including a use before initialization.

Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: make sure ns head inherits underlying device limits

Whenever we update ns_head info, we need to make sure it is still
compatible with all underlying backing devices because although nvme
multipath doesn't have any explicit use of these limits, other devices
can still be stacked on top of it which may rely on the underlying limits.
Start with unlimited stacking limits, and every info update iterate over
siblings and adjust queue limits.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet: don't try to add ns to p2p map unless it actually uses it

Even without CONFIG_P2PDMA this results in a error print:
nvmet: no peer-to-peer memory is available that's supported by rxe0 and /dev/nullb0

Fixes: c6925093d0b2 ("nvmet: Optionally use PCI P2P memory")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 's390-4.20-2' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

- A fix for the pgtable_bytes misaccounting on s390. The patch changes
   common code part in regard to page table folding and adds extra
   checks to mm_[inc|dec]_nr_[pmds|puds].

- Add FORCE for all build targets using if_changed

- Use non-loadable phdr for the .vmlinux.info section to avoid a
   segment overlap that confuses kexec

- Cleanup the attribute definition for the diagnostic sampling

- Increase stack size for CONFIG_KASAN=y builds

- Export __node_distance to fix a build error

- Correct return code of a PMU event init function

- An update for the default configs

* tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/perf: Change CPUM_CF return code in event init function
  s390: update defconfigs
  s390/mm: Fix ERROR: "__node_distance" undefined!
  s390/kasan: increase instrumented stack size to 64k
  s390/cpum_sf: Rework attribute definition for diagnostic sampling
  s390/mm: fix mis-accounting of pgtable_bytes
  mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
  mm: introduce mm_[p4d|pud|pmd]_folded
  mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
  s390: avoid vmlinux segments overlap
  s390/vdso: add missing FORCE to build targets
  s390/decompressor: add missing FORCE to build targets

x86/xen: fix pv boot

Commit 9da3f2b7405440 ("x86/fault: BUG() when uaccess helpers fault on
kernel addresses") introduced a regression for booting Xen PV guests.

Xen PV guests are using __put_user() and __get_user() for accessing the
p2m map (physical to machine frame number map) as accesses might fail
in case of not populated areas of the map.

With above commit using __put_user() and __get_user() for accessing
kernel pages is no longer valid. So replace the Xen hack by adding
appropriate p2m access functions using the default fixup handler.

Fixes: 9da3f2b7405440 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>

net: smsc95xx: Fix MTU range

The commit f77f0aee4da4 ("net: use core MTU range checking in USB NIC
drivers") introduce a common MTU handling for usbnet. But it's missing
the necessary changes for smsc95xx. So set the MTU range accordingly.

This patch has been tested on a Raspberry Pi 3.

Fixes: f77f0aee4da4 ("net: use core MTU range checking in USB NIC drivers")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Fix RX packet size > 8191

Ping problems with packets > 8191 as shown:

PING 192.168.1.99 (192.168.1.99) 8150(8178) bytes of data.
8158 bytes from 192.168.1.99: icmp_seq=1 ttl=64 time=0.669 ms
wrong data byte 8144 should be 0xd0 but was 0x0
16    10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
      20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
%< ---------------snip--------------------------------------
8112  b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
      c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
8144  0 0 0 0 d0 d1
      ^^^^^^^
Notice the 4 bytes of 0 before the expected byte of d0.

Databook notes that the RX buffer must be a multiple of 4/8/16
bytes [1].

Update the DMA Buffer size define to 8188 instead of 8192. Remove
the -1 from the RX buffer size allocations and use the new
DMA Buffer size directly.

[1] Synopsys DesignWare Cores Ethernet MAC Universal v3.70a
    [section 8.4.2 - Table 8-24]

Tested on SoCFPGA Stratix10 with ping sweep from 100 to 8300 byte packets.

Fixes: 286a83721720 ("stmmac: add CHAINED descriptor mode support (V4)")
Suggested-by: Jose Abreu <jose.abreu@synopsys.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-Slowpath-Queue-bug-fixes'

Denis Bolotin says:

====================
qed: Slowpath Queue bug fixes

This patch series fixes several bugs in the SPQ mechanism.
It deals with SPQ entries management, preventing resource leaks, memory
corruptions and handles error cases throughout the driver.
Please consider applying to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix potential memory corruption

A stuck ramrod should be deleted from the completion_pending list,
otherwise it will be added again in the future and corrupt the list.

Return error value to inform that ramrod is stuck and should be deleted.

Signed-off-by: Sagiv Ozeri <sagiv.ozeri@cavium.com>
Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix SPQ entries not returned to pool in error flows

qed_sp_destroy_request() API was added for SPQ users that need to
free/return the entry they acquired in their error flows.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix blocking/unlimited SPQ entries leak

When there are no SPQ entries left in the free_pool, new entries are
allocated and are added to the unlimited list. When an entry in the pool
is available, the content is copied from the original entry, and the new
entry is sent to the device. qed_spq_post() is not aware of that, so the
additional entry is stored in the original entry as p_post_ent, which can
later be returned to the pool.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix memory/entry leak in qed_init_sp_request()

Free the allocated SPQ entry or return the acquired SPQ entry to the free
list in error flows.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: better deal with smp races

Multiple cpus might attempt to insert a new fragment in rhashtable,
if for example RPS is buggy, as reported by 배석진 in
https://patchwork.ozlabs.org/patch/994601/

We use rhashtable_lookup_get_insert_key() instead of
rhashtable_insert_fast() to let cpus losing the race
free their own inet_frag_queue and use the one that
was inserted by another cpu.

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: bugfix for not checking return value

hns3_reset_notify_init_enet() only return error early if the return
value of hns3_restore_vlan() is not 0.

This patch adds checking for the return value of hns3_restore_vlan.

Fixes: 7fa6be4fd2f6 ("net: hns3: fix incorrect return value/type of some functions")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'xfs-4.20-fixes-1' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

- fix incorrect dropping of error code from bmap

- print buffer offsets instead of useless hashed pointers when dumping
   corrupt metadata

- fix integer overflow in attribute verifier

* tag 'xfs-4.20-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix overflow in xfs_attr3_leaf_verify
  xfs: print buffer offsets when dumping corrupt buffers
  xfs: Fix error code in 'xfs_ioc_getbmap()'

Merge tag 'led-fixes-for-4.20-rc2' of git://git./linux/kernel/git/j.anaszewski/linux-leds

Pull LED fixes from Jacek Anaszewski:
"All three fixes are related to the newly added pattern trigger:

   - remove mutex_lock() from timer callback, which would trigger
     problems related to sleeping in atomic context, the removal is
     harmless since mutex protection turned out to be redundant in this
     case

   - fix pattern parsing to properly handle intervals with brightness == 0

   - fix typos in the ABI documentation"

* tag 'led-fixes-for-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  Documentation: ABI: led-trigger-pattern: Fix typos
  leds: trigger: Fix sleeping function called from invalid context
  Fix pattern handling optimalization

Merge tag 'sound-4.20-rc2' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Two small regression fixes for HD-audio: one about vga_switcheroo and
  runtime PM, and another about Oops on some Thinkpads"

* tag 'sound-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix incorrect clearance of thinkpad_acpi hooks
  vga_switcheroo: Fix missing gpu_bound call at audio client registration

of, numa: Validate some distance map rules

Currently the NUMA distance map parsing does not validate the distance
table for the distance-matrix rules 1-2 in [1].

However the arch NUMA code may enforce some of these rules, but not all.
Such is the case for the arm64 port, which does not enforce the rule that
the distance between separates nodes cannot equal LOCAL_DISTANCE.

The patch adds the following rules validation:
- distance of node to self equals LOCAL_DISTANCE
- distance of separate nodes > LOCAL_DISTANCE

This change avoids a yet-unresolved crash reported in [2].

A note on dealing with symmetrical distances between nodes:

Validating symmetrical distances between nodes is difficult. If it were
mandated in the bindings that every distance must be recorded in the
table, then it would be easy. However, it isn't.

In addition to this, it is also possible to record [b, a] distance only
(and not [a, b]). So, when processing the table for [b, a], we cannot
assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
distance may not be present in the table and current distance would be
default at REMOTE_DISTANCE.

As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
for b > a. This policy is different to kernel ACPI SLIT validation, which
allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
the distance debug message is dropped as it may be misleading (for a distance
which is later overwritten).

Some final notes on semantics:

- It is implied that it is the responsibility of the arch NUMA code to
reset the NUMA distance map for an error in distance map parsing.

- It is the responsibility of the FW NUMA topology parsing (whether OF or
ACPI) to enforce NUMA distance rules, and not arch NUMA code.

[1] Documents/devicetree/bindings/numa.txt
[2] https://www.spinics.net/lists/arm-kernel/msg683304.html

Cc: stable@vger.kernel.org # 4.7
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>

of/device: Really only set bus DMA mask when appropriate

of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.

Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Robert Richter <robert.richter@cavium.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>

clk: meson: axg: mark fdiv2 and fdiv3 as critical

Similar to gxbb and gxl platforms, axg SCPI Cortex-M co-processor
uses the fdiv2 and fdiv3 to, among other things, provide the cpu
clock.

Until clock hand-off mechanism makes its way to CCF and the generic
SCPI claims platform specific clocks, these clocks must be marked as
critical to make sure they are never disabled when needed by the
co-processor.

Fixes: 05f814402d61 ("clk: meson: add fdiv clock gates")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>

clk: meson-gxbb: set fclk_div3 as CLK_IS_CRITICAL

On the Khadas VIM2 (GXM) and LePotato (GXL) board there are problems
with reboot; e.g. a ~60 second delay between issuing reboot and the
board power cycling (and in some OS configurations reboot will fail
and require manual power cycling).

Similar to 'commit c987ac6f1f088663b6dad39281071aeb31d450a8 ("clk:
meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL")' the SCPI Cortex-M4
Co-Processor seems to depend on FCLK_DIV3 being operational.

Until commit 05f814402d6174369b3b29832cbb5eb5ed287059 ("clk:
meson: add fdiv clock gates"), this clock was modeled and left on by
the bootloader.

We don't have precise documentation about the SCPI Co-Processor and
its clock requirement so we are learning things the hard way.

Marking this clock as critical solves the problem but it should not
be viewed as final solution. Ideally, the SCPI driver should claim
these clocks. We also depends on some clock hand-off mechanism
making its way to CCF, to make sure the clock stays on between its
registration and the SCPI driver probe.

Fixes: 05f814402d61 ("clk: meson: add fdiv clock gates")
Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>

arm64: memblock: don't permit memblock resizing until linear mapping is up

Bhupesh reports that having numerous memblock reservations at early
boot may result in the following crash:

  Unable to handle kernel paging request at virtual address ffff80003ffe0000
  ...
  Call trace:
   __memcpy+0x110/0x180
   memblock_add_range+0x134/0x2e8
   memblock_reserve+0x70/0xb8
   memblock_alloc_base_nid+0x6c/0x88
   __memblock_alloc_base+0x3c/0x4c
   memblock_alloc_base+0x28/0x4c
   memblock_alloc+0x2c/0x38
   early_pgtable_alloc+0x20/0xb0
   paging_init+0x28/0x7f8

This is caused by the fact that we permit memblock resizing before the
linear mapping is up, and so the memblock_reserved() array is moved
into memory that is not mapped yet.

So let's ensure that this crash can no longer occur, by deferring to
call to memblock_allow_resize() to after the linear mapping has been
created.

Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: mm: define NET_IP_ALIGN to 0

On arm64, there is no need to add 2 bytes of padding to the start of
each network buffer just to make the IP header appear 32-bit aligned.

Since this might actually adversely affect DMA performance some
platforms, let's override NET_IP_ALIGN to 0 to get rid of this
padding.

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

libceph: assume argonaut on the server side

No one is running pre-argonaut. In addition one of the argonaut
features (NOSRCADDR) has been required since day one (and a half,
2.6.34 vs 2.6.35) of the kernel client.

Allow for the possibility of reusing these feature bits later.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: quota: fix null pointer dereference in quota check

This patch fixes a possible null pointer dereference in
check_quota_exceeded, detected by the static checker smatch, with the
following warning:

fs/ceph/quota.c:240 check_quota_exceeded()
error: we previously assumed 'realm' could be null (see line 188)

Fixes: b7a2921765cf ("ceph: quota: support for ceph.quota.max_files")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: add destination file data sync before doing any remote copy

If we try to copy into a file that was just written, any data that is
remote copied will be overwritten by our buffered writes once they are
flushed. When this happens, the call to invalidate_inode_pages2_range
will also return a -EBUSY error.

This patch fixes this by also sync'ing the destination file before
starting any copy.

Fixes: 503f82a9932d ("ceph: support copy_file_range file operation")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

sata_rcar: convert to SPDX identifiers

This patch updates license to use SPDX-License-Identifier
instead of verbose license text.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>