Jeffrin Jose T [Wed, 15 May 2019 06:44:04 +0000 (12:14 +0530)]
selftests: netfilter: missing error check when setting up veth interface
A test for the basic NAT functionality uses ip command which needs veth
device. There is a condition where the kernel support for veth is not
compiled into the kernel and the test script breaks. This patch contains
code for reasonable error display and correct code exit.
Signed-off-by: Jeffrin Jose T <jeffrin@rajagiritech.edu.in>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
YueHaibing [Fri, 17 May 2019 14:31:49 +0000 (22:31 +0800)]
ipvs: Fix use-after-free in ip_vs_in
BUG: KASAN: use-after-free in ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
Read of size 4 at addr
ffff8881e9b26e2c by task sshd/5603
CPU: 0 PID: 5603 Comm: sshd Not tainted 4.19.39+ #30
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
dump_stack+0x71/0xab
print_address_description+0x6a/0x270
kasan_report+0x179/0x2c0
ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
ip_vs_in+0xd8/0x170 [ip_vs]
nf_hook_slow+0x5f/0xe0
__ip_local_out+0x1d5/0x250
ip_local_out+0x19/0x60
__tcp_transmit_skb+0xba1/0x14f0
tcp_write_xmit+0x41f/0x1ed0
? _copy_from_iter_full+0xca/0x340
__tcp_push_pending_frames+0x52/0x140
tcp_sendmsg_locked+0x787/0x1600
? tcp_sendpage+0x60/0x60
? inet_sk_set_state+0xb0/0xb0
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x6d/0x80
sock_write_iter+0x121/0x1c0
? sock_sendmsg+0x80/0x80
__vfs_write+0x23e/0x370
vfs_write+0xe7/0x230
ksys_write+0xa1/0x120
? __ia32_sys_read+0x50/0x50
? __audit_syscall_exit+0x3ce/0x450
do_syscall_64+0x73/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff6f6147c60
Code: 73 01 c3 48 8b 0d 28 12 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 5d 73 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83
RSP: 002b:
00007ffd772ead18 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
0000000000000034 RCX:
00007ff6f6147c60
RDX:
0000000000000034 RSI:
000055df30a31270 RDI:
0000000000000003
RBP:
000055df30a31270 R08:
0000000000000000 R09:
0000000000000000
R10:
00007ffd772ead70 R11:
0000000000000246 R12:
00007ffd772ead74
R13:
00007ffd772eae20 R14:
00007ffd772eae24 R15:
000055df2f12ddc0
Allocated by task 6052:
kasan_kmalloc+0xa0/0xd0
__kmalloc+0x10a/0x220
ops_init+0x97/0x190
register_pernet_operations+0x1ac/0x360
register_pernet_subsys+0x24/0x40
0xffffffffc0ea016d
do_one_initcall+0x8b/0x253
do_init_module+0xe3/0x335
load_module+0x2fc0/0x3890
__do_sys_finit_module+0x192/0x1c0
do_syscall_64+0x73/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 6067:
__kasan_slab_free+0x130/0x180
kfree+0x90/0x1a0
ops_free_list.part.7+0xa6/0xc0
unregister_pernet_operations+0x18b/0x1f0
unregister_pernet_subsys+0x1d/0x30
ip_vs_cleanup+0x1d/0xd2f [ip_vs]
__x64_sys_delete_module+0x20c/0x300
do_syscall_64+0x73/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at
ffff8881e9b26600 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 2092 bytes inside of 4096-byte region [
ffff8881e9b26600,
ffff8881e9b27600)
The buggy address belongs to the page:
page:
ffffea0007a6c800 count:1 mapcount:0 mapping:
ffff888107c0e600 index:0x0 compound_mapcount: 0
flags: 0x17ffffc0008100(slab|head)
raw:
0017ffffc0008100 dead000000000100 dead000000000200 ffff888107c0e600
raw:
0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
while unregistering ipvs module, ops_free_list calls
__ip_vs_cleanup, then nf_unregister_net_hooks be called to
do remove nf hook entries. It need a RCU period to finish,
however net->ipvs is set to NULL immediately, which will
trigger NULL pointer dereference when a packet is hooked
and handled by ip_vs_in where net->ipvs is dereferenced.
Another scene is ops_free_list call ops_free to free the
net_generic directly while __ip_vs_cleanup finished, then
calling ip_vs_in will triggers use-after-free.
This patch moves nf_unregister_net_hooks from __ip_vs_cleanup()
to __ip_vs_dev_cleanup(), where rcu_barrier() is called by
unregister_pernet_device -> unregister_pernet_operations,
that will do the needed grace period.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: efe41606184e ("ipvs: convert to use pernet nf_hook api")
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Wed, 15 May 2019 18:15:32 +0000 (20:15 +0200)]
netfilter: nft_fib: Fix existence check support
NFTA_FIB_F_PRESENT flag was not always honored since eval functions did
not call nft_fib_store_result in all cases.
Given that in all callsites there is a struct net_device pointer
available which holds the interface data to be stored in destination
register, simplify nft_fib_store_result() to just accept that pointer
instead of the nft_pktinfo pointer and interface index. This also
allows to drop the index to interface lookup previously needed to get
the name associated with given index.
Fixes: 055c4b34b94f6 ("netfilter: nft_fib: Support existence check")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jagdish Motwani [Mon, 13 May 2019 18:17:40 +0000 (23:47 +0530)]
netfilter: nf_queue: fix reinject verdict handling
This patch fixes netfilter hook traversal when there are more than 1 hooks
returning NF_QUEUE verdict. When the first queue reinjects the packet,
'nf_reinject' starts traversing hooks with a proper hook_index. However,
if it again receives a NF_QUEUE verdict (by some other netfilter hook), it
queues the packet with a wrong hook_index. So, when the second queue
reinjects the packet, it re-executes hooks in between.
Fixes: 960632ece694 ("netfilter: convert hook list to an array")
Signed-off-by: Jagdish Motwani <jagdish.motwani@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 30 Apr 2019 12:53:11 +0000 (14:53 +0200)]
netfilter: nf_tables: fix oops during rule dump
We can oops in nf_tables_fill_rule_info().
Its not possible to fetch previous element in rcu-protected lists
when deletions are not prevented somehow: list_del_rcu poisons
the ->prev pointer value.
Before rcu-conversion this was safe as dump operations did hold
nfnetlink mutex.
Pass previous rule as argument, obtained by keeping a pointer to
the previous rule during traversal.
Fixes: d9adf22a291883 ("netfilter: nf_tables: use call_rcu in netlink dumps")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Sat, 18 May 2019 20:13:40 +0000 (13:13 -0700)]
Merge branch 'mlxsw-Two-port-module-fixes'
Ido Schimmel says:
====================
mlxsw: Two port module fixes
Patch #1 fixes driver initialization failure on old ASICs due to
unsupported register access. This is fixed by first testing if the
register is supported.
Patch #2 fixes reading of certain modules' EEPROM. The problem and
solution are explained in detail in the commit message.
Please consider both patches for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sat, 18 May 2019 15:58:29 +0000 (18:58 +0300)]
mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM
Prevent reading unsupported slave address from SFP EEPROM by testing
Diagnostic Monitoring Type byte in EEPROM. Read only page zero of
EEPROM, in case this byte is zero.
If some SFP transceiver does not support Digital Optical Monitoring
(DOM), reading SFP EEPROM slave address 0x51 could return an error.
Availability of DOM support is verified by reading from zero page
Diagnostic Monitoring Type byte describing how diagnostic monitoring is
implemented by transceiver. If bit 6 of this byte is set, it indicates
that digital diagnostic monitoring has been implemented. Otherwise it is
not and transceiver could fail to reply to transaction for slave address
0x51 [1010001X (A2h)], which is used to access measurements page.
Such issue has been observed when reading cable MCP2M00-xxxx,
MCP7F00-xxxx, and few others.
Fixes: 2ea109039cd3 ("mlxsw: spectrum: Add support for access cable info via ethtool")
Fixes: 4400081b631a ("mlxsw: spectrum: Fix EEPROM access in case of SFP/SFP+")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Sat, 18 May 2019 15:58:28 +0000 (18:58 +0300)]
mlxsw: core: Prevent QSFP module initialization for old hardware
Old Mellanox silicons, like switchx-2, switch-ib do not support reading
QSFP modules temperature through MTMP register. Attempt to access this
register on systems equipped with the this kind of silicon will cause
initialization flow failure.
Test for hardware resource capability is added in order to distinct
between old and new silicon - old silicons do not have such capability.
Fixes: 6a79507cfe94 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Fixes: 5c42eaa07bd0 ("mlxsw: core: Extend hwmon interface with QSFP module temperature attributes")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jorge E. Moreira [Thu, 16 May 2019 20:51:07 +0000 (13:51 -0700)]
vsock/virtio: Initialize core virtio vsock before registering the driver
Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are
accessed (while handling interrupts) before they are initialized.
[ 4.201410] BUG: unable to handle kernel paging request at
ffffffffffffffe8
[ 4.207829] IP: vsock_addr_equals_addr+0x3/0x20
[ 4.211379] PGD
28210067 P4D
28210067 PUD
28212067 PMD 0
[ 4.211379] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4.211379] Modules linked in:
[ 4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted
4.14.106-419297-gd7e28cc1f241 #1
[ 4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 4.211379] Workqueue: virtio_vsock virtio_transport_rx_work
[ 4.211379] task:
ffffa3273d175280 task.stack:
ffffaea1800e8000
[ 4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20
[ 4.211379] RSP: 0000:
ffffaea1800ebd28 EFLAGS:
00010286
[ 4.211379] RAX:
0000000000000002 RBX:
0000000000000000 RCX:
ffffffffb94e42f0
[ 4.211379] RDX:
0000000000000400 RSI:
ffffffffffffffe0 RDI:
ffffaea1800ebdd0
[ 4.211379] RBP:
ffffaea1800ebd58 R08:
0000000000000001 R09:
0000000000000001
[ 4.211379] R10:
0000000000000000 R11:
ffffffffb89d5d60 R12:
ffffaea1800ebdd0
[ 4.211379] R13:
00000000828cbfbf R14:
0000000000000000 R15:
ffffaea1800ebdc0
[ 4.211379] FS:
0000000000000000(0000) GS:
ffffa3273fd00000(0000) knlGS:
0000000000000000
[ 4.211379] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 4.211379] CR2:
ffffffffffffffe8 CR3:
000000002820e001 CR4:
00000000001606e0
[ 4.211379] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 4.211379] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 4.211379] Call Trace:
[ 4.211379] ? vsock_find_connected_socket+0x6c/0xe0
[ 4.211379] virtio_transport_recv_pkt+0x15f/0x740
[ 4.211379] ? detach_buf+0x1b5/0x210
[ 4.211379] virtio_transport_rx_work+0xb7/0x140
[ 4.211379] process_one_work+0x1ef/0x480
[ 4.211379] worker_thread+0x312/0x460
[ 4.211379] kthread+0x132/0x140
[ 4.211379] ? process_one_work+0x480/0x480
[ 4.211379] ? kthread_destroy_worker+0xd0/0xd0
[ 4.211379] ret_from_fork+0x35/0x40
[ 4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e
[ 4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP:
ffffaea1800ebd28
[ 4.211379] CR2:
ffffffffffffffe8
[ 4.211379] ---[ end trace
f31cc4a2e6df3689 ]---
[ 4.211379] Kernel panic - not syncing: Fatal exception in interrupt
[ 4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 4.211379] Rebooting in 5 seconds..
Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Cc: kernel-team@android.com
Cc: stable@vger.kernel.org [4.9+]
Signed-off-by: Jorge E. Moreira <jemoreira@google.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 May 2019 23:33:06 +0000 (16:33 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2019-05-18
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix bpftool's raw BTF dump in relation to forward declarations of union/
structs, and another fix to unexport logging helpers, from Andrii.
2) Fix inode permission check for retrieving bpf programs, from Chenbo.
3) Fix bpftool to raise rlimit earlier as otherwise libbpf's feature probing
can fail and subsequently it refuses to load an object, from Yonghong.
4) Fix declaration of bpf_get_current_task() in kselftests, from Alexei.
5) Fix up BPF kselftest .gitignore to add generated files, from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 May 2019 22:16:03 +0000 (15:16 -0700)]
Merge tag 'mlx5-fixes-2019-05-17' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-05-17
This series introduces some fixes to mlx5 driver.
For more information please see tag log below.
Please pull and let me know if there is any problem.
For -stable v4.19
net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
net/mlx5: Imply MLXFW in mlx5_core
For -stable v5.0
net/mlx5e: Add missing ethtool driver info for representors
net/mlx5e: Additional check for flow destination comparison
For -stable v5.1
net/mlx5: Fix peer pf disable hca command
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Britstein [Sun, 12 May 2019 11:50:58 +0000 (11:50 +0000)]
net/mlx5e: Fix possible modify header actions memory leak
The cited commit could disable the modify header flag, but did not free
the allocated memory for the modify header actions. Fix it.
Fixes: 27c11b6b844cd ("net/mlx5e: Do not rewrite fields with the same match")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Wed, 10 Apr 2019 19:42:20 +0000 (19:42 +0000)]
net/mlx5e: Fix no rewrite fields with the same match
With commit
27c11b6b844c ("net/mlx5e: Do not rewrite fields with the
same match") there are no rewrites if the rewrite value is the same as
the matched value. However, if the field is not matched, the rewrite is
also wrongly skipped. Fix it.
Fixes: 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the same match")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Dmytro Linkin [Thu, 2 May 2019 15:21:38 +0000 (15:21 +0000)]
net/mlx5e: Additional check for flow destination comparison
Flow destination comparison has an inaccuracy: code see no
difference between same vf ports, which belong to different pfs.
Example: If start ping from VF0 (PF1) to VF1 (PF1) and mirror
all traffic to VF0 (PF2), icmp reply to VF0 (PF1) and mirrored
flow to VF0 (PF2) would be determined as same destination. It lead
to creating flow handler with rule nodes, which not added to node
tree. When later driver try to delete this flow rules we got
kernel crash.
Add comparison of vhca_id field to avoid this.
Fixes: 1228e912c934 ("net/mlx5: Consider encapsulation properties when comparing destinations")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Dmytro Linkin [Thu, 25 Apr 2019 08:52:02 +0000 (08:52 +0000)]
net/mlx5e: Add missing ethtool driver info for representors
For all representors added firmware version info to show in
ethtool driver info.
For uplink representor, because only it is tied to the pci device
sysfs, added pci bus info.
Fixes: ff9b85de5d5d ("net/mlx5e: Add some ethtool port control entries to the uplink rep netdev")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Britstein [Mon, 13 May 2019 09:06:02 +0000 (09:06 +0000)]
net/mlx5e: Fix number of vports for ingress ACL configuration
With the cited commit, ACLs are configured for the VF ports. The loop
for the number of ports had the wrong number. Fix it.
Fixes: 184867373d8c ("net/mlx5e: ACLs for priority tag mode")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Tue, 7 May 2019 19:59:38 +0000 (12:59 -0700)]
net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
ethtool user spaces needs to know ring count via ETHTOOL_GRXRINGS when
executing (ethtool -x) which is retrieved via ethtool get_rxnfc callback,
in mlx5 this callback is disabled when CONFIG_MLX5_EN_RXNFC=n.
This patch allows only ETHTOOL_GRXRINGS command on mlx5e_get_rxnfc() when
CONFIG_MLX5_EN_RXNFC is disabled, so ethtool -x will continue working.
Fixes: fe6d86b3c316 ("net/mlx5e: Add CONFIG_MLX5_EN_RXNFC for ethtool rx nfc")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Wed, 15 May 2019 12:57:13 +0000 (15:57 +0300)]
net/mlx5e: Fix wrong xmit_more application
Cited patch refactored the xmit_more indication while not preserving
its functionality. Fix it.
Fixes: 3c31ff22b25f ("drivers: mellanox: use netdev_xmit_more() helper")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Bodong Wang [Mon, 29 Apr 2019 14:56:18 +0000 (09:56 -0500)]
net/mlx5: Fix peer pf disable hca command
The command was mistakenly using enable_hca in embedded CPU field.
Fixes: 22e939a91dcb (net/mlx5: Update enable HCA dependency)
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Alex Rosenbaum <alexr@mellanox.com>
Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Fri, 5 Apr 2019 06:07:19 +0000 (01:07 -0500)]
net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index
To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.
vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.
vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.
Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.
Send vport FW commands with correct eswitch u16 vport_num instead
host int vport_index.
Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Valentine Fatiev [Wed, 1 May 2019 08:46:05 +0000 (11:46 +0300)]
net/mlx5: Add meaningful return codes to status_to_err function
Current version of function status_to_err return -1 for any
status returned by mlx5_cmd_invoke function. In case status is
MLX5_DRIVER_STATUS_ABORTED we should return 0 to the caller as we
assume command completed successfully on FW. If error returned we are
getting confusing messages in dmesg. In addition, currently returned
value -1 is confusing with -EPERM.
New implementation actually fix original commit and return meaningful
codes for commands delivery status and print message in case of failure.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Tue, 7 May 2019 20:15:20 +0000 (13:15 -0700)]
net/mlx5: Imply MLXFW in mlx5_core
mlxfw can be compiled as external module while mlx5_core can be
builtin, in such case mlx5 will act like mlxfw is disabled.
Since mlxfw is just a service library for mlx* drivers,
imply it in mlx5_core to make it always reachable if it was enabled.
Fixes: 3ffaabecd1a1 ("net/mlx5e: Support the flash device ethtool callback")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Fri, 17 May 2019 19:15:05 +0000 (12:15 -0700)]
Revert "tipc: fix modprobe tipc failed after switch order of device registration"
This reverts commit
532b0f7ece4cb2ffd24dc723ddf55242d1188e5e.
More revisions coming up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Fri, 17 May 2019 14:45:43 +0000 (16:45 +0200)]
vsock/virtio: free packets during the socket release
When the socket is released, we should free all packets
queued in the per-socket list in order to avoid a memory
leak.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Junwei Hu [Fri, 17 May 2019 11:27:34 +0000 (19:27 +0800)]
tipc: fix modprobe tipc failed after switch order of device registration
Error message printed:
modprobe: ERROR: could not insert 'tipc': Address family not
supported by protocol.
when modprobe tipc after the following patch: switch order of
device registration, commit
7e27e8d6130c
("tipc: switch order of device registration to fix a crash")
Because sock_create_kern(net, AF_TIPC, ...) is called by
tipc_topsrv_create_listener() in the initialization process
of tipc_net_ops, tipc_socket_init() must be execute before that.
I move tipc_socket_init() into function tipc_init_net().
Fixes: 7e27e8d6130c
("tipc: switch order of device registration to fix a crash")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Kang Zhou <zhoukang7@huawei.com>
Reviewed-by: Suanming Mou <mousuanming@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Mazenauer [Fri, 17 May 2019 10:44:44 +0000 (10:44 +0000)]
lib: Correct comment of prandom_seed
Variable 'entropy' was wrongly documented as 'seed', changed comment to
reflect actual variable name.
../lib/random32.c:179: warning: Function parameter or member 'entropy' not described in 'prandom_seed'
../lib/random32.c:179: warning: Excess function parameter 'seed' description in 'prandom_seed'
Signed-off-by: Philippe Mazenauer <philippe.mazenauer@outlook.de>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
swkhack [Fri, 17 May 2019 07:59:22 +0000 (15:59 +0800)]
net: caif: fix the value of size argument of snprintf
Because the function snprintf write at most size bytes(including the
null byte).So the value of the argument size need not to minus one.
Signed-off-by: swkhack <swkhack@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko [Fri, 17 May 2019 06:21:29 +0000 (23:21 -0700)]
bpftool: fix BTF raw dump of FWD's fwd_kind
kflag bit determines whether FWD is for struct or union. Use that bit.
Fixes: c93cc69004df ("bpftool: add ability to dump BTF types")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Fri, 17 May 2019 04:34:11 +0000 (21:34 -0700)]
selftests/bpf: fix bpf_get_current_task
Fix bpf_get_current_task() declaration.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Wei Wang [Thu, 16 May 2019 20:30:54 +0000 (13:30 -0700)]
ipv6: fix src addr routing with the exception table
When inserting route cache into the exception table, the key is
generated with both src_addr and dest_addr with src addr routing.
However, current logic always assumes the src_addr used to generate the
key is a /128 host address. This is not true in the following scenarios:
1. When the route is a gateway route or does not have next hop.
(rt6_is_gw_or_nonexthop() == false)
2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
This means, when looking for a route cache in the exception table, we
have to do the lookup twice: first time with the passed in /128 host
address, second time with the src_addr stored in fib6_info.
This solves the pmtu discovery issue reported by Mikael Magnusson where
a route cache with a lower mtu info is created for a gateway route with
src addr. However, the lookup code is not able to find this route cache.
Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
Bisected-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 16 May 2019 17:41:31 +0000 (10:41 -0700)]
selftests: pmtu.sh: Remove quotes around commands in setup_xfrm
The first command in setup_xfrm is failing resulting in the test getting
skipped:
+ ip netns exec ns-B ip -6 xfrm state add src fd00:1::a dst fd00:1::b spi 0x1000 proto esp aead 'rfc4106(gcm(aes))' 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
+ out=RTNETLINK answers: Function not implemented
...
xfrm6 not supported
TEST: vti6: PMTU exceptions [SKIP]
xfrm4 not supported
TEST: vti4: PMTU exceptions [SKIP]
...
The setup command started failing when the run_cmd option was added.
Removing the quotes fixes the problem:
...
TEST: vti6: PMTU exceptions [ OK ]
TEST: vti4: PMTU exceptions [ OK ]
...
Fixes: 56490b623aa0 ("selftests: Add debugging options to pmtu.sh")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 May 2019 15:09:57 +0000 (08:09 -0700)]
net: avoid weird emergency message
When host is under high stress, it is very possible thread
running netdev_wait_allrefs() returns from msleep(250)
10 seconds late.
This leads to these messages in the syslog :
[...] unregister_netdevice: waiting for syz_tun to become free. Usage count = 0
If the device refcount is zero, the wait is over.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 May 2019 21:22:14 +0000 (14:22 -0700)]
Merge branch 'aqc111-revert-endianess-fixes-and-cleanup-mtu-logic'
Igor Russkikh says:
====================
aqc111: revert endianess fixes and cleanup mtu logic
This reverts no-op commits as it was discussed:
https://lore.kernel.org/netdev/
1557839644.11261.4.camel@suse.com/
First and second original patches are already dropped from stable,
No need to stable-queue the third patch as it has no functional impact,
just a logic cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 16 May 2019 14:52:25 +0000 (14:52 +0000)]
aqc111: cleanup mtu related logic
Original fix
b8b277525e9d was done under impression that invalid data
could be written for mtu configuration higher that 16334.
But the high limit will anyway be rejected my max_mtu check in caller.
Thus, make the code cleaner and allow it doing the configuration without
checking for maximum mtu value.
Fixes: b8b277525e9d ("aqc111: fix endianness issue in aqc111_change_mtu")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 16 May 2019 14:52:22 +0000 (14:52 +0000)]
Revert "aqc111: fix writing to the phy on BE"
This reverts commit
369b46e9fbcfa5136f2cb5f486c90e5f7fa92630.
The required temporary storage is already done inside of write32/16
helpers.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 16 May 2019 14:52:20 +0000 (14:52 +0000)]
Revert "aqc111: fix double endianness swap on BE"
This reverts commit
2cf672709beb005f6e90cb4edbed6f2218ba953e.
The required temporary storage is already done inside of write32/16
helpers.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 16 May 2019 09:28:16 +0000 (11:28 +0200)]
xfrm: ressurrect "Fix uninitialized memory read in _decode_session4"
This resurrects commit
8742dc86d0c7a9628
("xfrm4: Fix uninitialized memory read in _decode_session4"),
which got lost during a merge conflict resolution between ipsec-next
and net-next tree.
c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy")
in ipsec-next moved the (buggy) _decode_session4 from
net/ipv4/xfrm4_policy.c to net/xfrm/xfrm_policy.c.
In mean time,
8742dc86d0c7a was applied to ipsec.git and fixed the
problem in the "old" location.
When the trees got merged, the moved, old function was kept.
This applies the "lost" commit again, to the new location.
Fixes: a658a3f2ecbab ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko [Thu, 16 May 2019 03:39:27 +0000 (20:39 -0700)]
libbpf: move logging helpers into libbpf_internal.h
libbpf_util.h header was recently exposed as public as a dependency of
xsk.h. In addition to memory barriers, it contained logging helpers,
which are not supposed to be exposed. This patch moves those into
libbpf_internal.h, which is kept as an internal header.
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 7080da890984 ("libbpf: add libbpf_util.h to header install.")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Junwei Hu [Thu, 16 May 2019 02:51:15 +0000 (10:51 +0800)]
tipc: switch order of device registration to fix a crash
When tipc is loaded while many processes try to create a TIPC socket,
a crash occurs:
PANIC: Unable to handle kernel paging request at virtual
address "
dfff20000000021d"
pc : tipc_sk_create+0x374/0x1180 [tipc]
lr : tipc_sk_create+0x374/0x1180 [tipc]
Exception class = DABT (current EL), IL = 32 bits
Call trace:
tipc_sk_create+0x374/0x1180 [tipc]
__sock_create+0x1cc/0x408
__sys_socket+0xec/0x1f0
__arm64_sys_socket+0x74/0xa8
...
This is due to race between sock_create and unfinished
register_pernet_device. tipc_sk_insert tries to do
"net_generic(net, tipc_net_id)".
but tipc_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with multiple processes doing socket(AF_TIPC, ...)
and one process doing module removal.
Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Xiaogang Wang <wangxiaogang3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 May 2019 02:39:52 +0000 (19:39 -0700)]
ipv6: prevent possible fib6 leaks
At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
for finding all percpu routes and set their ->from pointer
to NULL, so that fib6_ref can reach its expected value (1).
The problem right now is that other cpus can still catch the
route being deleted, since there is no rcu grace period
between the route deletion and call to fib6_drop_pcpu_from()
This can leak the fib6 and associated resources, since no
notifier will take care of removing the last reference(s).
I decided to add another boolean (fib6_destroying) instead
of reusing/renaming exception_bucket_flushed to ease stable backports,
and properly document the memory barriers used to implement this fix.
This patch has been co-developped with Wei Wang.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Wei Wang <weiwan@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 15 May 2019 17:29:16 +0000 (13:29 -0400)]
net: test nouarg before dereferencing zerocopy pointers
Zerocopy skbs without completion notification were added for packet
sockets with PACKET_TX_RING user buffers. Those signal completion
through the TP_STATUS_USER bit in the ring. Zerocopy annotation was
added only to avoid premature notification after clone or orphan, by
triggering a copy on these paths for these packets.
The mechanism had to define a special "no-uarg" mode because packet
sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg
for a different pointer.
Before deferencing skb_uarg(skb), verify that it is a real pointer.
Fixes: 5cd8d46ea1562 ("packet: copy user buffers before orphan or clone")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniele Palmas [Wed, 15 May 2019 15:29:43 +0000 (17:29 +0200)]
net: usb: qmi_wwan: add Telit 0x1260 and 0x1261 compositions
Added support for Telit LE910Cx 0x1260 and 0x1261 compositions.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin-cristian Bucur [Wed, 15 May 2019 15:07:44 +0000 (15:07 +0000)]
net: phy: aquantia: readd XGMII support for AQR107
XGMII interface mode no longer works on AQR107 after the recent changes,
adding back support.
Fixes: 570c8a7d5303 ("net: phy: aquantia: check for supported interface modes in config_init")
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Konstantin Khlebnikov [Wed, 15 May 2019 11:40:52 +0000 (14:40 +0300)]
net: bpfilter: fallback to netfilter if failed to load bpfilter kernel module
If bpfilter is not available return ENOPROTOOPT to fallback to netfilter.
Function request_module() returns both errors and userspace exit codes.
Just ignore them. Rechecking bpfilter_ops is enough.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Muthuswamy [Wed, 15 May 2019 00:56:05 +0000 (00:56 +0000)]
hv_sock: Add support for delayed close
Currently, hvsock does not implement any delayed or background close
logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and
the last reference to the socket is dropped, which leads to a call to
.destruct where the socket can hang indefinitely waiting for the peer to
close it's side. The can cause the user application to hang in the close()
call.
This change implements proper STREAM(TCP) closing handshake mechanism by
sending the FIN to the peer and the waiting for the peer's FIN to arrive
for a given timeout. On timeout, it will try to terminate the connection
(i.e. a RST). This is in-line with other socket providers such as virtio.
This change does not address the hang in the vmbus_hvsock_device_unregister
where it waits indefinitely for the host to rescind the channel. That
should be taken up as a separate fix.
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuqian Huang [Wed, 15 May 2019 00:42:48 +0000 (08:42 +0800)]
atm: iphase: Avoid copying pointers to user space.
Remove the MEMDUMP_DEV case in ia_ioctl to avoid copy
pointers to user space.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 May 2019 19:02:42 +0000 (12:02 -0700)]
Merge branch 'flow_offload-fix-CVLAN-support'
Edward Cree says:
====================
flow_offload: fix CVLAN support
When the flow_offload infrastructure was added, CVLAN matches weren't
plumbed through, and flow_rule_match_vlan() was incorrectly called in
the mlx5 driver when populating CVLAN match information. This series
adds flow_rule_match_cvlan(), and uses it in the mlx5 code.
Both patches should also go to 5.1 stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianbo Liu [Tue, 14 May 2019 20:18:50 +0000 (21:18 +0100)]
net/mlx5e: Fix calling wrong function to get inner vlan key and mask
When flow_rule_match_XYZ() functions were first introduced,
flow_rule_match_cvlan() for inner vlan is missing.
In mlx5_core driver, to get inner vlan key and mask, flow_rule_match_vlan()
is just called, which is wrong because it obtains outer vlan information by
FLOW_DISSECTOR_KEY_VLAN.
This commit fixes this by changing to call flow_rule_match_cvlan() after
it's added.
Fixes: 8f2566225ae2 ("flow_offload: add flow_rule and flow_match structures and use them")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 14 May 2019 20:18:12 +0000 (21:18 +0100)]
flow_offload: support CVLAN match
Plumb it through from the flow_dissector.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Thu, 16 May 2019 17:17:31 +0000 (10:17 -0700)]
tools/bpftool: move set_max_rlimit() before __bpf_object__open_xattr()
For a host which has a lower rlimit for max locked memory (e.g., 64KB),
the following error occurs in one of our production systems:
# /usr/sbin/bpftool prog load /paragon/pods/
52877437/home/mark.o \
/sys/fs/bpf/paragon_mark_21 type cgroup/skb \
map idx 0 pinned /sys/fs/bpf/paragon_map_21
libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
Error: failed to open object file
The reason is due to low locked memory during bpf_object__probe_name()
which probes whether program name is supported in kernel or not
during __bpf_object__open_xattr().
bpftool program load already tries to relax mlock rlimit before
bpf_object__load(). Let us move set_max_rlimit() before
__bpf_object__open_xattr(), which fixed the issue here.
Fixes: 47eff61777c7 ("bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stanislav Fomichev [Thu, 16 May 2019 16:46:57 +0000 (09:46 -0700)]
selftests/bpf: add test_sysctl and map_tests/tests.h to .gitignore
Missing files are:
* tools/testing/selftests/bpf/map_tests/tests.h - autogenerated
* tools/testing/selftests/bpf/test_sysctl - binary
Fixes: 51a0e301a563 ("bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps")
Fixes: 1f5fa9ab6e2e ("selftests/bpf: Test BPF_CGROUP_SYSCTL")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Chenbo Feng [Wed, 15 May 2019 02:42:57 +0000 (19:42 -0700)]
bpf: relax inode permission check for retrieving bpf program
For iptable module to load a bpf program from a pinned location, it
only retrieve a loaded program and cannot change the program content so
requiring a write permission for it might not be necessary.
Also when adding or removing an unrelated iptable rule, it might need to
flush and reload the xt_bpf related rules as well and triggers the inode
permission check. It might be better to remove the write premission
check for the inode so we won't need to grant write access to all the
processes that flush and restore iptables rules.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
David S. Miller [Thu, 16 May 2019 16:45:20 +0000 (09:45 -0700)]
Merge branch 'rhashtable-Fix-sparse-warnings'
Herbert Xu says:
====================
rhashtable: Fix sparse warnings
This patch series fixes all the sparse warnings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 16 May 2019 07:19:48 +0000 (15:19 +0800)]
rhashtable: Fix cmpxchg RCU warnings
As cmpxchg is a non-RCU mechanism it will cause sparse warnings
when we use it for RCU. This patch adds explicit casts to silence
those warnings. This should probably be moved to RCU itself in
future.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 16 May 2019 07:19:46 +0000 (15:19 +0800)]
rhashtable: Remove RCU marking from rhash_lock_head
The opaque type rhash_lock_head should not be marked with __rcu
because it can never be dereferenced. We should apply the RCU
marking when we turn it into a pointer which can be dereferenced.
This patch does exactly that. This fixes a number of sparse
warnings as well as getting rid of some unnecessary RCU checking.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 May 2019 01:28:44 +0000 (18:28 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2019-05-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a use after free in __dev_map_entry_free(), from Eric.
2) Several sockmap related bug fixes: a splat in strparser if
it was never initialized, remove duplicate ingress msg list
purging which can race, fix msg->sg.size accounting upon
skb to msg conversion, and last but not least fix a timeout
bug in tcp_bpf_wait_data(), from John.
3) Fix LRU map to avoid messing with eviction heuristics upon
syscall lookup, e.g. map walks from user space side will
then lead to eviction of just recently created entries on
updates as it would mark all map entries, from Daniel.
4) Don't bail out when libbpf feature probing fails. Also
various smaller fixes to flow_dissector test, from Stanislav.
5) Fix missing brackets for BTF_INT_OFFSET() in UAPI, from Gary.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Tue, 14 May 2019 04:42:03 +0000 (21:42 -0700)]
bpf, tcp: correctly handle DONT_WAIT flags and timeo == 0
The tcp_bpf_wait_data() routine needs to check timeo != 0 before
calling sk_wait_event() otherwise we may see unexpected stalls
on receiver.
Arika did all the leg work here I just formatted, posted and ran
a few tests.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: Arika Chen <eaglesora@gmail.com>
Suggested-by: Arika Chen <eaglesora@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 14 May 2019 21:12:34 +0000 (14:12 -0700)]
selftests/bpf: add prog detach to flow_dissector test
In case we are not running in a namespace (which we don't do by default),
let's try to detach the bpf program that we use for eth_get_headlen tests.
Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 14 May 2019 21:12:33 +0000 (14:12 -0700)]
selftests/bpf: add missing \n to flow_dissector CHECK errors
Otherwise, in case of an error, everything gets mushed together.
Fixes: a5cb33464e53 ("selftests/bpf: make flow dissector tests more extensible")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Wed, 15 May 2019 03:38:49 +0000 (20:38 -0700)]
libbpf: don't fail when feature probing fails
Otherwise libbpf is unusable from unprivileged process with
kernel.kernel.unprivileged_bpf_disabled=1.
All I get is EPERM from the probes, even if I just want to
open an ELF object and look at what progs/maps it has.
Instead of dying on probes, let's just pr_debug the error and
try to continue.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Eric Dumazet [Wed, 15 May 2019 16:10:15 +0000 (09:10 -0700)]
tcp: do not recycle cloned skbs
It is illegal to change arbitrary fields in skb_shared_info if the
skb is cloned.
Before calling skb_zcopy_clear() we need to ensure this rule,
therefore we need to move the test from sk_stream_alloc_skb()
to sk_wmem_free_skb()
Fixes: 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Wed, 15 May 2019 16:08:58 +0000 (19:08 +0300)]
enetc: Add missing link state info for ethtool
Just hook get_link to standard ethtool_op_get_link,
nothing special needed at this point.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Wed, 15 May 2019 16:08:57 +0000 (19:08 +0300)]
enetc: Allow to disable Tx SG
The fact that the Tx SG flag is fixed to 'on' is only
an oversight. Non-SG mode is also supported. Fix this
by allowing to turn SG off.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Wed, 15 May 2019 16:08:56 +0000 (19:08 +0300)]
enetc: Fix NULL dma address unmap for Tx BD extensions
For the unlikely case of TxBD extensions (i.e. ptp)
the driver tries to unmap the tx_swbd corresponding
to the extension, which is bogus as it has no buffer
attached.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Tue, 14 May 2019 21:28:19 +0000 (14:28 -0700)]
nfp: flower: add rcu locks when accessing netdev for tunnels
Add rcu locks when accessing netdev when processing route request
and tunnel keep alive messages received from hardware.
Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload")
Fixes: 856f5b135758 ("nfp: flower vxlan neighbour keep-alive")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 14 May 2019 14:55:32 +0000 (22:55 +0800)]
ppp: deflate: Fix possible crash in deflate_init
BUG: unable to handle kernel paging request at
ffffffffa018f000
PGD
3270067 P4D
3270067 PUD
3271063 PMD
2307eb067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 4138 Comm: modprobe Not tainted 5.1.0-rc7+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:ppp_register_compressor+0x3e/0xd0 [ppp_generic]
Code: 98 4a 3f e2 48 8b 15 c1 67 00 00 41 8b 0c 24 48 81 fa 40 f0 19 a0
75 0e eb 35 48 8b 12 48 81 fa 40 f0 19 a0 74
RSP: 0018:
ffffc90000d93c68 EFLAGS:
00010287
RAX:
ffffffffa018f000 RBX:
ffffffffa01a3000 RCX:
000000000000001a
RDX:
ffff888230c750a0 RSI:
0000000000000000 RDI:
ffffffffa019f000
RBP:
ffffc90000d93c80 R08:
0000000000000001 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
ffffffffa0194080
R13:
ffff88822ee1a700 R14:
0000000000000000 R15:
ffffc90000d93e78
FS:
00007f2339557540(0000) GS:
ffff888237a00000(0000)
knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffffffffa018f000 CR3:
000000022bde4000 CR4:
00000000000006f0
Call Trace:
? 0xffffffffa01a3000
deflate_init+0x11/0x1000 [ppp_deflate]
? 0xffffffffa01a3000
do_one_initcall+0x6c/0x3cc
? kmem_cache_alloc_trace+0x248/0x3b0
do_init_module+0x5b/0x1f1
load_module+0x1db1/0x2690
? m_show+0x1d0/0x1d0
__do_sys_finit_module+0xc5/0xd0
__x64_sys_finit_module+0x15/0x20
do_syscall_64+0x6b/0x1d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
If ppp_deflate fails to register in deflate_init,
module initialization failed out, however
ppp_deflate_draft may has been regiestred and not
unregistered before return.
Then the seconed modprobe will trigger crash like this.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luca Ceresoli [Tue, 14 May 2019 13:23:07 +0000 (15:23 +0200)]
net: macb: fix error format in dev_err()
Errors are negative numbers. Using %u shows them as very large positive
numbers such as
4294967277 that don't make sense. Use the %d format
instead, and get a much nicer -19.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Fixes: b48e0bab142f ("net: macb: Migrate to devm clock interface")
Fixes: 93b31f48b3ba ("net/macb: unify clock management")
Fixes: 421d9df0628b ("net/macb: merge at91_ether driver into macb driver")
Fixes: aead88bd0e99 ("net: ethernet: macb: Add support for rx_clk")
Fixes: f5473d1d44e4 ("net: macb: Support clock management for tsu_clk")
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Tue, 14 May 2019 13:12:19 +0000 (15:12 +0200)]
rtnetlink: always put IFLA_LINK for links with a link-netnsid
Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when
iflink == ifindex.
In some cases, a device can be created in a different netns with the
same ifindex as its parent. That device will not dump its IFLA_LINK
attribute, which can confuse some userspace software that expects it.
For example, if the last ifindex created in init_net and foo are both
8, these commands will trigger the issue:
ip link add parent type dummy # ifindex 9
ip link add link parent netns foo type macvlan # ifindex 9 in ns foo
So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump,
always put the IFLA_LINK attribute as well.
Thanks to Dan Winship for analyzing the original OpenShift bug down to
the missing netlink attribute.
v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said
add Nicolas' ack
v3: change Fixes tag
fix subject typo, spotted by Edward Cree
Analyzed-by: Dan Winship <danw@redhat.com>
Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunjian Wang [Tue, 14 May 2019 11:03:19 +0000 (19:03 +0800)]
net/mlx4_core: Change the error print to info print
The error print within mlx4_flow_steer_promisc_add() should
be a info print.
Fixes: 592e49dda812 ('net/mlx4: Implement promiscuous mode with device managed flow-steering')
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 14 May 2019 09:02:31 +0000 (11:02 +0200)]
NFC: Orphan the subsystem
Samuel clearly hasn't been working on this in many years and
patches getting to the wireless list are just being ignored
entirely now. Mark the subsystem as orphan to reflect the
current state and revert back to the netdev list so at least
some fixes can be picked up by Dave.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 13 May 2019 21:06:24 +0000 (14:06 -0700)]
net: Always descend into dsa/
Jiri reported that with a kernel built with CONFIG_FIXED_PHY=y,
CONFIG_NET_DSA=m and CONFIG_NET_DSA_LOOP=m, we would not get to a
functional state where the mock-up driver is registered. Turns out that
we are not descending into drivers/net/dsa/ unconditionally, and we
won't be able to link-in dsa_loop_bdinfo.o which does the actual mock-up
mdio device registration.
Reported-by: Jiri Pirko <jiri@resnulli.us>
Fixes: 40013ff20b1b ("net: dsa: Fix functional dsa-loop dependency on FIXED_PHY")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Tested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Mon, 13 May 2019 17:32:05 +0000 (10:32 -0700)]
tcp: fix retrans timestamp on passive Fast Open
Commit
c7d13c8faa74 ("tcp: properly track retry time on
passive Fast Open") sets the start of SYNACK retransmission
time on passive Fast Open in "retrans_stamp". However the
timestamp is not reset upon the handshake has completed. As a
result, future data packet retransmission may not update it in
tcp_retransmit_skb(). This may lead to socket aborting earlier
unexpectedly by retransmits_timed_out() since retrans_stamp remains
the SYNACK rtx time.
This bug only manifests on passive TFO sender that a) suffered
SYNACK timeout and then b) stalls on very first loss recovery. Any
successful loss recovery would reset the timestamp to avoid this
issue.
Fixes: c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 14 May 2019 21:12:59 +0000 (14:12 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- enable packed ring support for s390
- several fixes
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: enable packed ring
virtio/s390: DMA support for virtio-ccw
virtio/s390: use vring_create_virtqueue
virtio/virtio_ring: do some comment fixes
vhost-scsi: remove incorrect memory barrier
tools/virtio/ringtest: Remove bogus definition of BUG_ON()
virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packed
Linus Torvalds [Tue, 14 May 2019 20:30:10 +0000 (13:30 -0700)]
Add gitignore file for samples/vfs/ generated files
Commit
f1b5618e013a ("vfs: Add a sample program for the new mount API")
added sample programs that get built during the kernel build, but then
cause 'git status' to worry about whether the resulting binaries should
be managed by git.
Tell git not to worry, and to ignore the sample binaries.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 14 May 2019 20:17:19 +0000 (13:17 -0700)]
Merge branch 'parisc-5.2-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull more parisc updates from Helge Deller:
"Two small enhancements, which I didn't included in the last pull
request because I wanted to keep them a few more days in for-next
before sending upstream:
- Replace the ldcw barrier instruction by a nop instruction in the
CAS code on uniprocessor machines.
- Map variables read-only after init (enable ro_after_init feature)"
* 'parisc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Use __ro_after_init in init.c
parisc: Use __ro_after_init in unwind.c
parisc: Use __ro_after_init in time.c
parisc: Use __ro_after_init in processor.c
parisc: Use __ro_after_init in process.c
parisc: Use __ro_after_init in perf_images.h
parisc: Use __ro_after_init in pci.c
parisc: Use __ro_after_init in inventory.c
parisc: Use __ro_after_init in head.S
parisc: Use __ro_after_init in firmware.c
parisc: Use __ro_after_init in drivers.c
parisc: Use __ro_after_init in cache.c
parisc: Enable the ro_after_init feature
parisc: Drop LDCW barrier in CAS code when running UP
Linus Torvalds [Tue, 14 May 2019 20:14:10 +0000 (13:14 -0700)]
Merge tag 'kgdb-5.2-rc1' of git://git./linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Mostly cleanups but there are also a couple of fixes for out-of-bounds
accesses (including a potential write to the byte before a static
buffer).
The main changes are:
- Fixes to those out-of-bounds access (empty string to configure test
module could write the byte before a buffer, high cpu counts could
read outside of per-cpu structures).
- Improvements to string handling problems picked up by new compiler
warnings and other static checks. Most are fixing benign issues
that can't be tickled without code changes but still reduce the wtf
factor a little.
- Tidy up the terminal output"
* tag 'kgdb-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Fix bound check compiler warning
kdb: do a sanity check on the cpu in kdb_per_cpu()
kdb: Get rid of broken attempt to print CCVERSION in kdb summary
misc: kgdbts: fix out-of-bounds access in function param_set_kgdbts_var
kdb: kdb_support: replace strcpy() by strscpy()
gdbstub: Replace strcpy() by strscpy()
gdbstub: mark expected switch fall-throughs
Linus Torvalds [Tue, 14 May 2019 17:55:54 +0000 (10:55 -0700)]
Merge tag 'modules-for-v5.2' of git://git./linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
- Use a separate table to store symbol types instead of hijacking
fields in struct Elf_Sym
- Trivial code cleanups
* tag 'modules-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: add stubs for within_module functions
kallsyms: store type information in its own array
vmlinux.lds.h: drop unused __vermagic
Alexei Starovoitov [Tue, 14 May 2019 17:47:29 +0000 (10:47 -0700)]
Merge branch 'lru-map-fix'
Daniel Borkmann says:
====================
This set fixes LRU map eviction in combination with map lookups out
of system call side from user space. Main patch is the second one and
test cases are adapted and added in the last one. Thanks!
====================
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Mon, 13 May 2019 23:18:57 +0000 (01:18 +0200)]
bpf: test ref bit from data path and add new tests for syscall path
The test_lru_map is relying on marking the LRU map entry via regular
BPF map lookup from system call side. This is basically for simplicity
reasons. Given we fixed marking entries in that case, the test needs
to be fixed as well. Here we add a small drop-in replacement to retain
existing behavior for the tests by marking out of the BPF program and
transferring the retrieved value out via temporary map. This also adds
new test cases to track the new behavior where two elements are marked,
one via system call side and one via program side, where the next update
then evicts the key looked up only from system call side.
# ./test_lru_map
nr_cpus:8
test_lru_sanity0 (map_type:9 map_flags:0x0): Pass
test_lru_sanity1 (map_type:9 map_flags:0x0): Pass
test_lru_sanity2 (map_type:9 map_flags:0x0): Pass
test_lru_sanity3 (map_type:9 map_flags:0x0): Pass
test_lru_sanity4 (map_type:9 map_flags:0x0): Pass
test_lru_sanity5 (map_type:9 map_flags:0x0): Pass
test_lru_sanity7 (map_type:9 map_flags:0x0): Pass
test_lru_sanity8 (map_type:9 map_flags:0x0): Pass
test_lru_sanity0 (map_type:10 map_flags:0x0): Pass
test_lru_sanity1 (map_type:10 map_flags:0x0): Pass
test_lru_sanity2 (map_type:10 map_flags:0x0): Pass
test_lru_sanity3 (map_type:10 map_flags:0x0): Pass
test_lru_sanity4 (map_type:10 map_flags:0x0): Pass
test_lru_sanity5 (map_type:10 map_flags:0x0): Pass
test_lru_sanity7 (map_type:10 map_flags:0x0): Pass
test_lru_sanity8 (map_type:10 map_flags:0x0): Pass
test_lru_sanity0 (map_type:9 map_flags:0x2): Pass
test_lru_sanity4 (map_type:9 map_flags:0x2): Pass
test_lru_sanity6 (map_type:9 map_flags:0x2): Pass
test_lru_sanity7 (map_type:9 map_flags:0x2): Pass
test_lru_sanity8 (map_type:9 map_flags:0x2): Pass
test_lru_sanity0 (map_type:10 map_flags:0x2): Pass
test_lru_sanity4 (map_type:10 map_flags:0x2): Pass
test_lru_sanity6 (map_type:10 map_flags:0x2): Pass
test_lru_sanity7 (map_type:10 map_flags:0x2): Pass
test_lru_sanity8 (map_type:10 map_flags:0x2): Pass
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Mon, 13 May 2019 23:18:56 +0000 (01:18 +0200)]
bpf, lru: avoid messing with eviction heuristics upon syscall lookup
One of the biggest issues we face right now with picking LRU map over
regular hash table is that a map walk out of user space, for example,
to just dump the existing entries or to remove certain ones, will
completely mess up LRU eviction heuristics and wrong entries such
as just created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
system call lookup side as well. Thus upon walk, all entries are
being marked, so information of actual least recently used ones
are "lost".
In case of Cilium where it can be used (besides others) as a BPF
based connection tracker, this current behavior causes disruption
upon control plane changes that need to walk the map from user space
to evict certain entries. Discussion result from bpfconf [0] was that
we should simply just remove marking from system call side as no
good use case could be found where it's actually needed there.
Therefore this patch removes marking for regular LRU and per-CPU
flavor. If there ever should be a need in future, the behavior could
be selected via map creation flag, but due to mentioned reason we
avoid this here.
[0] http://vger.kernel.org/bpfconf.html
Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Mon, 13 May 2019 23:18:55 +0000 (01:18 +0200)]
bpf: add map_lookup_elem_sys_only for lookups from syscall side
Add a callback map_lookup_elem_sys_only() that map implementations
could use over map_lookup_elem() from system call side in case the
map implementation needs to handle the latter differently than from
the BPF data path. If map_lookup_elem_sys_only() is set, this will
be preferred pick for map lookups out of user space. This hook is
used in a follow-up fix for LRU map, but once development window
opens, we can convert other map types from map_lookup_elem() (here,
the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
the callback to simplify and clean up the latter.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Tue, 14 May 2019 17:45:03 +0000 (10:45 -0700)]
Merge tag 'backlight-next-5.2' of git://git./linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Fix-ups:
- Remove unused BACKLIGHT_LCD_SUPPORT symbol
- Remove unused BACKLIGHT_CLASS_DEVICE dependencies
- Add DT support to lm3630a_bl
Bug Fixes:
- Fix error path issues in lm3630a_bl"
* tag 'backlight-next-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: lm3630a: Add firmware node support
dt-bindings: backlight: Add lm3630a bindings
backlight: lm3630a: Return 0 on success in update_status functions
video: lcd: Remove useless BACKLIGHT_CLASS_DEVICE dependencies
video: backlight: Remove useless BACKLIGHT_LCD_SUPPORT kernel symbol
Linus Torvalds [Tue, 14 May 2019 17:39:08 +0000 (10:39 -0700)]
Merge tag 'mfd-next-5.2' of git://git./linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Core Framework:
- Document (kerneldoc) core mfd_add_devices() API
New Drivers:
- Altera SOCFPGA System Manager
- Maxim MAX77650/77651 PMIC
- Maxim MAX77663 PMIC
- ST Multi-Function eXpander (STMFX)
New Device Support:
- LEDs support in Intel Cherry Trail Whiskey Cove PMIC
- RTC support in SAMSUNG Electronics S2MPA01 PMIC
- SAM9X60 support in Atmel HLCDC (High-end LCD Controller)
- USB X-Powers AXP 8xx PMICs
- Integrated Sensor Hub (ISH) in ChromeOS EC
- USB PD Logger in ChromeOS EC
- AXP223 in X-Powers AXP series PMICs
- Power Supply in X-Powers AXP 803 PMICs
- Comet Lake in Intel Low Power Subsystem
- Fingerprint MCU in ChromeOS EC
- Touchpad MCU in ChromeOS EC
- Move TI LM3532 support to LED
New Functionality:
- max77650, max77620: Add/extend DT support
- max77620 power-off
- syscon clocking
- croc_ec host sleep event
Fix-ups:
- Trivial; Formatting, spelling, etc; Kconfig, sec-core, ab8500-debugfs
- Remove unused functionality; rk808, da9063-*
- SPDX conversion; da9063-*, atmel-*,
- Adapt/add new register definitions; cs47l35-tables, cs47l90-tables, imx6q-iomuxc-gpr
- Fix-up DT bindings; ti-lmu, cirrus,lochnagar
- Simply obtaining driver data; ssbi, t7l66xb, tc6387xb, tc6393xb
Bug Fixes:
- Fix incorrect defined values; max77620, da9063
- Fix device initialisation; twl6040
- Reset device on init; intel-lpss
- Fix build warnings when !OF; sun6i-prcm
- Register OF match tables; tps65912-spi
- Fix DMI matching; intel_quark_i2c_gpio"
* tag 'mfd-next-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (65 commits)
mfd: Use dev_get_drvdata() directly
mfd: cros_ec: Instantiate properly CrOS Touchpad MCU device
mfd: cros_ec: Instantiate properly CrOS FP MCU device
mfd: cros_ec: Update the EC feature codes
mfd: intel-lpss: Add Intel Comet Lake PCI IDs
mfd: lochnagar: Add links to binding docs for sound and hwmon
mfd: ab8500-debugfs: Fix a typo ("deubgfs")
mfd: imx6sx: Add MQS register definition for iomuxc gpr
dt-bindings: mfd: LMU: Fix lm3632 dt binding example
mfd: intel_quark_i2c_gpio: Adjust IOT2000 matching
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
mfd: tps65912-spi: Add missing of table registration
mfd: axp20x: Add USB power supply mfd cell to AXP803
mfd: sun6i-prcm: Fix build warning for non-OF configurations
mfd: intel-lpss: Set the device in reset state when init
platform/chrome: Add support for v1 of host sleep event
mfd: cros_ec: Add host_sleep_event_v1 command
mfd: cros_ec: Instantiate the CrOS USB PD logger driver
mfd: cs47l90: Make DAC_AEC_CONTROL_2 readable
mfd: cs47l35: Make DAC_AEC_CONTROL_2 readable
...
Linus Torvalds [Tue, 14 May 2019 17:30:10 +0000 (10:30 -0700)]
Merge tag 'pci-v5.2-changes' of git://git./linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration changes:
- Add _HPX Type 3 settings support, which gives firmware more
influence over device configuration (Alexandru Gagniuc)
- Support fixed bus numbers from bridge Enhanced Allocation
capabilities (Subbaraya Sundeep)
- Add "external-facing" DT property to identify cases where we
require IOMMU protection against untrusted devices (Jean-Philippe
Brucker)
- Enable PCIe services for host controller drivers that use managed
host bridge alloc (Jean-Philippe Brucker)
- Log PCIe port service messages with pci_dev, not the pcie_device
(Frederick Lawler)
- Convert pciehp from pciehp_debug module parameter to generic
dynamic debug (Frederick Lawler)
Peer-to-peer DMA:
- Add whitelist of Root Complexes that support peer-to-peer DMA
between Root Ports (Christian König)
Native controller drivers:
- Add PCI host bridge DMA ranges for bridges that can't DMA
everywhere, e.g., iProc (Srinath Mannam)
- Add Amazon Annapurna Labs PCIe host controller driver (Jonathan
Chocron)
- Fix Tegra MSI target allocation so DMA doesn't generate unwanted
MSIs (Vidya Sagar)
- Fix of_node reference leaks (Wen Yang)
- Fix Hyper-V module unload & device removal issues (Dexuan Cui)
- Cleanup R-Car driver (Marek Vasut)
- Cleanup Keystone driver (Kishon Vijay Abraham I)
- Cleanup i.MX6 driver (Andrey Smirnov)
Significant bug fixes:
- Reset Lenovo ThinkPad P50 GPU so nouveau works after reboot (Lyude
Paul)
- Fix Switchtec firmware update performance issue (Wesley Sheng)
- Work around Pericom switch link retraining erratum (Stefan Mätje)"
* tag 'pci-v5.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (141 commits)
MAINTAINERS: Add Karthikeyan Mitran and Hou Zhiqiang for Mobiveil PCI
PCI: pciehp: Remove pointless MY_NAME definition
PCI: pciehp: Remove pointless PCIE_MODULE_NAME definition
PCI: pciehp: Remove unused dbg/err/info/warn() wrappers
PCI: pciehp: Log messages with pci_dev, not pcie_device
PCI: pciehp: Replace pciehp_debug module param with dyndbg
PCI: pciehp: Remove pciehp_debug uses
PCI/AER: Log messages with pci_dev, not pcie_device
PCI/DPC: Log messages with pci_dev, not pcie_device
PCI/PME: Replace dev_printk(KERN_DEBUG) with dev_info()
PCI/AER: Replace dev_printk(KERN_DEBUG) with dev_info()
PCI: Replace dev_printk(KERN_DEBUG) with dev_info(), etc
PCI: Replace printk(KERN_INFO) with pr_info(), etc
PCI: Use dev_printk() when possible
PCI: Cleanup setup-bus.c comments and whitespace
PCI: imx6: Allow asynchronous probing
PCI: dwc: Save root bus for driver remove hooks
PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code
PCI: dwc: Free MSI in dw_pcie_host_init() error path
PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi()
...
Linus Torvalds [Tue, 14 May 2019 17:10:55 +0000 (10:10 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
- a few misc things and hotfixes
- ocfs2
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (139 commits)
kernel/memremap.c: remove the unused device_private_entry_fault() export
mm: delete find_get_entries_tag
mm/huge_memory.c: make __thp_get_unmapped_area static
mm/mprotect.c: fix compilation warning because of unused 'mm' variable
mm/page-writeback: introduce tracepoint for wait_on_page_writeback()
mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags
mm/Kconfig: update "Memory Model" help text
mm/vmscan.c: don't disable irq again when count pgrefill for memcg
mm: memblock: make keeping memblock memory opt-in rather than opt-out
hugetlbfs: always use address space in inode for resv_map pointer
mm/z3fold.c: support page migration
mm/z3fold.c: add structure for buddy handles
mm/z3fold.c: improve compression by extending search
mm/z3fold.c: introduce helper functions
mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist
mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig
mm/vmscan.c: simplify shrink_inactive_list()
fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
xen/privcmd-buf.c: convert to use vm_map_pages_zero()
xen/gntdev.c: convert to use vm_map_pages()
...
Christoph Hellwig [Tue, 14 May 2019 00:23:23 +0000 (17:23 -0700)]
kernel/memremap.c: remove the unused device_private_entry_fault() export
This export has been entirely unused since it was added more than 1 1/2
years ago.
Link: http://lkml.kernel.org/r/20190429115535.12793-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox (Oracle) [Tue, 14 May 2019 00:23:20 +0000 (17:23 -0700)]
mm: delete find_get_entries_tag
I removed the only user of this and hadn't noticed it was now unused.
Link: http://lkml.kernel.org/r/20190430152929.21813-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bharath Vedartham [Tue, 14 May 2019 00:23:17 +0000 (17:23 -0700)]
mm/huge_memory.c: make __thp_get_unmapped_area static
__thp_get_unmapped_area is only used in mm/huge_memory.c. Make it static.
Tested by building and booting the kernel.
Link: http://lkml.kernel.org/r/20190504102353.GA22525@bharath12345-Inspiron-5559
Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 14 May 2019 00:23:14 +0000 (17:23 -0700)]
mm/mprotect.c: fix compilation warning because of unused 'mm' variable
Since
0cbe3e26abe0 ("mm: update ptep_modify_prot_start/commit to take
vm_area_struct as arg") the only place that uses the local 'mm' variable
in change_pte_range() is the call to set_pte_at().
Many architectures define set_pte_at() as macro that does not use the 'mm'
parameter, which generates the following compilation warning:
CC mm/mprotect.o
mm/mprotect.c: In function 'change_pte_range':
mm/mprotect.c:42:20: warning: unused variable 'mm' [-Wunused-variable]
struct mm_struct *mm = vma->vm_mm;
^~
Fix it by passing vma->mm to set_pte_at() and dropping the local 'mm'
variable in change_pte_range().
[liu.song.a23@gmail.com: fix missed conversions]
Link: http://lkml.kernel.org/r/CAPhsuW6wcQgYLHNdBdw6m0YiR4RWsS4XzfpSKU7wBLLeOCTbpw@mail.gmail.comLink:
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 14 May 2019 00:23:11 +0000 (17:23 -0700)]
mm/page-writeback: introduce tracepoint for wait_on_page_writeback()
Recently there have been some hung tasks on our server due to
wait_on_page_writeback(), and we want to know the details of this
PG_writeback, i.e. this page is writing back to which device. But it is
not so convenient to get the details.
I think it would be better to introduce a tracepoint for diagnosing the
writeback details.
Link: http://lkml.kernel.org/r/1556274402-19018-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 14 May 2019 00:23:08 +0000 (17:23 -0700)]
mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags
trace_reclaim_flags and trace_shrink_flags are almost the same.
We can simplify them to avoid redundant code.
Link: http://lkml.kernel.org/r/1556169203-5858-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 14 May 2019 00:23:05 +0000 (17:23 -0700)]
mm/Kconfig: update "Memory Model" help text
The help describing the memory model selection is outdated. It still says
that SPARSEMEM is experimental and DISCONTIGMEM is a preferred over
SPARSEMEM.
Update the help text for the relevant options:
* add a generic help for the "Memory Model" prompt
* add description for FLATMEM
* reduce the description of DISCONTIGMEM and add a deprecation note
* prefer SPARSEMEM over DISCONTIGMEM
Link: http://lkml.kernel.org/r/1556188531-20728-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 14 May 2019 00:23:02 +0000 (17:23 -0700)]
mm/vmscan.c: don't disable irq again when count pgrefill for memcg
We can use __count_memcg_events() directly because this callsite is alreay
protected by spin_lock_irq().
Link: http://lkml.kernel.org/r/1556093494-30798-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 14 May 2019 00:22:59 +0000 (17:22 -0700)]
mm: memblock: make keeping memblock memory opt-in rather than opt-out
Most architectures do not need the memblock memory after the page
allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the
arch Kconfig.
Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
logic makes it clear which architectures actually use memblock after
system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
to the architectures that are still missing that option.
Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Tue, 14 May 2019 00:22:55 +0000 (17:22 -0700)]
hugetlbfs: always use address space in inode for resv_map pointer
Continuing discussion about
58b6e5e8f1ad ("hugetlbfs: fix memory leak for
resv_map") brought up the issue that inode->i_mapping may not point to the
address space embedded within the inode at inode eviction time. The
hugetlbfs truncate routine handles this by explicitly using inode->i_data.
However, code cleaning up the resv_map will still use the address space
pointed to by inode->i_mapping. Luckily, private_data is NULL for address
spaces in all such cases today but, there is no guarantee this will
continue.
Change all hugetlbfs code getting a resv_map pointer to explicitly get it
from the address space embedded within the inode. In addition, add more
comments in the code to indicate why this is being done.
Link: http://lkml.kernel.org/r/20190419204435.16984-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Yufen Yu <yuyufen@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 14 May 2019 00:22:52 +0000 (17:22 -0700)]
mm/z3fold.c: support page migration
Now that we are not using page address in handles directly, we can make
z3fold pages movable to decrease the memory fragmentation z3fold may
create over time.
This patch starts advertising non-headless z3fold pages as movable and
uses the existing kernel infrastructure to implement moving of such pages
per memory management subsystem's request. It thus implements 3 required
callbacks for page migration:
* isolation callback: z3fold_page_isolate(): try to isolate the page by
removing it from all lists. Pages scheduled for some activity and
mapped pages will not be isolated. Return true if isolation was
successful or false otherwise
* migration callback: z3fold_page_migrate(): re-check critical
conditions and migrate page contents to the new page provided by the
memory subsystem. Returns 0 on success or negative error code otherwise
* putback callback: z3fold_page_putback(): put back the page if
z3fold_page_migrate() for it failed permanently (i. e. not with
-EAGAIN code).
[lkp@intel.com: z3fold_page_isolate() can be static]
Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42
Link: http://lkml.kernel.org/r/20190417103922.31253da5c366c4ebe0419cfc@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 14 May 2019 00:22:49 +0000 (17:22 -0700)]
mm/z3fold.c: add structure for buddy handles
For z3fold to be able to move its pages per request of the memory
subsystem, it should not use direct object addresses in handles. Instead,
it will create abstract handles (3 per page) which will contain pointers
to z3fold objects. Thus, it will be possible to change these pointers
when z3fold page is moved.
Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 14 May 2019 00:22:46 +0000 (17:22 -0700)]
mm/z3fold.c: improve compression by extending search
The current z3fold implementation only searches this CPU's page lists for
a fitting page to put a new object into. This patch adds quick search for
very well fitting pages (i. e. those having exactly the required number
of free space) on other CPUs too, before allocating a new page for that
object.
Link: http://lkml.kernel.org/r/20190417103733.72ae81abe1552397c95a008e@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Tue, 14 May 2019 00:22:43 +0000 (17:22 -0700)]
mm/z3fold.c: introduce helper functions
Patch series "z3fold: support page migration", v2.
This patchset implements page migration support and slightly better buddy
search. To implement page migration support, z3fold has to move away from
the current scheme of handle encoding. i. e. stop encoding page address
in handles. Instead, a small per-page structure is created which will
contain actual addresses for z3fold objects, while pointers to fields of
that structure will be used as handles.
Thus, it will be possible to change the underlying addresses to reflect
page migration.
To support migration itself, 3 callbacks will be implemented:
1: isolation callback: z3fold_page_isolate(): try to isolate the page
by removing it from all lists. Pages scheduled for some activity and
mapped pages will not be isolated. Return true if isolation was
successful or false otherwise
2: migration callback: z3fold_page_migrate(): re-check critical
conditions and migrate page contents to the new page provided by the
system. Returns 0 on success or negative error code otherwise
3: putback callback: z3fold_page_putback(): put back the page if
z3fold_page_migrate() for it failed permanently (i. e. not with
-EAGAIN code).
To make sure an isolated page doesn't get freed, its kref is incremented
in z3fold_page_isolate() and decremented during post-migration compaction,
if migration was successful, or by z3fold_page_putback() in the other
case.
Since the new handle encoding scheme implies slight memory consumption
increase, better buddy search (which decreases memory consumption) is
included in this patchset.
This patch (of 4):
Introduce a separate helper function for object allocation, as well as 2
smaller helpers to add a buddy to the list and to get a pointer to the
pool from the z3fold header. No functional changes here.
Link: http://lkml.kernel.org/r/20190417103633.a4bb770b5bf0fb7e43ce1666@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Tue, 14 May 2019 00:22:40 +0000 (17:22 -0700)]
mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist
Because rmqueue_pcplist() is only called when order is 0, we don't need to
use order as a parameter.
Link: http://lkml.kernel.org/r/1555591709-11744-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>