Tim Hansen [Wed, 4 Oct 2017 19:59:49 +0000 (15:59 -0400)]
net/ipv4: Remove unused variable in route.c
int rc is unmodified after initalization in net/ipv4/route.c, this patch simply cleans up that variable and returns 0.
This was found with coccicheck M=net/ipv4/ on linus' tree.
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Wed, 4 Oct 2017 17:04:04 +0000 (10:04 -0700)]
tcp: clean up TFO server's initial tcp_rearm_rto() call
This commit does a cleanup and moves tcp_rearm_rto() call in the TFO
server case into a previous spot in tcp_rcv_state_process() to make
it more compact.
This is only a cosmetic change.
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Wed, 4 Oct 2017 17:03:44 +0000 (10:03 -0700)]
tcp: uniform the set up of sockets after successful connection
Currently in the TCP code, the initialization sequence for cached
metrics, congestion control, BPF, etc, after successful connection
is very inconsistent. This introduces inconsistent bevhavior and is
prone to bugs. The current call sequence is as follows:
(1) for active case (tcp_finish_connect() case):
tcp_mtup_init(sk);
icsk->icsk_af_ops->rebuild_header(sk);
tcp_init_metrics(sk);
tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
tcp_init_buffer_space(sk);
(2) for passive case (tcp_rcv_state_process() TCP_SYN_RECV case):
icsk->icsk_af_ops->rebuild_header(sk);
tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
tcp_mtup_init(sk);
tcp_init_buffer_space(sk);
tcp_init_metrics(sk);
(3) for TFO passive case (tcp_fastopen_create_child()):
inet_csk(child)->icsk_af_ops->rebuild_header(child);
tcp_init_congestion_control(child);
tcp_mtup_init(child);
tcp_init_metrics(child);
tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
tcp_init_buffer_space(child);
This commit uniforms the above functions to have the following sequence:
tcp_mtup_init(sk);
icsk->icsk_af_ops->rebuild_header(sk);
tcp_init_metrics(sk);
tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB);
tcp_init_congestion_control(sk);
tcp_init_buffer_space(sk);
This sequence is the same as the (1) active case. We pick this sequence
because this order correctly allows BPF to override the settings
including congestion control module and initial cwnd, etc from
the route, and then allows the CC module to see those settings.
Suggested-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Oct 2017 01:44:18 +0000 (18:44 -0700)]
Merge branch 'VSOCK-sock_diag'
Stefan Hajnoczi says:
====================
VSOCK: add sock_diag interface
v3:
* Rebased onto net-next/master and resolved Hyper-V transport conflict
v2:
* Moved tests to tools/testing/vsock/. I was unable to put them in selftests/
because they require manual setup of a VMware/KVM guest.
* Moved to __vsock_in_bound/connected_table() to af_vsock.h
* Fixed local variable ordering in Patch 4
There is currently no way for userspace to query open AF_VSOCK sockets. This
means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets.
This patch series adds the netlink sock_diag interface for AF_VSOCK. Userspace
programs sent a DUMP request including an sk_state bitmap to filter sockets
based on their state (connected, listening, etc). The vsock_diag.ko module
replies with information about matching sockets. This userspace ABI is defined
in <linux/vm_sockets_diag.h>.
The final patch adds a test suite that exercises the basic cases.
Jorgen and Dexuan: I have only tested the virtio transport but this should also
work for VMCI and Hyper-V. Please give it a shot if you have time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Thu, 5 Oct 2017 20:46:54 +0000 (16:46 -0400)]
VSOCK: add tools/testing/vsock/vsock_diag_test
This patch adds tests for the vsock_diag.ko module.
These tests are not self-tests because they require manual set up of a
KVM or VMware guest. Please see tools/testing/vsock/README for
instructions.
The control.h and timeout.h infrastructure can be used for additional
AF_VSOCK tests in the future.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Thu, 5 Oct 2017 20:46:53 +0000 (16:46 -0400)]
VSOCK: add sock_diag interface
This patch adds the sock_diag interface for querying sockets from
userspace. Tools like ss(8) and netstat(8) can use this interface to
list open sockets.
The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes
netlink request and response structs. The request can query sockets
based on their sk_state (e.g. listening sockets only) and the response
contains socket information fields including the local/remote addresses,
inode number, etc.
This patch does not dump VMCI pending sockets because I have only tested
the virtio transport, which does not use pending sockets. Support can
be added later by extending vsock_diag_dump() if needed by VMCI users.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Thu, 5 Oct 2017 20:46:52 +0000 (16:46 -0400)]
VSOCK: use TCP state constants for sk_state
There are two state fields: socket->state and sock->sk_state. The
socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
sock->sk_state typically uses values that match TCP state constants
(TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention
and instead uses SS_* constants for both fields.
The sk_state field will be exposed to userspace through the vsock_diag
interface for ss(8), netstat(8), and other programs.
This patch switches sk_state to TCP state constants so that the meaning
of this field is consistent with other address families. Not just
AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.
The following mapping was used to convert the code:
SS_FREE -> TCP_CLOSE
SS_UNCONNECTED -> TCP_CLOSE
SS_CONNECTING -> TCP_SYN_SENT
SS_CONNECTED -> TCP_ESTABLISHED
SS_DISCONNECTING -> TCP_CLOSING
VSOCK_SS_LISTEN -> TCP_LISTEN
In __vsock_create() the sk_state initialization was dropped because
sock_init_data() already initializes sk_state to TCP_CLOSE.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Thu, 5 Oct 2017 20:46:51 +0000 (16:46 -0400)]
VSOCK: move __vsock_in_bound/connected_table() to af_vsock.h
The vsock_diag.ko module will need to check socket table membership.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi [Thu, 5 Oct 2017 20:46:50 +0000 (16:46 -0400)]
VSOCK: export socket tables for sock_diag interface
The socket table symbols need to be exported from vsock.ko so that the
vsock_diag.ko module will be able to traverse sockets.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Oct 2017 00:57:03 +0000 (17:57 -0700)]
Merge git://git./linux/kernel/git/davem/net
Just simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 5 Oct 2017 22:51:37 +0000 (15:51 -0700)]
Merge tag 'pm-4.14-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"This fixes a code ordering issue in the main suspend-to-idle loop that
causes some "low power S0 idle" conditions to be incorrectly reported
as unmet with suspend/resume debug messages enabled"
* tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / s2idle: Invoke the ->wake() platform callback earlier
Rafael J. Wysocki [Thu, 5 Oct 2017 22:24:14 +0000 (00:24 +0200)]
Merge branch 'pm-sleep'
* pm-sleep:
PM / s2idle: Invoke the ->wake() platform callback earlier
Linus Torvalds [Thu, 5 Oct 2017 22:17:40 +0000 (15:17 -0700)]
Merge tag 'for-4.14/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix for the alignment of the event number reported at the
end of the 'DM_LIST_DEVICES' ioctl.
- a couple stable fixes for the DM crypt target.
- a DM raid health status reporting fix.
* tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: fix incorrect status output at the end of a "recover" process
dm crypt: reject sector_size feature if device length is not aligned to it
dm crypt: fix memory leak in crypt_ctr_cipher_old()
dm ioctl: fix alignment of event number in the device list
Jonathan Brassow [Mon, 2 Oct 2017 22:17:35 +0000 (17:17 -0500)]
dm raid: fix incorrect status output at the end of a "recover" process
There are three important fields that indicate the overall health and
status of an array: dev_health, sync_ratio, and sync_action. They tell
us the condition of the devices in the array, and the degree to which
the array is synchronized.
This commit fixes a condition that is reported incorrectly. When a member
of the array is being rebuilt or a new device is added, the "recover"
process is used to synchronize it with the rest of the array. When the
process is complete, but the sync thread hasn't yet been reaped, it is
possible for the state of MD to be:
mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
curr_resync_completed = <max dev size> (but not MaxSector)
and all rdevs to be In_sync.
This causes the 'array_in_sync' output parameter that is passed to
rs_get_progress() to be computed incorrectly and reported as 'false' --
or not in-sync. This in turn causes the dev_health status characters to
be reported as all 'a', rather than the proper 'A'.
This can cause erroneous output for several seconds at a time when tools
will want to be checking the condition due to events that are raised at
the end of a sync process. Fix this by properly calculating the
'array_in_sync' return parameter in rs_get_progress().
Also, remove an unnecessary intermediate 'recovery_cp' variable in
rs_get_progress().
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Linus Torvalds [Thu, 5 Oct 2017 17:39:29 +0000 (10:39 -0700)]
Merge tag 'sound-4.14-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, mostly with stable ones:
- X32 ABI fix for PCM; likely not so many people suffer from it, but
still better to fix
- Two minor kernel warning fixes on USB audio devices spotted by
syzkaller
- Regression fix of echoaudio due to its inconsistent dimension
- Fix for HBR support on Intel DP audio, on some recent chips
- USB-audio quirk for yet another Plantronics devices
- Fix for potential double-fetch in ASIHPI FIFO queue"
* tag 'sound-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usx2y: Suppress kernel warning at page allocation failures
Revert "ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members"
ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
ALSA: pcm: Fix structure definition for X32 ABI
ALSA: usb-audio: Add sample rate quirk for Plantronics C310/C520-M
ALSA: hda - program ICT bits to support HBR audio
ALSA: asihpi: fix a potential double-fetch bug when copying puhm
ALSA: compress: Remove unused variable
Linus Torvalds [Thu, 5 Oct 2017 17:28:12 +0000 (10:28 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID subsystem fixes from Jiri Kosina:
- buffer management size fix for i2c-hid driver, from Adrian Salido
- tool ID regression fixes for Wacom driver from Jason Gerecke
- a few small assorted fixes and a few device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
Revert "HID: multitouch: Support ALPS PTP stick with pid 0x120A"
HID: hidraw: fix power sequence when closing device
HID: wacom: Always increment hdev refcount within wacom_get_hdev_data
HID: wacom: generic: Clear ABS_MISC when tool leaves proximity
HID: wacom: generic: Send MSC_SERIAL and ABS_MISC when leaving prox
HID: i2c-hid: allocate hid buffers for real worst case
HID: rmi: Make sure the HID device is opened on resume
HID: multitouch: Support ALPS PTP stick with pid 0x120A
HID: multitouch: support buttons and trackpoint on Lenovo X1 Tab Gen2
HID: wacom: Correct coordinate system of touchring and pen twist
HID: wacom: Properly report negative values from Intuos Pro 2 Bluetooth
HID: multitouch: Fix system-control buttons not working
HID: add multi-input quirk for IDC6680 touchscreen
HID: wacom: leds: Don't try to control the EKR's read-only LEDs
HID: wacom: bits shifted too much for 9th and 10th buttons
Linus Torvalds [Thu, 5 Oct 2017 15:40:09 +0000 (08:40 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Check iwlwifi 9000 reorder buffer out-of-space condition properly,
from Sara Sharon.
2) Fix RCU splat in qualcomm rmnet driver, from Subash Abhinov
Kasiviswanathan.
3) Fix session and tunnel release races in l2tp, from Guillaume Nault
and Sabrina Dubroca.
4) Fix endian bug in sctp_diag_dump(), from Dan Carpenter.
5) Several mlx5 driver fixes from the Mellanox folks (max flow counters
cap check, invalid memory access in IPoIB support, etc.)
6) tun_get_user() should bail if skb->len is zero, from Alexander
Potapenko.
7) Fix RCU lookups in inetpeer, from Eric Dumazet.
8) Fix locking in packet_do_bund().
9) Handle cb->start() error properly in netlink dump code, from Jason
A. Donenfeld.
10) Handle multicast properly in UDP socket early demux code. From Paolo
Abeni.
11) Several erspan bug fixes in ip_gre, from Xin Long.
12) Fix use-after-free in socket filter code, in order to handle the
fact that listener lock is no longer taken during the three-way TCP
handshake. From Eric Dumazet.
13) Fix infoleak in RTM_GETSTATS, from Nikolay Aleksandrov.
14) Fix tail call generation in x86-64 BPF JIT, from Alexei Starovoitov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
net: 8021q: skip packets if the vlan is down
bpf: fix bpf_tail_call() x64 JIT
net: stmmac: dwmac-rk: Add RK3128 GMAC support
rndis_host: support Novatel Verizon USB730L
net: rtnetlink: fix info leak in RTM_GETSTATS call
socket, bpf: fix possible use after free
mlxsw: spectrum_router: Track RIF of IPIP next hops
mlxsw: spectrum_router: Move VRF refcounting
net: hns3: Fix an error handling path in 'hclge_rss_init_hw()'
net: mvpp2: Fix clock resource by adding an optional bus clock
r8152: add Linksys USB3GIGV1 id
l2tp: fix l2tp_eth module loading
ip_gre: erspan device should keep dst
ip_gre: set tunnel hlen properly in erspan_tunnel_init
ip_gre: check packet length and mtu correctly in erspan_xmit
ip_gre: get key from session_id correctly in erspan_rcv
tipc: use only positive error codes in messages
ppp: fix __percpu annotation
udp: perform source validation for mcast early demux
IPv4: early demux can return an error code
...
David S. Miller [Thu, 5 Oct 2017 04:46:22 +0000 (21:46 -0700)]
Merge branch 'bpftool'
Jakub Kicinski says:
====================
tools: add bpftool
This set adds bpftool to the tools/ directory. The first
patch renames tools/net to tools/bpf, the second one adds
the new code, while the third adds simple documentation.
v4:
- rename docs *.txt -> *.rst (Jesper).
v3:
- address Alexei's comments about output and docs.
v2:
- report names, map ids, load time, uid;
- add docs/man pages;
- general cleanups & fixes.
====================
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 5 Oct 2017 03:10:05 +0000 (20:10 -0700)]
tools: bpftool: add documentation
Add documentation for bpftool. Separate files for each subcommand.
Use rst format. Documentation is compiled into man pages using
rst2man.
Signed-off-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 5 Oct 2017 03:10:04 +0000 (20:10 -0700)]
tools: bpf: add bpftool
Add a simple tool for querying and updating BPF objects on the system.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 5 Oct 2017 03:10:03 +0000 (20:10 -0700)]
tools: rename tools/net directory to tools/bpf
We currently only have BPF tools in the tools/net directory.
We are about to add more BPF tools there, not necessarily
networking related, rename the directory and related Makefile
targets to bpf.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 5 Oct 2017 04:39:34 +0000 (21:39 -0700)]
Merge branch 'enslavement-extack'
David Ahern says:
====================
net: Plumb extack error reporting to enslavements
Another round of extending extack error reporting, this time for
enslavements through ndo_add_slave and notifiers.
v2
- changed how the messages are added to bonding driver per Jiri's request
- fixed spectrum message for LAG overflow per Ido's comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:51 +0000 (17:48 -0700)]
mlxsw: spectrum: Add extack messages for enslave failures
mlxsw fails device enslavement for a number of reasons. Use the extack
facility to return an error message to the user stating why the enslave
is failing.
Messages are prefixed with "spectrum" so users know it is a constraint
imposed by the hardware driver. For example:
$ ip li add br0.11 link br0 type vlan id 11
$ ip li set swp11 master br0
Error: spectrum: Enslaving a port to a device that already has an upper device is not supported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:50 +0000 (17:48 -0700)]
net: bridge: Pass extack to down to netdev_master_upper_dev_link
Pass extack arg to br_add_if. Add messages for a couple of failures
and pass arg to netdev_master_upper_dev_link.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:49 +0000 (17:48 -0700)]
net: bonding: Add extack messages for some enslave failures
A number of bond_enslave errors are logged using the netdev_err API.
Return those messages to userspace via the extack facility.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:48 +0000 (17:48 -0700)]
net: vrf: Add extack messages for enslave errors
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:47 +0000 (17:48 -0700)]
net: Add extack to upper device linking
Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:46 +0000 (17:48 -0700)]
net: Add extack to ndo_add_slave
Pass extack to do_set_master and down to ndo_add_slave
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 5 Oct 2017 00:48:45 +0000 (17:48 -0700)]
net: Add extack to netdev_notifier_info
Add netlink_ext_ack to netdev_notifier_info to allow notifier
handlers to return errors to userspace.
Clean up the initialization in dev.c such that extack is easily
added in subsequent patches where relevant. Specifically, remove
the init call in call_netdevice_notifiers_info and have callers
initalize on stack when info is declared.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vishakha Narvekar [Tue, 3 Oct 2017 20:13:29 +0000 (16:13 -0400)]
net: 8021q: skip packets if the vlan is down
If the vlan is down, free the packet instead of proceeding with other
processing, or counting it as received. If vlan interfaces are used
as slaves for bonding, with arp monitoring for connectivity, if the rx
counter is seen to be incrementing, then the bond device will not
observe that the interface is down.
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Vishakha Narvekar <Vishakha.Narvekar@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 3 Oct 2017 11:53:23 +0000 (13:53 +0200)]
dev: advertise the new nsid when the netns iface changes
x-netns interfaces are bound to two netns: the link netns and the upper
netns. Usually, this kind of interfaces is created in the link netns and
then moved to the upper netns. At the end, the interface is visible only
in the upper netns. The link nsid is advertised via netlink in the upper
netns, thus the user always knows where is the link part.
There is no such mechanism in the link netns. When the interface is moved
to another netns, the user cannot "follow" it.
This patch adds a new netlink attribute which helps to follow an interface
which moves to another netns. When the interface is unregistered, the new
nsid is advertised. If the interface is a x-netns interface (ie
rtnl_link_ops->get_link_net is defined), the nsid is allocated if needed.
CC: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 5 Oct 2017 00:16:05 +0000 (17:16 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Our first batch of fixes this release cycle, unfortunately a bit
noisier than usual. Two major groups stand out:
- Some pinctril dts/dtsi changes for stm32 due to a new driver being
merged during the merge window, and this aligns the DT contents
between the old format and the new. This could arguably be moved to
the next merge window but it also seemed relatively harmless to
include now.
- Amlogic/meson had driver changes merged that required devicetree
changes to avoid functional/performance regressions. I've already
asked them to be more careful about this going forward, and making
sure drivers are compatible with older DTs when they make these
kind of changes. The platform is actively being upstreamed so
there's a few things in flight, we've seen this happen before and
sometimes it's hard to catch in time.
Besides that there is the usual mix of minor fixes"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (33 commits)
ARM: dts: stm32: use right pinctrl compatible for stm32f469
ARM: dts: stm32: Fix STMPE1600 binding on stm32429i-eval board
ARM: defconfig: update Gemini defconfig
ARM: defconfig: FRAMEBUFFER_CONSOLE can no longer be =m
arm64: dts: rockchip: add the grf clk for dw-mipi-dsi on rk3399
reset: Restrict RESET_HSDK to ARC_SOC_HSDK or COMPILE_TEST
ARM: dts: da850-evm: add serial and ethernet aliases
ARM: dts: am43xx-epos-evm: Remove extra CPSW EMAC entry
ARM: dts: am33xx: Add spi alias to match SOC schematics
ARM: OMAP2+: hsmmc: fix logic to call either omap_hsmmc_init or omap_hsmmc_late_init but not both
ARM: dts: dra7: Set a default parent to mcasp3_ahclkx_mux
ARM: OMAP2+: dra7xx: Set OPT_CLKS_IN_RESET flag for gpio1
ARM: dts: nokia n900: drop unneeded/undocumented parts of the dts
arm64: dts: rockchip: Correct MIPI DPHY PLL clock on rk3399
arm64: dt marvell: Fix AP806 system controller size
MAINTAINERS: add Macchiatobin maintainers entry
ARC: reset: remove the misleading v1 suffix all over
ARC: reset: add missing DT binding documentation for HSDKv1 reset driver
ARC: reset: Only build on archs that have IOMEM
ARM: at91: Replace uses of virt_to_phys with __pa_symbol
...
James Hogan [Wed, 4 Oct 2017 22:10:59 +0000 (23:10 +0100)]
Update James Hogan's email address
Update my imgtec.com and personal email address to my kernel.org one in
a few places as MIPS will soon no longer be part of Imagination
Technologies, and add mappings in .mailcap so get_maintainer.pl reports
the right address.
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Wed, 4 Oct 2017 23:05:06 +0000 (16:05 -0700)]
Merge branch 'bpf-cgroup-multi-prog'
Alexei Starovoitov says:
====================
bpf: muli prog support for cgroup-bpf
v1->v2:
- fixed accidentally swapped two lines which caused static_key not going to zero
- addressed Martin's feedback and changed prog_query to be consistent
with verifier output: return -enospc and fill supplied buffer instead
of just returning -enospc when buffer is too small to fit all prog_ids
v1:
cgroup-bpf use cases are getting more advanced and running only
one program per cgroup is no longer enough. Therefore introduce
support for attaching multiple programs per cgroup and running
a set of effective programs.
These patches introduces BPF_F_ALLOW_MULTI flag for BPF_PROG_ATTACH cmd.
The default is still NONE and behavior of BPF_F_ALLOW_OVERRIDE flag
is unchanged.
The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
that cgroup program gets run in addition to the program in this cgroup.
Most of the logic is in patch 1. Even when cgroup doesn't have
any programs attached its set of effective program can be non-empty.
To quickly execute them and avoid penalizing cgroups without
any effective programs introduce 'struct bpf_prog_array'
which has an optimization for cgroups with zero effective programs.
Patch 2 introduces BPF_PROG_QUERY command for introspection
Patch 3 makes verifier more strict for cgroup-bpf program types.
Patch 4+ are tests.
More details in individual patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:28 +0000 (22:50 -0700)]
samples/bpf: use bpf_prog_query() interface
use BPF_PROG_QUERY command to strengthen test coverage
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:27 +0000 (22:50 -0700)]
libbpf: add support for BPF_PROG_QUERY
add support for BPF_PROG_QUERY command to libbpf
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:26 +0000 (22:50 -0700)]
libbpf: sync bpf.h
tools/include/uapi/linux/bpf.h got out of sync with actual kernel header.
Update it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:25 +0000 (22:50 -0700)]
samples/bpf: add multi-prog cgroup test case
create 5 cgroups, attach 6 progs and check that progs are executed as:
cgrp1 (MULTI progs A, B) ->
cgrp2 (OVERRIDE prog C) ->
cgrp3 (MULTI prog D) ->
cgrp4 (OVERRIDE prog E) ->
cgrp5 (NONE prog F)
the event in cgrp5 triggers execution of F,D,A,B in that order.
if prog F is detached, the execution is E,D,A,B
if prog F and D are detached, the execution is E,A,B
if prog F, E and D are detached, the execution is C,A,B
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:24 +0000 (22:50 -0700)]
libbpf: introduce bpf_prog_detach2()
introduce bpf_prog_detach2() that takes one more argument prog_fd
vs bpf_prog_detach() that takes only attach_fd and type.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:23 +0000 (22:50 -0700)]
bpf: enforce return code for cgroup-bpf programs
with addition of tnum logic the verifier got smart enough and
we can enforce return codes at program load time.
For now do so for cgroup-bpf program types.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:22 +0000 (22:50 -0700)]
bpf: introduce BPF_PROG_QUERY command
introduce BPF_PROG_QUERY command to retrieve a set of either
attached programs to given cgroup or a set of effective programs
that will execute for events within a cgroup
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 3 Oct 2017 05:50:21 +0000 (22:50 -0700)]
bpf: multi program support for cgroup+bpf
introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple
bpf programs to a cgroup.
The difference between three possible flags for BPF_PROG_ATTACH command:
- NONE(default): No further bpf programs allowed in the subtree.
- BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program,
the program in this cgroup yields to sub-cgroup program.
- BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program,
that cgroup program gets run in addition to the program in this cgroup.
NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't
change their behavior. It only clarifies the semantics in relation
to new flag.
Only one program is allowed to be attached to a cgroup with
NONE or BPF_F_ALLOW_OVERRIDE flag.
Multiple programs are allowed to be attached to a cgroup with
BPF_F_ALLOW_MULTI flag. They are executed in FIFO order
(those that were attached first, run first)
The programs of sub-cgroup are executed first, then programs of
this cgroup and then programs of parent cgroup.
All eligible programs are executed regardless of return code from
earlier programs.
To allow efficient execution of multiple programs attached to a cgroup
and to avoid penalizing cgroups without any programs attached
introduce 'struct bpf_prog_array' which is RCU protected array
of pointers to bpf programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
for cgroup bits
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 4 Oct 2017 17:48:35 +0000 (10:48 -0700)]
net: cache skb_shinfo() in skb_try_coalesce()
Compiler does not really know that skb_shinfo(to|from) are constants
in skb_try_coalesce(), lets cache their values to shrink code.
We might even take care of skb_zcopy() calls later.
$ size net/core/skbuff.o.before net/core/skbuff.o
text data bss dec hex filename
40727 1298 0 42025 a429 net/core/skbuff.o.before
40631 1298 0 41929 a3c9 net/core/skbuff.o
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 4 Oct 2017 14:22:59 +0000 (16:22 +0200)]
selftests: rtnetlink: try concurrent change of ifalias
to make sure this is serialized correctly.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 4 Oct 2017 13:58:49 +0000 (15:58 +0200)]
rtnetlink: remove __rtnl_af_unregister
switch the only caller to rtnl_af_unregister.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 4 Oct 2017 13:55:29 +0000 (15:55 +0200)]
rtnetlink: remove slave_validate callback
no users in the tree.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 4 Oct 2017 13:20:37 +0000 (14:20 +0100)]
cxgb4vf: make a couple of functions static
The functions t4vf_link_down_rc_str and t4vf_handle_get_port_info are
local to the source and do not need to be in global scope, so make
them static.
Cleans up sparse warnings:
symbol 't4vf_link_down_rc_str' was not declared. Should it be static?
symbol 't4vf_handle_get_port_info' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Wed, 4 Oct 2017 17:31:00 +0000 (10:31 -0700)]
Merge tag 'stm32-dt-fixes-for-v4.14' of git://git./linux/kernel/git/atorgue/stm32 into fixes
STM32 fixes for v4.14:
---------------------
-Fix STMPE1600 bindings for stm32429i-eval board
-Use right compatible for stm32f469 pinctrl. It implies to use
pinctrl dedicated files for F4 SoCs.
* tag 'stm32-dt-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
ARM: dts: stm32: use right pinctrl compatible for stm32f469
ARM: dts: stm32: Fix STMPE1600 binding on stm32429i-eval board
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Wed, 4 Oct 2017 17:30:39 +0000 (10:30 -0700)]
Merge tag 'amlogic-dt64-3' of git://git./linux/kernel/git/khilman/linux-amlogic into fixes
Amlogic 64-bit DT updates for v4.14 (round 3)
- updates for new MMC driver features/fixes
- support high-speed modes
* tag 'amlogic-dt64-3' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
ARM64: dts: meson-gxbb: nanopi-k2: enable sdr104 mode
ARM64: dts: meson-gxbb: nanopi-k2: enable sdcard UHS modes
ARM64: dts: meson-gxbb: p20x: enable sdcard UHS modes
ARM64: dts: meson-gxl: libretech-cc: enable high speed modes
ARM64: dts: meson-gxl: libretech-cc: add card regulator settle times
ARM64: dts: meson-gxbb: nanopi-k2: add card regulator settle times
ARM64: dts: meson: add mmc clk gate pins
ARM64: dts: meson: remove cap-sd-highspeed from emmc nodes
ARM64: dts: meson-gx: Use correct mmc clock source 0
Signed-off-by: Olof Johansson <olof@lixom.net>
Florian Westphal [Wed, 4 Oct 2017 11:56:50 +0000 (13:56 +0200)]
net: core: fix kerneldoc comment
net/core/dev.c:1306: warning: No description found for parameter 'name'
net/core/dev.c:1306: warning: Excess function parameter 'alias' description in 'dev_get_alias'
Fixes: 6c5570016b97 ("net: core: decouple ifalias get/set from rtnl lock")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 4 Oct 2017 07:54:27 +0000 (09:54 +0200)]
ravb: RX checksum offload
Add support for RX checksum offload. This is enabled by default and
may be disabled and re-enabled using ethtool:
# ethtool -K eth0 rx off
# ethtool -K eth0 rx on
The RAVB provides a simple checksumming scheme which appears to be
completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
the L2 header is appended to packet data; this may be trivially read by the
driver and used to update the skb accordingly.
In terms of performance throughput is close to gigabit line-rate both with
and without RX checksum offload enabled. Perf output, however, appears to
indicate that significantly less time is spent in do_csum(). This is as
expected.
Test results with RX checksum offload enabled:
# /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
enable_enobufs failed: getprotobyname
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 937.54
Summary of output of perf report:
18.28% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
10.34% ksoftirqd/0 [kernel.kallsyms] [k] __pi_memcpy
9.83% ksoftirqd/0 [kernel.kallsyms] [k] ravb_poll
7.89% ksoftirqd/0 [kernel.kallsyms] [k] skb_put
4.01% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
3.37% netperf [kernel.kallsyms] [k] __arch_copy_to_user
3.17% swapper [kernel.kallsyms] [k] arch_cpu_idle
2.55% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter
2.04% ksoftirqd/0 [kernel.kallsyms] [k] __pi___inval_dcache_area
2.03% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irq
1.96% ksoftirqd/0 [kernel.kallsyms] [k] __netdev_alloc_skb
1.59% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.83
Test results without RX checksum offload enabled:
# /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
enable_enobufs failed: getprotobyname
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 940.20
Summary of output of perf report:
17.10% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
10.99% ksoftirqd/0 [kernel.kallsyms] [k] __pi_memcpy
8.87% ksoftirqd/0 [kernel.kallsyms] [k] ravb_poll
8.16% ksoftirqd/0 [kernel.kallsyms] [k] skb_put
7.42% ksoftirqd/0 [kernel.kallsyms] [k] do_csum
3.91% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
2.31% swapper [kernel.kallsyms] [k] arch_cpu_idle
2.16% ksoftirqd/0 [kernel.kallsyms] [k] __pi___inval_dcache_area
2.14% ksoftirqd/0 [kernel.kallsyms] [k] __netdev_alloc_skb
1.93% netperf [kernel.kallsyms] [k] __arch_copy_to_user
1.79% swapper [kernel.kallsyms] [k] tick_nohz_idle_enter
1.63% ksoftirqd/0 [kernel.kallsyms] [k] __slab_alloc.isra.83
Above results collected on an R-Car Gen 3 Salvator-X/r8a7796 ES1.0.
Also tested on a R-Car Gen 3 Salvator-X/r8a7795 ES1.0.
By inspection this also appears to be compatible with the ravb found
on R-Car Gen 2 SoCs, however, this patch is currently untested on such
hardware.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 4 Oct 2017 16:30:50 +0000 (09:30 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"A lot of stuff, sorry about that. A week on a beach, then a bunch of
time catching up then more time letting it bake in -next. Shan't do
that again!"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (51 commits)
include/linux/fs.h: fix comment about struct address_space
checkpatch: fix ignoring cover-letter logic
m32r: fix build failure
lib/ratelimit.c: use deferred printk() version
kernel/params.c: improve STANDARD_PARAM_DEF readability
kernel/params.c: fix an overflow in param_attr_show
kernel/params.c: fix the maximum length in param_get_string
mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
kernel/kcmp.c: drop branch leftover typo
memremap: add scheduling point to devm_memremap_pages
mm, page_alloc: add scheduling point to memmap_init_zone
mm, memory_hotplug: add scheduling point to __add_pages
lib/idr.c: fix comment for idr_replace()
mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
include/linux/bitfield.h: remove 32bit from FIELD_GET comment block
lib/lz4: make arrays static const, reduces object code size
exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array
exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
...
Linus Torvalds [Wed, 4 Oct 2017 16:21:58 +0000 (09:21 -0700)]
Merge branch 'fixes-v4.14-rc4' of git://git./linux/kernel/git/jmorris/linux-security
Pull smack fix from James Morris:
"It fixes a bug in xattr_getsecurity() where security_release_secctx()
was being called instead of kfree(), which leads to a memory leak in
the capabilities code. smack_inode_getsecurity is also fixed to behave
correctly when called from there"
* 'fixes-v4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
Linus Torvalds [Wed, 4 Oct 2017 15:34:01 +0000 (08:34 -0700)]
Merge tag 'trace-v4.14-rc1-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixlets from Steven Rostedt:
"Two updates:
- A memory fix with left over code from spliting out ftrace_ops and
function graph tracer, where the function graph tracer could reset
the trampoline pointer, leaving the old trampoline not to be freed
(memory leak).
- The update to Paul's patch that added the unnecessary READ_ONCE().
This removes the unnecessary READ_ONCE() instead of having to
rebase the branch to update the patch that added it"
* tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
ftrace: Fix kmemleak in unregister_ftrace_graph
Milan Broz [Wed, 13 Sep 2017 13:45:56 +0000 (15:45 +0200)]
dm crypt: reject sector_size feature if device length is not aligned to it
If a crypt mapping uses optional sector_size feature, additional
restrictions to mapped device segment size must be applied in
constructor, otherwise the device activation will fail later.
Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Alexandre Torgue [Wed, 4 Oct 2017 13:34:48 +0000 (15:34 +0200)]
ARM: dts: stm32: use right pinctrl compatible for stm32f469
Currently, same stm32f429-pinctrl driver is used for stm32f429 and
stm32f469. As pin map is different between those 2 MCUs,
a stm32f469-pinctrl driver has been recently added.
This patch
-allows to use stm32f469-pinctrl driver for stm32f469 boards
-reworks stm32 devicetree files to fit with stm32f429 / stm32f469
In the same time it fixes an issue when only MACH_STM32F469 flag is
selected in menuconfig.
Fixes: d28bcd53fa90 ("ARM: stm32: Introduce MACH_STM32F469 flag")
Reported-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Alexandre Torgue [Wed, 4 Oct 2017 09:42:00 +0000 (11:42 +0200)]
ARM: dts: stm32: Fix STMPE1600 binding on stm32429i-eval board
To declare gpio interrupt line for STMPE1600, 2 possibilities are offered:
-use gpio binding (and then the gpiolib interface inside driver)
-use interrupt binding as each gpio-controller are also interrupt controller
on stm32f429.
In STMPE 1600 node both (gpio and interrupt) bindings are defined.
This patch fixes this issue and use only interrupt binding.
Fixes: c04b2e72af8d ("ARM: dts: stm32: Enable STMPE1600 gpio expander of STM32F429-EVAL board")
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Casey Schaufler [Tue, 19 Sep 2017 16:39:08 +0000 (09:39 -0700)]
lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
security_inode_getsecurity() provides the text string value
of a security attribute. It does not provide a "secctx".
The code in xattr_getsecurity() that calls security_inode_getsecurity()
and then calls security_release_secctx() happened to work because
SElinux and Smack treat the attribute and the secctx the same way.
It fails for cap_inode_getsecurity(), because that module has no
secctx that ever needs releasing. It turns out that Smack is the
one that's doing things wrong by not allocating memory when instructed
to do so by the "alloc" parameter.
The fix is simple enough. Change the security_release_secctx() to
kfree() because it isn't a secctx being returned by
security_inode_getsecurity(). Change Smack to allocate the string when
told to do so.
Note: this also fixes memory leaks for LSMs which implement
inode_getsecurity but not release_secctx, such as capabilities.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
Ganesh Goudar [Tue, 3 Oct 2017 05:40:53 +0000 (11:10 +0530)]
cxgb4: add new T6 pci device id's
Add 0x6085 T6 device id.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Wed, 4 Oct 2017 01:15:58 +0000 (18:15 -0700)]
Merge tag 'omap-for-v4.14/fixes-rc3' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
Fixes for omaps for v4.14-rc cycle
Few minor fixes for omaps, mostly just boot time warning fixes:
- Drop undocumented camera binding that got merged during the merge window by
accident as I applied before Sakari's comments
- Fix soft reset warning for dra7 kexec boot for gpio1 as the optional clocks
need to be enabled for reset
- Fix dra7 kexec boot clock rate for McASP as the rate is no longer the default
rate after kexec
- Fix omap3 pandora MMC warning during boot
- Add am33xx SPI alias like we have on other SoCs
- Remove node for non-existing CPSW EMAC Ethernet on am43xx-epos-evm
* tag 'omap-for-v4.14/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am43xx-epos-evm: Remove extra CPSW EMAC entry
ARM: dts: am33xx: Add spi alias to match SOC schematics
ARM: OMAP2+: hsmmc: fix logic to call either omap_hsmmc_init or omap_hsmmc_late_init but not both
ARM: dts: dra7: Set a default parent to mcasp3_ahclkx_mux
ARM: OMAP2+: dra7xx: Set OPT_CLKS_IN_RESET flag for gpio1
ARM: dts: nokia n900: drop unneeded/undocumented parts of the dts
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Wed, 4 Oct 2017 01:13:35 +0000 (18:13 -0700)]
Merge tag 'v4.14-rockchip-dts64fixes-1' of git://git./linux/kernel/git/mmind/linux-rockchip into fixes
Adding the operating points on rk3368 like they were did not end up well
for the boards as all of them are missing their cpu supplies, the OPPs
actually need to follow the <target min max> format as the regulator is
shared between both clusters and the one rk3368 board I have, somehow also
doesn't like the higher opps at all - all of which I only realized after
I brought my rk3368 board online again, after its bootloader broke.
So we revert that OPP addition for now.
And also two fixes for the mipi dsi controller on rk3399, which was
referencing a clock to high up in the clock-tree so that an intermediate
gate could be disabled inadvertently and also needs a clock for its area
in the general register files of the rk3399 soc.
* tag 'v4.14-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: add the grf clk for dw-mipi-dsi on rk3399
arm64: dts: rockchip: Correct MIPI DPHY PLL clock on rk3399
Revert "arm64: dts: rockchip: Add basic cpu frequencies for RK3368"
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Wed, 4 Oct 2017 01:10:40 +0000 (18:10 -0700)]
Merge tag 'mvebu-fixes-4.14-1' of git://git.infradead.org/linux-mvebu into fixes
mvebu fixes for 4.14 (part 1)
Update MAINTAINERS for the Macchiatobin board (Armada 8K based)
Fix AP806 system controller size on Armada 7K/8K
* tag 'mvebu-fixes-4.14-1' of git://git.infradead.org/linux-mvebu:
arm64: dt marvell: Fix AP806 system controller size
MAINTAINERS: add Macchiatobin maintainers entry
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Wed, 4 Oct 2017 01:10:12 +0000 (18:10 -0700)]
Merge tag 'reset-fixes-for-4.14' of git://git.pengutronix.de/git/pza/linux into fixes
Reset controller fixes for v4.14
- Remove misleading HSDK v1 suffix, as there is no v2 planned
- Add missing DT binding documentation for HSDK reset driver
- Fix HSDK reset driver dependencies
* tag 'reset-fixes-for-4.14' of git://git.pengutronix.de/git/pza/linux:
reset: Restrict RESET_HSDK to ARC_SOC_HSDK or COMPILE_TEST
ARC: reset: remove the misleading v1 suffix all over
ARC: reset: add missing DT binding documentation for HSDKv1 reset driver
ARC: reset: Only build on archs that have IOMEM
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Wed, 4 Oct 2017 01:09:08 +0000 (18:09 -0700)]
Merge tag 'davinci-fixes-for-v4.14' of git://git./linux/kernel/git/nsekhar/linux-davinci into fixes
A fix for random ethernet mac address problem
on DA850 EVM.
* tag 'davinci-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: dts: da850-evm: add serial and ethernet aliases
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Wed, 4 Oct 2017 01:08:27 +0000 (18:08 -0700)]
Merge tag 'at91-fixes' of git://git./linux/kernel/git/nferre/linux-at91 into fixes
Fixes for 4.14:
- three DT fixes for the newly introduced sama5d27_som1_ek board
- one treewide modification that didn't touch this new PM code: we
synchronize now to be coherent with the other ARM platforms
* tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
ARM: at91: Replace uses of virt_to_phys with __pa_symbol
ARM: dts: at91: sama5d27_som1_ek: fix USB host vbus
ARM: dts: at91: sama5d27_som1_ek: fix typos
ARM: dts: at91: sama5d27_som1_ek: update pinmux/pinconf for LEDs and USB
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Walleij [Sun, 17 Sep 2017 14:26:18 +0000 (16:26 +0200)]
ARM: defconfig: update Gemini defconfig
This updates the Gemini defconfig with drivers merged
for v4.13 or v4.14:
- ATA driver is merged
- DMA driver is merged
- RTC driver gets selected from default Kconfig
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Arnd Bergmann [Mon, 18 Sep 2017 15:47:50 +0000 (17:47 +0200)]
ARM: defconfig: FRAMEBUFFER_CONSOLE can no longer be =m
It is no longer possible to load this at runtime, so let's
change the few remaining users to have it built-in all
the time.
arch/arm/configs/zeus_defconfig:115:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
arch/arm/configs/viper_defconfig:116:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
arch/arm/configs/pxa_defconfig:474:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
Reported-by: kernelci.org bot <bot@kernelci.org>
Fixes: 6104c37094e7 ("fbcon: Make fbcon a built-time depency for fbdev")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
Mike Rapoport [Tue, 3 Oct 2017 23:16:54 +0000 (16:16 -0700)]
include/linux/fs.h: fix comment about struct address_space
Before commit
9c5d760b8d22 ("mm: split gfp_mask and mapping flags into
separate fields") the private_* fields of struct adrress_space were
grouped together and using "ditto" in comments describing the last
fields was correct.
With introduction of gpf_mask between private_lock and private_list
"ditto" references the wrong description.
Fix it by using the elaborate description.
Link: http://lkml.kernel.org/r/1507009987-8746-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stafford Horne [Tue, 3 Oct 2017 23:16:51 +0000 (16:16 -0700)]
checkpatch: fix ignoring cover-letter logic
Currently running checkpatch on a directory with a cover-letter.patch
file reports the following error:
-----------------------------------------
patches/smp-v2/v2-0000-cover-letter.patch
-----------------------------------------
ERROR: Does not appear to be a unified-diff format patch
The logic to suppress the unified-diff check for cover letters is there
but is checking $file instead of $filename. Fix the variable to use the
correct one.
Link: http://lkml.kernel.org/r/20170909090406.31523-1-shorne@gmail.com
Signed-off-by: Stafford Horne <shorne@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Tue, 3 Oct 2017 23:16:49 +0000 (16:16 -0700)]
m32r: fix build failure
The allmodconfig build of m32r is failing with the error:
lib/mpi/mpih-div.o: In function 'mpihelp_divrem':
mpih-div.c:(.text+0x40): undefined reference to 'abort'
mpih-div.c:(.text+0x40): relocation truncated to fit:
R_M32R_26_PCREL_RELA against undefined symbol 'abort'
The function 'abort' was never defined for the m32r architecture.
Create 'abort' as is done in other arch like 'arm' and 'unicore32'.
Link: http://lkml.kernel.org/r/1506727220-6108-1-git-send-email-sudip.mukherjee@codethink.co.uk
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Tue, 3 Oct 2017 23:16:45 +0000 (16:16 -0700)]
lib/ratelimit.c: use deferred printk() version
printk_ratelimit() invokes ___ratelimit() which may invoke a normal
printk() (pr_warn() in this particular case) to warn about suppressed
output. Given that printk_ratelimit() may be called from anywhere, that
pr_warn() is dangerous - it may end up deadlocking the system. Fix
___ratelimit() by using deferred printk().
Sasha reported the following lockdep error:
: Unregister pv shared memory for cpu 8
: select_fallback_rq: 3 callbacks suppressed
: process 8583 (trinity-c78) no longer affine to cpu8
:
: ======================================================
: WARNING: possible circular locking dependency detected
: 4.14.0-rc2-next-
20170927+ #252 Not tainted
: ------------------------------------------------------
: migration/8/62 is trying to acquire lock:
: (&port_lock_key){-.-.}, at: serial8250_console_write()
:
: but task is already holding lock:
: (&rq->lock){-.-.}, at: sched_cpu_dying()
:
: which lock already depends on the new lock.
:
:
: the existing dependency chain (in reverse order) is:
:
: -> #3 (&rq->lock){-.-.}:
: __lock_acquire()
: lock_acquire()
: _raw_spin_lock()
: task_fork_fair()
: sched_fork()
: copy_process.part.31()
: _do_fork()
: kernel_thread()
: rest_init()
: start_kernel()
: x86_64_start_reservations()
: x86_64_start_kernel()
: verify_cpu()
:
: -> #2 (&p->pi_lock){-.-.}:
: __lock_acquire()
: lock_acquire()
: _raw_spin_lock_irqsave()
: try_to_wake_up()
: default_wake_function()
: woken_wake_function()
: __wake_up_common()
: __wake_up_common_lock()
: __wake_up()
: tty_wakeup()
: tty_port_default_wakeup()
: tty_port_tty_wakeup()
: uart_write_wakeup()
: serial8250_tx_chars()
: serial8250_handle_irq.part.25()
: serial8250_default_handle_irq()
: serial8250_interrupt()
: __handle_irq_event_percpu()
: handle_irq_event_percpu()
: handle_irq_event()
: handle_level_irq()
: handle_irq()
: do_IRQ()
: ret_from_intr()
: native_safe_halt()
: default_idle()
: arch_cpu_idle()
: default_idle_call()
: do_idle()
: cpu_startup_entry()
: rest_init()
: start_kernel()
: x86_64_start_reservations()
: x86_64_start_kernel()
: verify_cpu()
:
: -> #1 (&tty->write_wait){-.-.}:
: __lock_acquire()
: lock_acquire()
: _raw_spin_lock_irqsave()
: __wake_up_common_lock()
: __wake_up()
: tty_wakeup()
: tty_port_default_wakeup()
: tty_port_tty_wakeup()
: uart_write_wakeup()
: serial8250_tx_chars()
: serial8250_handle_irq.part.25()
: serial8250_default_handle_irq()
: serial8250_interrupt()
: __handle_irq_event_percpu()
: handle_irq_event_percpu()
: handle_irq_event()
: handle_level_irq()
: handle_irq()
: do_IRQ()
: ret_from_intr()
: native_safe_halt()
: default_idle()
: arch_cpu_idle()
: default_idle_call()
: do_idle()
: cpu_startup_entry()
: rest_init()
: start_kernel()
: x86_64_start_reservations()
: x86_64_start_kernel()
: verify_cpu()
:
: -> #0 (&port_lock_key){-.-.}:
: check_prev_add()
: __lock_acquire()
: lock_acquire()
: _raw_spin_lock_irqsave()
: serial8250_console_write()
: univ8250_console_write()
: console_unlock()
: vprintk_emit()
: vprintk_default()
: vprintk_func()
: printk()
: ___ratelimit()
: __printk_ratelimit()
: select_fallback_rq()
: sched_cpu_dying()
: cpuhp_invoke_callback()
: take_cpu_down()
: multi_cpu_stop()
: cpu_stopper_thread()
: smpboot_thread_fn()
: kthread()
: ret_from_fork()
:
: other info that might help us debug this:
:
: Chain exists of:
: &port_lock_key --> &p->pi_lock --> &rq->lock
:
: Possible unsafe locking scenario:
:
: CPU0 CPU1
: ---- ----
: lock(&rq->lock);
: lock(&p->pi_lock);
: lock(&rq->lock);
: lock(&port_lock_key);
:
: *** DEADLOCK ***
:
: 4 locks held by migration/8/62:
: #0: (&p->pi_lock){-.-.}, at: sched_cpu_dying()
: #1: (&rq->lock){-.-.}, at: sched_cpu_dying()
: #2: (printk_ratelimit_state.lock){....}, at: ___ratelimit()
: #3: (console_lock){+.+.}, at: vprintk_emit()
:
: stack backtrace:
: CPU: 8 PID: 62 Comm: migration/8 Not tainted 4.14.0-rc2-next-
20170927+ #252
: Call Trace:
: dump_stack()
: print_circular_bug()
: check_prev_add()
: ? add_lock_to_list.isra.26()
: ? check_usage()
: ? kvm_clock_read()
: ? kvm_sched_clock_read()
: ? sched_clock()
: ? check_preemption_disabled()
: __lock_acquire()
: ? __lock_acquire()
: ? add_lock_to_list.isra.26()
: ? debug_check_no_locks_freed()
: ? memcpy()
: lock_acquire()
: ? serial8250_console_write()
: _raw_spin_lock_irqsave()
: ? serial8250_console_write()
: serial8250_console_write()
: ? serial8250_start_tx()
: ? lock_acquire()
: ? memcpy()
: univ8250_console_write()
: console_unlock()
: ? __down_trylock_console_sem()
: vprintk_emit()
: vprintk_default()
: vprintk_func()
: printk()
: ? show_regs_print_info()
: ? lock_acquire()
: ___ratelimit()
: __printk_ratelimit()
: select_fallback_rq()
: sched_cpu_dying()
: ? sched_cpu_starting()
: ? rcutree_dying_cpu()
: ? sched_cpu_starting()
: cpuhp_invoke_callback()
: ? cpu_disable_common()
: take_cpu_down()
: ? trace_hardirqs_off_caller()
: ? cpuhp_invoke_callback()
: multi_cpu_stop()
: ? __this_cpu_preempt_check()
: ? cpu_stop_queue_work()
: cpu_stopper_thread()
: ? cpu_stop_create()
: smpboot_thread_fn()
: ? sort_range()
: ? schedule()
: ? __kthread_parkme()
: kthread()
: ? sort_range()
: ? kthread_create_on_node()
: ret_from_fork()
: process 9121 (trinity-c78) no longer affine to cpu8
: smpboot: CPU 8 is now offline
Link: http://lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhatsky@gmail.com
Fixes: 6b1d174b0c27b ("ratelimit: extend to print suppressed messages on release")
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Tue, 3 Oct 2017 23:16:41 +0000 (16:16 -0700)]
kernel/params.c: improve STANDARD_PARAM_DEF readability
Align the parameters passed to STANDARD_PARAM_DEF for clarity.
Link: http://lkml.kernel.org/r/20170928162728.756143cc@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Tue, 3 Oct 2017 23:16:38 +0000 (16:16 -0700)]
kernel/params.c: fix an overflow in param_attr_show
Function param_attr_show could overflow the buffer it is operating on.
The buffer size is PAGE_SIZE, and the string returned by
attribute->param->ops->get is generated by scnprintf(buffer, PAGE_SIZE,
...) so it could be PAGE_SIZE - 1 long, with the terminating '\0' at the
very end of the buffer. Calling strcat(..., "\n") on this isn't safe, as
the '\0' will be replaced by '\n' (OK) and then another '\0' will be added
past the end of the buffer (not OK.)
Simply add the trailing '\n' when writing the attribute contents to the
buffer originally. This is safe, and also faster.
Credits to Teradata for discovering this issue.
Link: http://lkml.kernel.org/r/20170928162602.60c379c7@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Tue, 3 Oct 2017 23:16:35 +0000 (16:16 -0700)]
kernel/params.c: fix the maximum length in param_get_string
The length parameter of strlcpy() is supposed to reflect the size of the
target buffer, not of the source string. Harmless in this case as the
buffer is PAGE_SIZE long and the source string is always much shorter than
this, but conceptually wrong, so let's fix it.
Link: http://lkml.kernel.org/r/20170928162515.24846b4f@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YASUAKI ISHIMATSU [Tue, 3 Oct 2017 23:16:32 +0000 (16:16 -0700)]
mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
find_{smallest|biggest}_section_pfn()s find the smallest/biggest section
and return the pfn of the section. But the functions are defined as int.
So the functions always return 0x00000000 - 0xffffffff. It means if
memory address is over 16TB, the functions does not work correctly.
To handle 64 bit value, the patch defines
find_{smallest|biggest}_section_pfn() as unsigned long.
Fixes: 815121d2b5cd ("memory_hotplug: clear zone when removing the memory")
Link: http://lkml.kernel.org/r/d9d5593a-d0a4-c4be-ab08-493df59a85c6@gmail.com
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YASUAKI ISHIMATSU [Tue, 3 Oct 2017 23:16:29 +0000 (16:16 -0700)]
mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
pfn_to_section_nr() and section_nr_to_pfn() are defined as macro.
pfn_to_section_nr() has no issue even if it is defined as macro. But
section_nr_to_pfn() has overflow issue if sec is defined as int.
section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is
defined as unsigned long, section_nr_to_pfn() returns pfn as 64 bit value.
But if sec is defined as int, section_nr_to_pfn() returns pfn as 32 bit
value.
__remove_section() calculates start_pfn using section_nr_to_pfn() and
scn_nr defined as int. So if hot-removed memory address is over 16TB,
overflow issue occurs and section_nr_to_pfn() does not calculate correct
pfn.
To make callers use proper arg, the patch changes the macros to inline
functions.
Fixes: 815121d2b5cd ("memory_hotplug: clear zone when removing the memory")
Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.com
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Tue, 3 Oct 2017 23:16:26 +0000 (16:16 -0700)]
kernel/kcmp.c: drop branch leftover typo
The else branch been left over and escaped the source code refresh. Not
a problem but better clean it up.
Fixes: 0791e3644e5e ("kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files")
Link: http://lkml.kernel.org/r/20170917165838.GA1887@uranus.lan
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 3 Oct 2017 23:16:23 +0000 (16:16 -0700)]
memremap: add scheduling point to devm_memremap_pages
devm_memremap_pages is initializing struct pages in for_each_device_pfn
and that can take quite some time. We have even seen a soft lockup
triggering on a non preemptive kernel
NMI watchdog: BUG: soft lockup - CPU#61 stuck for 22s! [kworker/u641:11:1808]
[...]
RIP: 0010:[<
ffffffff8118b6b7>] [<
ffffffff8118b6b7>] devm_memremap_pages+0x327/0x430
[...]
Call Trace:
pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
nvdimm_bus_probe+0x64/0x110 [libnvdimm]
driver_probe_device+0x1f7/0x420
bus_for_each_drv+0x52/0x80
__device_attach+0xb0/0x130
bus_probe_device+0x87/0xa0
device_add+0x3fc/0x5f0
nd_async_device_register+0xe/0x40 [libnvdimm]
async_run_entry_fn+0x43/0x150
process_one_work+0x14e/0x410
worker_thread+0x116/0x490
kthread+0xc7/0xe0
ret_from_fork+0x3f/0x70
fix this by adding cond_resched every 1024 pages.
Link: http://lkml.kernel.org/r/20170918121410.24466-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Dan Williams <dan.j.williams@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 3 Oct 2017 23:16:19 +0000 (16:16 -0700)]
mm, page_alloc: add scheduling point to memmap_init_zone
memmap_init_zone gets a pfn range to initialize and it can be really
large resulting in a soft lockup on non-preemptible kernels
NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]
[...]
task:
ffff88ecd7e902c0 ti:
ffff88eca4e50000 task.ti:
ffff88eca4e50000
RIP: move_pfn_range_to_zone+0x185/0x1d0
[...]
Call Trace:
devm_memremap_pages+0x2c7/0x430
pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
nvdimm_bus_probe+0x64/0x110 [libnvdimm]
driver_probe_device+0x1f7/0x420
bus_for_each_drv+0x52/0x80
__device_attach+0xb0/0x130
bus_probe_device+0x87/0xa0
device_add+0x3fc/0x5f0
nd_async_device_register+0xe/0x40 [libnvdimm]
async_run_entry_fn+0x43/0x150
process_one_work+0x14e/0x410
worker_thread+0x116/0x490
kthread+0xc7/0xe0
ret_from_fork+0x3f/0x70
Fix this by adding a scheduling point once per page block.
Link: http://lkml.kernel.org/r/20170918121410.24466-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Dan Williams <dan.j.williams@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Tue, 3 Oct 2017 23:16:16 +0000 (16:16 -0700)]
mm, memory_hotplug: add scheduling point to __add_pages
Patch series "mm, memory_hotplug: fix few soft lockups in memory
hotadd".
Johannes has noticed few soft lockups when adding a large nvdimm device.
All of them were caused by a long loop without any explicit cond_resched
which is a problem for !PREEMPT kernels.
The fix is quite straightforward. Just make sure that cond_resched gets
called from time to time.
This patch (of 3):
__add_pages gets a pfn range to add and there is no upper bound for a
single call. This is usually a memory block aligned size for the
regular memory hotplug - smaller sizes are usual for memory balloning
drivers, or the whole NUMA node for physical memory online. There is no
explicit scheduling point in that code path though.
This can lead to long latencies while __add_pages is executed and we
have even seen a soft lockup report during nvdimm initialization with
!PREEMPT kernel
NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [kworker/u641:3:832]
[...]
Workqueue: events_unbound async_run_entry_fn
task:
ffff881809270f40 ti:
ffff881809274000 task.ti:
ffff881809274000
RIP: _raw_spin_unlock_irqrestore+0x11/0x20
RSP: 0018:
ffff881809277b10 EFLAGS:
00000286
[...]
Call Trace:
sparse_add_one_section+0x13d/0x18e
__add_pages+0x10a/0x1d0
arch_add_memory+0x4a/0xc0
devm_memremap_pages+0x29d/0x430
pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
nvdimm_bus_probe+0x64/0x110 [libnvdimm]
driver_probe_device+0x1f7/0x420
bus_for_each_drv+0x52/0x80
__device_attach+0xb0/0x130
bus_probe_device+0x87/0xa0
device_add+0x3fc/0x5f0
nd_async_device_register+0xe/0x40 [libnvdimm]
async_run_entry_fn+0x43/0x150
process_one_work+0x14e/0x410
worker_thread+0x116/0x490
kthread+0xc7/0xe0
ret_from_fork+0x3f/0x70
DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70
Fix this by adding cond_resched once per each memory section in the
given pfn range. Each section is constant amount of work which itself
is not too expensive but many of them will just add up.
Link: http://lkml.kernel.org/r/20170918121410.24466-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Dan Williams <dan.j.williams@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Biggers [Tue, 3 Oct 2017 23:16:13 +0000 (16:16 -0700)]
lib/idr.c: fix comment for idr_replace()
idr_replace() returns the old value on success, not 0.
Link: http://lkml.kernel.org/r/20170918162642.37511-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Tue, 3 Oct 2017 23:16:10 +0000 (16:16 -0700)]
mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
For quick per-memcg indexing, slab caches and list_lru structures
maintain linear arrays of descriptors. As the number of concurrent
memory cgroups in the system goes up, this requires large contiguous
allocations (8k cgroups = order-5, 16k cgroups = order-6 etc.) for every
existing slab cache and list_lru, which can easily fail on loaded
systems. E.g.:
mkdir: page allocation failure: order:5, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
CPU: 1 PID: 6399 Comm: mkdir Not tainted
4.13.0-mm1-00065-g720bbe532b7c-dirty #481
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
Call Trace:
? __alloc_pages_direct_compact+0x4c/0x110
__alloc_pages_nodemask+0xf50/0x1430
alloc_pages_current+0x60/0xc0
kmalloc_order_trace+0x29/0x1b0
__kmalloc+0x1f4/0x320
memcg_update_all_list_lrus+0xca/0x2e0
mem_cgroup_css_alloc+0x612/0x670
cgroup_apply_control_enable+0x19e/0x360
cgroup_mkdir+0x322/0x490
kernfs_iop_mkdir+0x55/0x80
vfs_mkdir+0xd0/0x120
SyS_mkdirat+0x6c/0xe0
SyS_mkdir+0x14/0x20
entry_SYSCALL_64_fastpath+0x18/0xad
Mem-Info:
active_anon:2965 inactive_anon:19 isolated_anon:0
active_file:100270 inactive_file:98846 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:7328 slab_unreclaimable:16402
mapped:771 shmem:52 pagetables:278 bounce:0
free:13718 free_pcp:0 free_cma:0
This output is from an artificial reproducer, but we have repeatedly
observed order-7 failures in production in the Facebook fleet. These
systems become useless as they cannot run more jobs, even though there
is plenty of memory to allocate 128 individual pages.
Use kvmalloc and kvzalloc to fall back to vmalloc space if these arrays
prove too large for allocating them physically contiguous.
Link: http://lkml.kernel.org/r/20170918184919.20644-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Tue, 3 Oct 2017 23:16:07 +0000 (16:16 -0700)]
kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
do_proc_douintvec_conv() has two UINT_MAX checks, we can remove one.
This has no functional changes other than fixing a compiler warning:
kernel/sysctl.c:2190]: (warning) Identical condition '*lvalp>UINT_MAX', second condition is always false
Fixes: 4f2fec00afa60 ("sysctl: simplify unsigned int support")
Link: http://lkml.kernel.org/r/20170919072918.12066-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Tue, 3 Oct 2017 23:16:04 +0000 (16:16 -0700)]
include/linux/bitfield.h: remove 32bit from FIELD_GET comment block
I do not see anything that restricts this macro to 32 bit width.
Link: http://lkml.kernel.org/r/1505921975-23379-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Tue, 3 Oct 2017 23:16:01 +0000 (16:16 -0700)]
lib/lz4: make arrays static const, reduces object code size
Don't populate the read-only arrays dec32table and dec64table on the
stack, instead make them both static const. Makes the object code
smaller by over 10K bytes:
Before:
text data bss dec hex filename
31500 0 0 31500 7b0c lib/lz4/lz4_decompress.o
After:
text data bss dec hex filename
20237 176 0 20413 4fbd lib/lz4/lz4_decompress.o
(gcc version 7.2.0 x86_64)
Link: http://lkml.kernel.org/r/20170921221939.20820-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:58 +0000 (16:15 -0700)]
exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array
After the previous change "fmt" can't go away, we can kill
iname/iname_addr and use fmt->interpreter.
Link: http://lkml.kernel.org/r/20170922143653.GA17232@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Woodard <woodard@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Travis Gummels <tgummels@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:55 +0000 (16:15 -0700)]
exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
load_misc_binary() makes a local copy of fmt->interpreter under
entries_lock to avoid the race with kill_node() but this is not enough;
the whole Node can be freed after we drop entries_lock, not only the
->interpreter string.
Add dget/dput(fmt->dentry) to ensure bm_evict_inode() can't destroy/free
this Node.
Link: http://lkml.kernel.org/r/20170922143650.GA17227@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Woodard <woodard@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: Travis Gummels <tgummels@redhat.com>
Cc: <tdhooge@llnl.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:51 +0000 (16:15 -0700)]
exec: binfmt_misc: remove the confusing e->interp_file != NULL checks
If MISC_FMT_OPEN_FILE flag is set e->interp_file must be valid or we
have a bug which should not be silently ignored.
Link: http://lkml.kernel.org/r/20170922143647.GA17222@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Woodard <woodard@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Travis Gummels <tgummels@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:48 +0000 (16:15 -0700)]
exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()
To ensure that load_misc_binary() can't use the partially destroyed
Node, see also the next patch.
The current logic looks wrong in any case, once we close interp_file it
doesn't make any sense to delay kfree(inode->i_private), this Node is no
longer valid. Even if the MISC_FMT_OPEN_FILE/interp_file checks were
not racy (they are), load_misc_binary() should not try to reopen
->interpreter if MISC_FMT_OPEN_FILE is set but ->interp_file is NULL.
And I can't understand why do we use filp_close(), not fput().
Link: http://lkml.kernel.org/r/20170922143644.GA17216@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Woodard <woodard@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Travis Gummels <tgummels@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:45 +0000 (16:15 -0700)]
exec: binfmt_misc: don't nullify Node->dentry in kill_node()
kill_node() nullifies/checks Node->dentry to avoid double free. This
complicates the next changes and this is very confusing:
- we do not need to check dentry != NULL under entries_lock,
kill_node() is always called under inode_lock(d_inode(root)) and we
rely on this inode_lock() anyway, without this lock the
MISC_FMT_OPEN_FILE cleanup could race with itself.
- if kill_inode() was already called and ->dentry == NULL we should not
even try to close e->interp_file.
We can change bm_entry_write() to simply check !list_empty(list) before
kill_node. Again, we rely on inode_lock(), in particular it saves us
from the race with bm_status_write(), another caller of kill_node().
Link: http://lkml.kernel.org/r/20170922143641.GA17210@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Woodard <woodard@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Travis Gummels <tgummels@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Tue, 3 Oct 2017 23:15:42 +0000 (16:15 -0700)]
exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".
It looks like this code was always wrong, then commit
948b701a607f
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.
This patch (of 6):
load_script() can simply use i_name instead, it points into bprm->buf[]
and nobody can change this memory until we call prepare_binprm().
The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.
While at it, do whitespace/style cleanups.
NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.
Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Travis Gummels <tgummels@redhat.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Tue, 3 Oct 2017 23:15:38 +0000 (16:15 -0700)]
userfaultfd: non-cooperative: fix fork use after free
When reading the event from the uffd, we put it on a temporary
fork_event list to detect if we can still access it after releasing and
retaking the event_wqh.lock.
If fork aborts and removes the event from the fork_event all is fine as
long as we're still in the userfault read context and fork_event head is
still alive.
We've to put the event allocated in the fork kernel stack, back from
fork_event list-head to the event_wqh head, before returning from
userfaultfd_ctx_read, because the fork_event head lifetime is limited to
the userfaultfd_ctx_read stack lifetime.
Forgetting to move the event back to its event_wqh place then results in
__remove_wait_queue(&ctx->event_wqh, &ewq->wq); in
userfaultfd_event_wait_completion to remove it from a head that has been
already freed from the reader stack.
This could only happen if resolve_userfault_fork failed (for example if
there are no file descriptors available to allocate the fork uffd). If
it succeeded it was put back correctly.
Furthermore, after find_userfault_evt receives a fork event, the forked
userfault context in fork_nctx and uwq->msg.arg.reserved.reserved1 can
be released by the fork thread as soon as the event_wqh.lock is
released. Taking a reference on the fork_nctx before dropping the lock
prevents an use after free in resolve_userfault_fork().
If the fork side aborted and it already released everything, we still
try to succeed resolve_userfault_fork(), if possible.
Fixes: 893e26e61d04eac9 ("userfaultfd: non-cooperative: Add fork() event")
Link: http://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reza Arbab [Tue, 3 Oct 2017 23:15:35 +0000 (16:15 -0700)]
mm/device-public-memory: fix edge case in _vm_normal_page()
With device public pages at the end of my memory space, I'm getting
output from _vm_normal_page():
BUG: Bad page map in process migrate_pages pte:
c0800001ffff0d06 pmd:
f95d3000
addr:
00007fff89330000 vm_flags:
00100073 anon_vma:
c0000000fa899320 mapping: (null) index:
7fff8933
file: (null) fault: (null) mmap: (null) readpage: (null)
CPU: 0 PID: 13963 Comm: migrate_pages Tainted: P B OE 4.14.0-rc1-wip #155
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
print_bad_pte+0x28c/0x340
_vm_normal_page+0xc0/0x140
zap_pte_range+0x664/0xc10
unmap_page_range+0x318/0x670
unmap_vmas+0x74/0xe0
exit_mmap+0xe8/0x1f0
mmput+0xac/0x1f0
do_exit+0x348/0xcd0
do_group_exit+0x5c/0xf0
SyS_exit_group+0x1c/0x20
system_call+0x58/0x6c
The pfn causing this is the very last one. Correct the bounds check
accordingly.
Fixes: df6ad69838fc ("mm/device-public-memory: device memory cache coherent with CPU")
Link: http://lkml.kernel.org/r/1506092178-20351-1-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 3 Oct 2017 23:15:32 +0000 (16:15 -0700)]
mm: fix data corruption caused by lazyfree page
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim. If page reclaim finds
such page, it will simply add the page to swap cache without pageout the
page to swap because the page is marked as clean. Next time, page fault
will read data from the swap slot which doesn't have the original data,
so we have a data corruption. To fix issue, we mark the page dirty and
pageout the page.
However, we shouldn't dirty all pages which is clean and in swap cache.
swapin page is swap cache and clean too. So we only dirty page which is
added into swap cache in page reclaim, which shouldn't be swapin page.
As Minchan suggested, simply dirty the page in add_to_swap can do the
job.
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Link: http://lkml.kernel.org/r/08c84256b007bf3f63c91d94383bd9eb6fee2daa.1506446061.git.shli@fb.com
Signed-off-by: Shaohua Li <shli@fb.com>
Reported-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Tue, 3 Oct 2017 23:15:29 +0000 (16:15 -0700)]
mm: avoid marking swap cached page as lazyfree
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim. Page reclaim could add
the page to swap cache and unmap the page. After page reclaim, the page
is added back to lru. At that time, we probably start draining per-cpu
pagevec and mark the page lazyfree. So the page could be in a state
with SwapBacked cleared and PG_swapcache set. Next time there is a
refault in the virtual address, do_swap_page can find the page from swap
cache but the page has PageSwapCache false because SwapBacked isn't set,
so do_swap_page will bail out and do nothing. The task will keep
running into fault handler.
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com
Signed-off-by: Shaohua Li <shli@fb.com>
Reported-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 3 Oct 2017 23:15:25 +0000 (16:15 -0700)]
mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
Eryu noticed that he could sometimes get a leftover error reported when
it shouldn't be on fsync with ext2 and non-journalled ext4.
The problem is that writeback_single_inode still uses filemap_fdatawait.
That picks up a previously set AS_EIO flag, which would ordinarily have
been cleared before.
Since we're mostly using this function as a replacement for
filemap_check_errors, have filemap_check_and_advance_wb_err clear AS_EIO
and AS_ENOSPC when reporting an error. That should allow the new
function to better emulate the behavior of the old with respect to these
flags.
Link: http://lkml.kernel.org/r/20170922133331.28812-1-jlayton@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Tue, 3 Oct 2017 23:15:23 +0000 (16:15 -0700)]
m32r: define CPU_BIG_ENDIAN
The build of m32r allmodconfig is giving lots of build warnings about:
include/linux/byteorder/big_endian.h:7:2:
warning: #warning inconsistent configuration,
needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
#warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN
Define CPU_BIG_ENDIAN like the way CPU_LITTLE_ENDIAN is defined.
Link: http://lkml.kernel.org/r/1505678083-10320-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Tue, 3 Oct 2017 23:15:19 +0000 (16:15 -0700)]
zram: fix null dereference of handle
In testing I found handle passed to zs_map_object in __zram_bvec_read is
NULL so eh kernel goes oops in pin_object().
The reason is there is no routine to check the slot's freeing after
getting the slot's lock. This patch fixes it.
[minchan@kernel.org: v2]
Link: http://lkml.kernel.org/r/1505887347-10881-1-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/1505788488-26723-1-git-send-email-minchan@kernel.org
Fixes: 1f7319c74275 ("zram: partial IO refactoring")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christophe Leroy [Tue, 3 Oct 2017 23:15:16 +0000 (16:15 -0700)]
mm: fix RODATA_TEST failure "rodata_test: test data was not read only"
On powerpc, RODATA_TEST fails with message the following messages:
Freeing unused kernel memory: 528K
rodata_test: test data was not read only
This is because GCC allocates it to .data section:
c0695034 g O .data
00000004 rodata_test_data
Since commit
056b9d8a7692 ("mm: remove rodata_test_data export, add
pr_fmt"), rodata_test_data is used only inside rodata_test.c By
declaring it static, it gets properly allocated into .rodata section
instead of .data:
c04df710 l O .rodata
00000004 rodata_test_data
Fixes: 056b9d8a7692 ("mm: remove rodata_test_data export, add pr_fmt")
Link: http://lkml.kernel.org/r/20170921093729.1080368AC1@po15668-vm-win7.idsi0.si.c-s.fr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ioan Nicu [Tue, 3 Oct 2017 23:15:13 +0000 (16:15 -0700)]
rapidio: remove global irq spinlocks from the subsystem
Locking of config and doorbell operations should be done only if the
underlying hardware requires it.
This patch removes the global spinlocks from the rapidio subsystem and
moves them to the mport drivers (fsl_rio and tsi721), only to the
necessary places. For example, local config space read and write
operations (lcread/lcwrite) are atomic in all existing drivers, so there
should be no need for locking, while the cread/cwrite operations which
generate maintenance transactions need to be synchronized with a lock.
Later, each driver could chose to use a per-port lock instead of a
global one, or even more granular locking.
Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com
Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
Signed-off-by: Frank Kunz <frank.kunz@nokia.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>