Aya Levin [Sun, 1 Dec 2019 14:33:55 +0000 (16:33 +0200)]
net/mlx5e: ethtool, Fix analysis of speed setting
When setting the speed to 100G via ethtool (with AN off), only 25G*4 is
configured, while a user with advanced HW that supports extended PTYS
also expects 50G*2 to be configured.
With this patch, when extended PTYS mode is available, PTYS is configured
via the extended fields.
Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Sun, 1 Dec 2019 12:45:25 +0000 (14:45 +0200)]
net/mlx5e: Fix translation of link mode into speed
Add a missing value in the translation of PTYS ext_eth_proto_oper to its
corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool
shows an unknown speed. With this fix, ethtool shows a speed of 100G as
expected.
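A minimal sketch of why a missing table entry shows up as an unknown speed, assuming a translation table indexed by the bit position in ext_eth_proto_oper. Only the bit 10 => 100G pairing comes from this commit; the table, the helper name and the lowest-set-bit policy are illustrative assumptions, not the mlx5 implementation. SPEED_UNKNOWN is -1, as in the ethtool UAPI.

```c
#include <assert.h>
#include <stdint.h>

#define SPEED_UNKNOWN (-1) /* same value as the ethtool UAPI constant */

/* Hypothetical translation table: index = bit in ext_eth_proto_oper,
 * value = speed in Mb/s. Only bit 10 -> 100G is taken from the commit;
 * every other slot stays 0 to model a missing entry. */
static const uint32_t ext_bit_to_speed[32] = {
	[10] = 100000,
};

/* Translate the lowest set operational bit into a speed. A zero table
 * entry (missing translation) is reported as SPEED_UNKNOWN, which is
 * what ethtool showed before the fix. */
static int ext_oper_to_speed(uint32_t eth_proto_oper)
{
	for (int bit = 0; bit < 32; bit++) {
		if (eth_proto_oper & (UINT32_C(1) << bit))
			return ext_bit_to_speed[bit] ?
			       (int)ext_bit_to_speed[bit] : SPEED_UNKNOWN;
	}
	return SPEED_UNKNOWN; /* no link mode active */
}
```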
Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Roi Dayan [Mon, 2 Dec 2019 17:19:47 +0000 (19:19 +0200)]
net/mlx5e: Fix free peer_flow when refcount is 0
The neigh update flow may have taken a refcount on the peer flow, so
sometimes the peer flow cannot be released even though the parent flow
is being freed.
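The release rule can be sketched as plain reference counting: every holder (the parent flow, a neigh update) owns one reference, and the object is freed only by whoever drops the last one. This is a hypothetical single-threaded model; the struct and helper names are made up, and the kernel uses atomic refcount_t operations rather than a plain int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TC flow. */
struct flow {
	int refcnt;
};

/* Take an extra reference, e.g. on behalf of a neigh update. */
static struct flow *flow_get(struct flow *f)
{
	f->refcnt++;
	return f;
}

/* Drop one reference and free the flow only when the last one is gone.
 * Returns true if the flow was freed. */
static bool flow_put(struct flow *f)
{
	assert(f->refcnt > 0); /* catch an unbalanced put */
	if (--f->refcnt == 0) {
		free(f);
		return true;
	}
	return false;
}
```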
Fixes: 5a7e5bcb663d ("net/mlx5e: Extend tc flow struct with reference counter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Roi Dayan [Wed, 4 Dec 2019 09:25:43 +0000 (11:25 +0200)]
net/mlx5e: Fix freeing flow with kfree() and not kvfree()
Flows are allocated with kzalloc(), so free them with kfree().
Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Thu, 5 Dec 2019 08:30:22 +0000 (10:30 +0200)]
net/mlx5e: Fix SFF 8472 eeprom length
The SFF-8472 EEPROM length is 512 bytes. Fix the module info return value
to support a 512-byte read.
Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Huy Nguyen [Fri, 6 Sep 2019 14:28:46 +0000 (09:28 -0500)]
net/mlx5e: Query global pause state before setting prio2buffer
When the user changes the prio2buffer mapping while global pause is
enabled, the mlx5 driver incorrectly sets all active buffers (buffers
that have at least one priority mapped) to lossy.
Solution:
If global pause is enabled, set all the active buffers to lossless
in the prio2buffer command.
Also, add an error message when the buffer size is not large enough to
meet the xoff threshold.
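A sketch of the buffer classification described above, assuming an eight-priority prio2buffer map and a PFC enable bitmask as inputs. The global-pause rule comes from the commit; treating PFC-enabled priorities as the other source of lossless buffers is an assumption about the surrounding logic, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PRIO 8

/* Return a bitmask of receive buffers that must be lossless.
 * prio2buffer[p]: buffer that priority p is mapped to (0..7)
 * pfc_en:         bitmask of priorities with PFC enabled (assumed input)
 * global_pause:   true when global pause is enabled
 * Only buffers with at least one priority mapped ("active") can be set. */
static uint8_t lossless_buffers(const uint8_t prio2buffer[NUM_PRIO],
				uint8_t pfc_en, bool global_pause)
{
	uint8_t mask = 0;

	for (int p = 0; p < NUM_PRIO; p++) {
		assert(prio2buffer[p] < 8); /* must fit in the 8-bit mask */
		/* With global pause on, every active buffer is lossless;
		 * otherwise only buffers carrying a PFC-enabled priority. */
		if (global_pause || (pfc_en & (1u << p)))
			mask |= (uint8_t)(1u << prio2buffer[p]);
	}
	return mask;
}
```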
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Mon, 25 Nov 2019 10:11:49 +0000 (12:11 +0200)]
net/mlx5e: Fix TXQ indices to be sequential
The cited patch changed the (channel index, tc) => (TXQ index) mapping to
a static one, in order to keep indices consistent when changing the
number of channels or TCs.
For 32 channels (OOB) and 8 TCs, the real number of TXQs is 256.
When reducing the number of channels to 8, the real number of TXQs
changes to 64.
This indexing scheme is buggy:
- For channel #0, TC 3, the TXQ index is 96.
- Index 8 is not valid, as there is no such TXQ from the driver's
perspective (it represents channel #8, TC 0, which does not exist in the
above configuration).
As part of the driver's select queue, it calls netdev_pick_tx(), which
returns an index in the range of the real number of TXQs. Depending on
the return value, with the examples above, the driver could either return
an index larger than the real number of TX queues, or crash the kernel as
it tries to read the invalid address of an SQ that was never allocated.
Fix that by allocating sequential TXQ indices and holding a new mapping
between (channel index, tc) => (real TXQ index). This mapping is updated
as part of priv channels activation and is used in mlx5e_select_queue to
find the selected queue index.
The existing indices mapping (channel_tc2txq) is no longer needed, as it
is used only for statistics structures and can be calculated at run time.
Delete its definition and updates.
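The two indexing schemes can be compared in a few lines. This sketch assumes the old static index is tc * 32 + channel, which matches the "channel #0, TC 3 => 96" example above, and models the sequential scheme as channel + tc * num_channels; the helper names are hypothetical and this is not the mlx5 code.

```c
#include <assert.h>

#define MAX_CHANNELS 32
#define MAX_TC 8

/* Old static scheme: indices are reserved for the maximum channel
 * count, so they do not shrink with the active configuration. */
static int txq_static_index(int ch, int tc)
{
	return tc * MAX_CHANNELS + ch;
}

/* Sequential scheme: pack indices using the currently configured number
 * of channels, so every index is below num_ch * num_tc. */
static void build_txq_map(int map[MAX_CHANNELS][MAX_TC], int num_ch, int num_tc)
{
	assert(num_ch <= MAX_CHANNELS && num_tc <= MAX_TC);
	for (int ch = 0; ch < num_ch; ch++)
		for (int tc = 0; tc < num_tc; tc++)
			map[ch][tc] = ch + tc * num_ch;
}
```

With 8 channels and 8 TCs, channel #0, TC 3 maps to 24 instead of 96, which is within the 64 real TXQs.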
Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mian Yousaf Kaukab [Thu, 5 Dec 2019 09:41:16 +0000 (10:41 +0100)]
net: thunderx: start phy before starting autonegotiation
Since commit 2b3e88ea6528 ("net: phy: improve phy state checking"),
phy_start_aneg() expects the phy state to be >= PHY_UP. Call phy_start()
before calling phy_start_aneg() during probe so that autonegotiation
is initiated.
As phy_start() takes care of calling phy_start_aneg(), drop the explicit
call to phy_start_aneg().
Without this patch, networking fails on Octeon TX.
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 5 Dec 2019 07:23:39 +0000 (07:23 +0000)]
hsr: fix a NULL pointer dereference in hsr_dev_xmit()
hsr_dev_xmit() calls hsr_port_get_hsr() to find the master node, which
returns NULL if the master node does not exist in the list.
However, hsr_dev_xmit() doesn't check the returned pointer, so a NULL
pointer dereference can occur.
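The missing check follows a common pattern: a lookup helper that may return NULL, and a caller that must handle that case. In this hypothetical model (all names made up), the transmit path counts the packet as dropped when no master port exists; this is one reasonable way to handle it, not necessarily the exact hsr code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port roles and list. */
enum port_type { PT_SLAVE_A, PT_SLAVE_B, PT_MASTER };

struct port {
	enum port_type type;
	struct port *next;
};

struct dev_stats {
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

/* Like a "find master" helper: returns NULL when no master exists. */
static struct port *port_get_master(struct port *list)
{
	for (struct port *p = list; p; p = p->next)
		if (p->type == PT_MASTER)
			return p;
	return NULL;
}

/* Check the lookup result before using it: without a master, the
 * packet is counted as dropped instead of dereferencing NULL. */
static void dev_xmit(struct port *list, struct dev_stats *st)
{
	struct port *master = port_get_master(list);

	if (!master) {
		st->tx_dropped++;
		return;
	}
	st->tx_packets++; /* stands in for forwarding through master */
}
```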
Test commands:
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst
ip link set veth0 up
ip link set veth2 up
ip link add hsr0 type hsr slave1 veth0 slave2 veth2
ip a a 192.168.100.1/24 dev hsr0
ip link set hsr0 up
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set veth3 up
ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
ip netns exec nst ip link set hsr1 up
hping3 192.168.100.2 -2 --flood &
modprobe -rv hsr
Splat looks like:
[ 217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
[ 217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
[ 217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
[ 217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
[ 217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
[ 217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
[ 217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
[ 217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
[ 217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
[ 217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
[ 217.383094][ T1635] FS: 00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
[ 217.384289][ T1635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
[ 217.385940][ T1635] Call Trace:
[ 217.386544][ T1635] dev_hard_start_xmit+0x160/0x740
[ 217.387114][ T1635] __dev_queue_xmit+0x1961/0x2e10
[ 217.388118][ T1635] ? check_object+0xaf/0x260
[ 217.391466][ T1635] ? __alloc_skb+0xb9/0x500
[ 217.392017][ T1635] ? init_object+0x6b/0x80
[ 217.392629][ T1635] ? netdev_core_pick_tx+0x2e0/0x2e0
[ 217.393175][ T1635] ? __alloc_skb+0xb9/0x500
[ 217.393727][ T1635] ? rcu_read_lock_sched_held+0x90/0xc0
[ 217.394331][ T1635] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 217.395013][ T1635] ? kasan_unpoison_shadow+0x30/0x40
[ 217.395668][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[ 217.396280][ T1635] ? __kmalloc_node_track_caller+0x3a8/0x3f0
[ 217.399007][ T1635] ? __kasan_kmalloc.constprop.4+0xa0/0xd0
[ 217.400093][ T1635] ? __kmalloc_reserve.isra.46+0x2e/0xb0
[ 217.401118][ T1635] ? memset+0x1f/0x40
[ 217.402529][ T1635] ? __alloc_skb+0x317/0x500
[ 217.404915][ T1635] ? arp_xmit+0xca/0x2c0
[ ... ]
Fixes: 311633b60406 ("hsr: switch ->dellink() to ->ndo_uninit()")
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Varghese [Thu, 5 Dec 2019 00:27:22 +0000 (05:57 +0530)]
net: Fixed updating of ethertype in skb_mpls_push()
skb_mpls_push() was not updating the ethertype of an Ethernet packet if
the packet was originally received from a non-ARPHRD_ETHER device.
In the OVS datapath flow below, since the device corresponding to
port 7 is an L3 device (ARPHRD_NONE), skb_mpls_push() does
not update the ethertype of the packet even though the previous
push_eth action had added an Ethernet header to the packet.
recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4
Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandru Ardelean [Wed, 4 Dec 2019 07:58:09 +0000 (09:58 +0200)]
NFC: NCI: use new `delay` structure for SPI transfer delays
In a recent change to the SPI subsystem [1], a new `delay` struct was added
to replace `delay_usecs`. This change replaces the current `delay_usecs`
with `delay` for this driver.
The `spi_transfer_delay_exec()` function [in the SPI framework] makes sure
that both `delay_usecs` & `delay` are used (in this order, to preserve
backwards compatibility).
[1] commit bebcfd272df6485 ("spi: introduce `delay` field for `spi_transfer` + spi_transfer_delay_exec()")
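Here is a userspace model of how a value-plus-unit delay resolves to a concrete duration. It assumes, as the text describes, that the legacy delay_usecs field is checked first. The struct follows the shape of the kernel's struct spi_delay (a value and a unit), but the enum values, type names and helpers are illustrative. The clock-cycle based unit is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of delay units (clock-cycle unit omitted). */
enum delay_unit { DELAY_UNIT_USECS = 0, DELAY_UNIT_NSECS = 1 };

struct delay {
	uint16_t value;
	uint8_t unit;
};

/* Convert a delay to nanoseconds; -1 means "unit not handled here". */
static long delay_to_ns(struct delay d)
{
	switch (d.unit) {
	case DELAY_UNIT_USECS:
		return (long)d.value * 1000;
	case DELAY_UNIT_NSECS:
		return (long)d.value;
	default:
		return -1;
	}
}

/* The legacy delay_usecs field is checked first, so existing drivers
 * keep their behavior; otherwise the new delay field is used. */
static long transfer_delay_ns(uint16_t legacy_delay_usecs, struct delay d)
{
	if (legacy_delay_usecs)
		return (long)legacy_delay_usecs * 1000;
	return delay_to_ns(d);
}
```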
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 3 Dec 2019 23:51:28 +0000 (23:51 +0000)]
net: sfp: fix hwmon
The referenced commit below allowed more than one hwmon device to be
created per SFP, which is definitely not what we want. Avoid this by
only creating the hwmon device as we transition to the WAITDEV state.
Fixes: 139d3a212a1f ("net: sfp: allow modules with slow diagnostics to probe")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 3 Dec 2019 23:51:22 +0000 (23:51 +0000)]
net: sfp: fix unbind
When unbinding, we don't correctly tear down the module state, leaving
(for example) the hwmon registration behind. Ensure everything is
properly removed by sending a remove event at unbind.
Fixes: 6b0da5c9c1a3 ("net: sfp: track upstream's attachment state in state machine")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 3 Dec 2019 22:17:34 +0000 (14:17 -0800)]
ionic: keep users rss hash across lif reset
If the user has specified their own RSS hash key, don't
lose it across queue resets such as DOWN/UP, MTU changes,
and changes to the number of channels. Fix this by moving
the key initialization slightly earlier in the lif
creation.
Also, clean up the RSS config more thoroughly on
the way down by setting it all to 0.
Fixes: aa3198819bea ("ionic: Add RSS support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Lemon [Tue, 3 Dec 2019 22:01:14 +0000 (14:01 -0800)]
xdp: obtain the mem_id mutex before trying to remove an entry.
A lockdep splat was observed when trying to remove an xdp memory
model from the table, because the mutex was taken when removing the
entry, but not before the table walk started.
Fix the splat by taking the lock before starting the table walk.
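In practice, the fix means the mutex is held for the whole walk instead of only around the removal. The sketch below stands in for the kernel's table with a pthread mutex and a fixed array. The table, lock and function names are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 16

/* Hypothetical table of memory-model ids; 0 marks an empty slot. */
static int mem_table[TABLE_SIZE];
static pthread_mutex_t mem_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove all entries with the given id. The mutex is taken before the
 * walk begins and held until it ends, so no other thread can modify
 * the table between finding an entry and removing it.
 * Returns the number of entries removed. */
static int mem_table_remove(int id)
{
	int removed = 0;

	pthread_mutex_lock(&mem_table_lock);
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (mem_table[i] == id) {
			mem_table[i] = 0;
			removed++;
		}
	}
	pthread_mutex_unlock(&mem_table_lock);
	return removed;
}
```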
Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Conole [Tue, 3 Dec 2019 21:34:14 +0000 (16:34 -0500)]
act_ct: support asymmetric conntrack
The act_ct TC module shares a common conntrack and NAT infrastructure
exposed via netfilter. A packet may need both SNAT and DNAT
manipulation, e.g. due to a tuple collision. Netfilter supports
this because it runs through the NAT table twice: once on ingress and
again after egress. The act_ct action doesn't have this capability.
As with the netfilter hook infrastructure, we should run through NAT
twice to keep the symmetry.
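One way to sketch the two-pass idea is as a small decision function. It runs the NAT type chosen for this direction first, and if the connection carries both SNAT and DNAT, it also runs the opposite type. The enum and function names are hypothetical. The sketch only models which passes run, not the packet rewriting itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the NAT manipulation types. */
enum manip_type { MANIP_SRC, MANIP_DST };

/* Decide which NAT manipulations to run for one packet. The first pass
 * uses the type chosen for this direction; if the connection carries
 * both SNAT and DNAT, the opposite type is applied as a second pass,
 * mirroring netfilter's two traversals.
 * Writes the passes to out[] and returns how many there are. */
static int nat_passes(bool has_snat, bool has_dnat,
		      enum manip_type first, enum manip_type out[2])
{
	int n = 0;

	out[n++] = first;
	if (has_snat && has_dnat)
		out[n++] = (first == MANIP_SRC) ? MANIP_DST : MANIP_SRC;
	return n;
}
```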
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Conole [Tue, 3 Dec 2019 21:34:13 +0000 (16:34 -0500)]
openvswitch: support asymmetric conntrack
The openvswitch module shares a common conntrack and NAT infrastructure
exposed via netfilter. A packet may need both SNAT and DNAT
manipulation, e.g. due to a tuple collision. Netfilter supports
this because it runs through the NAT table twice: once on ingress and
again after egress. The openvswitch module doesn't have this capability.
As with the netfilter hook infrastructure, we should run through NAT
twice to keep the symmetry.
Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Dec 2019 20:27:13 +0000 (12:27 -0800)]
Merge branch 'net-convert-ipv6_stub-to-ip6_dst_lookup_flow'
Sabrina Dubroca says:
====================
net: convert ipv6_stub to ip6_dst_lookup_flow
Xiumei Mu reported a bug in a VXLAN over IPsec setup:
IPv6 | ESP | VXLAN
Using this setup, packets go out unencrypted, because VXLAN over IPv6
gets its route from ipv6_stub->ipv6_dst_lookup (in vxlan6_get_route),
which doesn't perform an XFRM lookup.
This patchset first makes ip6_dst_lookup_flow suitable for some
existing users of ipv6_stub->ipv6_dst_lookup by adding a 'net'
argument, then converts all those users.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Wed, 4 Dec 2019 14:35:53 +0000 (15:35 +0100)]
net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.
All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().
This requires some changes in all the callers, as these two functions
take different arguments and have different return types.
Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Wed, 4 Dec 2019 14:35:52 +0000 (15:35 +0100)]
net: ipv6: add net argument to ip6_dst_lookup_flow
This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
as some modules currently pass a net argument without a socket to
ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
ipv6_stub_impl.ipv6_dst_lookup to take net argument").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshiki Komachi [Tue, 3 Dec 2019 10:40:12 +0000 (19:40 +0900)]
cls_flower: Fix the behavior using port ranges with hw-offload
The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
packets using port ranges") added filtering based on port ranges
to tc flower. However, the commit missed necessary changes in the
hw-offload code, so the feature resulted in incorrect offloaded flow
keys in the NIC.
A more detailed example follows:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
dst_port 100-200 action drop
With the setup above, hw-offload installs an exact-match filter with
dst_port == 0 in the NIC. In other words, the NIC will have a rule
equivalent to the following:
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
dst_port 0 action drop
This behavior was caused by the flow dissector, which extracts packet
data into the flow key in tc flower. More specifically, regardless of
whether an exact match or a port range was specified, fl_init_dissector()
set the FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract
port numbers from the skb in skb_flow_dissect(), called by fl_classify().
Device drivers received the same struct flow_dissector object as used in
skb_flow_dissect(). Thus, offloading drivers could not tell which kind of
match was in use, because the FLOW_DISSECTOR_KEY_PORTS flag was set in
struct flow_dissector in either case.
This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
tp_range field in struct fl_flow_key so that offloading drivers can
recognize which kind of filter they are given. With this change, when
filters based on port ranges are passed to drivers, the drivers return
-EOPNOTSUPP because they do not support the feature (the newly created
FLOW_DISSECTOR_KEY_PORTS_RANGE flag).
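With a distinct key for port ranges, a driver can reject filters it does not understand by comparing the keys a filter uses against the keys it supports. The key ids and helper below are a hypothetical sketch; the real ids live in the kernel's flow dissector enum. Only the -EOPNOTSUPP outcome is taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KEY_BIT(n) (UINT32_C(1) << (n))

/* Hypothetical subset of dissector key ids. */
enum key_id { KEY_BASIC, KEY_IPV4_ADDRS, KEY_PORTS, KEY_PORTS_RANGE };

/* An offloading driver accepts a filter only if every key the filter
 * uses is one it supports. Because port ranges now set a separate key,
 * a driver that only supports exact ports rejects them instead of
 * installing a dst_port == 0 rule. */
static int driver_check_keys(uint32_t used_keys, uint32_t supported_keys)
{
	if (used_keys & ~supported_keys)
		return -EOPNOTSUPP;
	return 0;
}
```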
Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dust Li [Tue, 3 Dec 2019 03:17:40 +0000 (11:17 +0800)]
net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues
In mq_dump() and mqprio_dump(), sch->q.qlen is not set when the
subqueue is a NOLOCK qdisc.
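A NOLOCK child keeps its queue length per CPU. A parent that dumps an aggregate therefore has to sum those per-CPU counters instead of reading a single field. The model below is a simplification with hypothetical struct and function names and a fixed CPU count; it is not the sch_mq code.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical child qdisc: a locked qdisc keeps one queue length,
 * a NOLOCK qdisc keeps one per CPU. */
struct child_qdisc {
	int nolock;
	unsigned int qlen;
	unsigned int cpu_qlen[NR_CPUS_SKETCH];
};

static unsigned int child_qlen(const struct child_qdisc *q)
{
	unsigned int sum = 0;

	if (!q->nolock)
		return q->qlen;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += q->cpu_qlen[cpu];
	return sum;
}

/* Queue length reported by an mq-style parent: the sum over children. */
static unsigned int mq_dump_qlen(const struct child_qdisc *children, int n)
{
	unsigned int total = 0;

	for (int i = 0; i < n; i++)
		total += child_qlen(&children[i]);
	return total;
}
```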
Fixes: ce679e8df7ed ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 3 Dec 2019 16:05:52 +0000 (08:05 -0800)]
tcp: refactor tcp_retransmit_timer()
It appears linux-4.14 stable needs a backport of commit
88f8598d0a30 ("tcp: exit if nothing to retransmit on RTO timeout").
Since tcp_rtx_queue_empty() does not exist in pre-4.15 kernels,
refactor tcp_retransmit_timer() to use only tcp_rtx_queue_head().
I will provide the squashed patches to the stable teams.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 3 Dec 2019 15:45:35 +0000 (17:45 +0200)]
net: mscc: ocelot: unregister the PTP clock on deinit
Currently, the switch driver's deinit frees the regmaps, but the PTP clock
is still registered and available to user space via /dev/ptpN. Any PTP
operation is a ticking time bomb, since it will attempt to use the freed
regmaps and thus trigger kernel panics:
[ 4.291746] fsl_enetc 0000:00:00.2 eth1: error -22 setting up slave phy
[ 4.291871] mscc_felix 0000:00:00.5: Failed to register DSA switch: -22
[ 4.308666] mscc_felix: probe of 0000:00:00.5 failed with error -22
[ 6.358270] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
[ 6.367090] Mem abort info:
[ 6.369888] ESR = 0x96000046
[ 6.369891] EC = 0x25: DABT (current EL), IL = 32 bits
[ 6.369892] SET = 0, FnV = 0
[ 6.369894] EA = 0, S1PTW = 0
[ 6.369895] Data abort info:
[ 6.369897] ISV = 0, ISS = 0x00000046
[ 6.369899] CM = 0, WnR = 1
[ 6.369902] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020d58c7000
[ 6.369904] [0000000000000088] pgd=00000020d5912003, pud=00000020d5915003, pmd=0000000000000000
[ 6.369914] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[ 6.420443] Modules linked in:
[ 6.423506] CPU: 1 PID: 262 Comm: phc_ctl Not tainted 5.4.0-03625-gb7b2a5dadd7f #204
[ 6.431273] Hardware name: LS1028A RDB Board (DT)
[ 6.435989] pstate: 40000085 (nZcv daIf -PAN -UAO)
[ 6.440802] pc : css_release+0x24/0x58
[ 6.444561] lr : regmap_read+0x40/0x78
[ 6.448316] sp : ffff800010513cc0
[ 6.451636] x29: ffff800010513cc0 x28: ffff002055873040
[ 6.456963] x27: 0000000000000000 x26: 0000000000000000
[ 6.462289] x25: 0000000000000000 x24: 0000000000000000
[ 6.467617] x23: 0000000000000000 x22: 0000000000000080
[ 6.472944] x21: ffff800010513d44 x20: 0000000000000080
[ 6.478270] x19: 0000000000000000 x18: 0000000000000000
[ 6.483596] x17: 0000000000000000 x16: 0000000000000000
[ 6.488921] x15: 0000000000000000 x14: 0000000000000000
[ 6.494247] x13: 0000000000000000 x12: 0000000000000000
[ 6.499573] x11: 0000000000000000 x10: 0000000000000000
[ 6.504899] x9 : 0000000000000000 x8 : 0000000000000000
[ 6.510225] x7 : 0000000000000000 x6 : ffff800010513cf0
[ 6.515550] x5 : 0000000000000000 x4 : 0000000fffffffe0
[ 6.520876] x3 : 0000000000000088 x2 : ffff800010513d44
[ 6.526202] x1 : ffffcada668ea000 x0 : ffffcada64d8b0c0
[ 6.531528] Call trace:
[ 6.533977] css_release+0x24/0x58
[ 6.537385] regmap_read+0x40/0x78
[ 6.540795] __ocelot_read_ix+0x6c/0xa0
[ 6.544641] ocelot_ptp_gettime64+0x4c/0x110
[ 6.548921] ptp_clock_gettime+0x4c/0x58
[ 6.552853] pc_clock_gettime+0x5c/0xa8
[ 6.556699] __arm64_sys_clock_gettime+0x68/0xc8
[ 6.561331] el0_svc_common.constprop.2+0x7c/0x178
[ 6.566133] el0_svc_handler+0x34/0xa0
[ 6.569891] el0_sync_handler+0x114/0x1d0
[ 6.573908] el0_sync+0x140/0x180
[ 6.577232] Code: d503201f b00119a1 91022263 b27b7be4 (f9004663)
[ 6.583349] ---[ end trace d196b9b14cdae2da ]---
[ 6.587977] Kernel panic - not syncing: Fatal exception
[ 6.593216] SMP: stopping secondary CPUs
[ 6.597151] Kernel Offset: 0x4ada54400000 from 0xffff800010000000
[ 6.603261] PHYS_OFFSET: 0xffffd0a7c0000000
[ 6.607454] CPU features: 0x10002,21806008
[ 6.611558] Memory Limit: none
Now that ocelot->ptp_clock is checked at exit, also prevent a potential
error where ptp_clock_register() returns a pointer-encoded error that
would be kept in the ocelot private data structure. As a result,
ocelot->ptp_clock is now either NULL or a valid pointer.
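The "NULL or a valid pointer" invariant can be sketched with userspace copies of the kernel's ERR_PTR/IS_ERR/PTR_ERR helpers. An encoded error is returned to the caller and never stored, so the deinit path only needs a NULL check. The struct and the store/release helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel's <linux/err.h> idea:
 * small negative errno values are encoded at the top of the address
 * space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical private data holding the clock handle. */
struct priv {
	void *ptp_clock;
};

/* Store the result of a registration call so that ptp_clock is only
 * ever NULL or valid: an encoded error is returned, not kept. */
static int store_clock(struct priv *p, void *ret)
{
	if (IS_ERR(ret)) {
		p->ptp_clock = NULL;
		return (int)PTR_ERR(ret);
	}
	p->ptp_clock = ret;
	return 0;
}

/* Deinit: release only a clock that was actually registered.
 * Returns 1 if something was released. */
static int release_clock(struct priv *p)
{
	if (!p->ptp_clock)
		return 0;
	/* a real driver would call its unregister function here */
	p->ptp_clock = NULL;
	return 1;
}
```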
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Cc: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danit Goldberg [Tue, 3 Dec 2019 15:43:36 +0000 (17:43 +0200)]
net/core: Populate VF index in struct ifla_vf_guid
In addition to filling the node_guid and port_guid attributes, the VF
index must be populated too; otherwise, users of the netlink interface
will see the same VF index for all VFs.
Fixes: 30aad41721e0 ("net/core: Add support for getting VF GUIDs")
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 3 Dec 2019 14:48:06 +0000 (16:48 +0200)]
net: bridge: deny dev_set_mac_address() when unregistering
We have an interesting memory leak in the bridge when it is being
unregistered and is a slave to a master device that changes the MAC
of its slaves on unregister (e.g. bond, team). This is a very
unusual setup, but we do end up leaking one fdb entry because
dev_set_mac_address() causes the bridge to insert the new MAC address
into its table after all fdbs have been flushed. That is, after dellink()
on the bridge has finished and NETDEV_UNREGISTER is called, the bond/team
releases the bridge and calls dev_set_mac_address() to restore its
original address, which in turn adds an fdb entry in the bridge.
One fix is to check the bridge device's reg_state in its
ndo_set_mac_address callback and return an error if the bridge is not in
NETREG_REGISTERED.
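Here is a minimal model of the proposed check. It assumes the handler simply refuses address changes whenever the state is not NETREG_REGISTERED. The enum is a subset of the kernel's registration states and the bridge struct is made up. -EBUSY is an assumed error code, since the text only says that an error is returned.

```c
#include <assert.h>
#include <errno.h>

/* Subset of the kernel's netdev registration states (names mirrored). */
enum reg_state { NETREG_UNINITIALIZED, NETREG_REGISTERED, NETREG_UNREGISTERING };

/* Hypothetical bridge: counts the local fdb entries it owns. */
struct bridge {
	enum reg_state reg_state;
	int fdb_entries;
};

/* MAC change handler: refuse once the bridge is no longer registered,
 * so a master restoring addresses during unregister cannot add an fdb
 * entry after all entries were flushed. */
static int br_set_mac(struct bridge *br)
{
	if (br->reg_state != NETREG_REGISTERED)
		return -EBUSY;
	br->fdb_entries++; /* local fdb entry for the new address */
	return 0;
}
```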
Easy steps to reproduce:
1. add a bond in a mode other than active-backup
2. add any slave to the bond
3. add the bridge device as a slave to the bond
4. destroy the bridge device
Trace:
unreferenced object 0xffff888035c4d080 (size 128):
comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
hex dump (first 32 bytes):
41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00 A..6............
d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00 ...^?...........
backtrace:
[<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
[<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
[<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
[<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
[<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
[<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
[<000000006846a77f>] dev_set_mac_address+0x63/0x9b
[<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
[<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
[<00000000305d7795>] notifier_call_chain+0x38/0x56
[<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
[<000000008279477b>] rollback_registered_many+0x353/0x6a4
[<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
[<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
[<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
[<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
Fixes: 43598813386f ("bridge: add local MAC address to forwarding table (v2)")
Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 3 Dec 2019 14:12:39 +0000 (17:12 +0300)]
net: fix a leak in register_netdevice()
We have to free "dev->name_node" on this error path.
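This fix uses the usual goto-based unwind: anything allocated before a failure is released on the error path. The sketch below is hypothetical, with made-up struct and function names. It allocates a name node, simulates a later failure, and frees the node before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device with a separately allocated name node. */
struct name_node {
	char name[16];
};

struct netdev {
	struct name_node *name_node;
	int registered;
};

/* Registration with a later step that can fail (simulated by
 * fail_late). Everything allocated before the failure is released on
 * the error path, so nothing leaks. */
static int register_dev(struct netdev *dev, const char *name, int fail_late)
{
	int ret;

	dev->name_node = calloc(1, sizeof(*dev->name_node));
	if (!dev->name_node)
		return -ENOMEM;
	/* calloc zeroed the buffer, so the copy stays NUL-terminated */
	strncpy(dev->name_node->name, name, sizeof(dev->name_node->name) - 1);

	if (fail_late) {
		ret = -EINVAL;
		goto err_free_name;
	}

	dev->registered = 1;
	return 0;

err_free_name:
	free(dev->name_node);
	dev->name_node = NULL;
	return ret;
}
```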
Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Reported-by: syzbot+6e13e65ffbaa33757bcb@syzkaller.appspotmail.com
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Dec 2019 19:14:41 +0000 (11:14 -0800)]
Merge tag 'linux-can-fixes-for-5.5-20191203' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-12-03
this is a pull request of 6 patches for net/master.
The first two patches are against the MAINTAINERS file and add Appana
Durga Kedareswara rao as maintainer for the xilinx-can driver and Sriram
Dash as maintainer for the m_can (mmio) driver.
The next patch is by Jouni Hogander and fixes a use-after-free in the
slcan driver.
Johan Hovold's patch for the ucan driver fixes the non-atomic allocation
in the completion handler.
The last two patches target the xilinx-can driver. The first one is by
Venkatesh Yadav Abbarapu and skips the error message on deferred probe,
the second one is by Srinivas Neeli and fixes the usage of the skb after
can_put_echo_skb().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Srinivas Neeli [Mon, 2 Dec 2019 13:02:11 +0000 (18:32 +0530)]
can: xilinx_can: Fix usage of skb memory
Per the Linux CAN framework, a driver is not allowed to touch the skb
memory after calling can_put_echo_skb().
This patch fixes the driver accordingly.
https://www.spinics.net/lists/linux-can/msg02199.html
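The rule is an ownership hand-off. Read every field you still need before passing the buffer on, since the callee may keep it or free it right away. This is a hypothetical userspace model: the frame type and put_echo() are stand-ins, not the CAN framework API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CAN-like frame. */
struct frame {
	unsigned char dlc;
	unsigned char data[8];
};

static struct frame *echo_slot; /* owned by the "framework" */
static unsigned long tx_bytes;

/* Takes ownership of f: it may keep f for echo or free it at once.
 * Either way, the caller must not touch f afterwards. */
static void put_echo(struct frame *f, int echo_enabled)
{
	if (!echo_enabled) {
		free(f);
		return;
	}
	free(echo_slot);
	echo_slot = f;
}

/* Read everything needed from the frame before handing it over. */
static void xmit(struct frame *f, int echo_enabled)
{
	unsigned char dlc = f->dlc; /* copy first */

	put_echo(f, echo_enabled);
	tx_bytes += dlc; /* use the copy, never f->dlc */
}
```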
Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com>
Reviewed-by: Appana Durga Kedareswara Rao <appana.durga.rao@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Venkatesh Yadav Abbarapu [Mon, 2 Dec 2019 13:02:10 +0000 (18:32 +0530)]
can: xilinx_can: skip error message on deferred probe
When the CAN bus clock is provided by the clock wizard, the clock wizard
driver may not be available when the CAN driver probes, resulting in the
error message "bus clock not found error".
As this error message is not very useful to the end user, skip printing
it in the case of a deferred probe.
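The change boils down to filtering one error code before logging. Because EPROBE_DEFER is not part of the userspace errno set, this sketch defines it locally with the kernel-internal value 517. The helper name is hypothetical, and stderr stands in for the kernel log.

```c
#include <assert.h>
#include <stdio.h>

/* Kernel-internal errno value for "probe again later". */
#define EPROBE_DEFER 517

/* Print the clock error only when it is a real failure; a deferred
 * probe is expected and will be retried, so it stays silent.
 * Returns 1 if a message was printed. */
static int report_clk_error(int err)
{
	if (err == -EPROBE_DEFER)
		return 0;
	fprintf(stderr, "bus clock not found error: %d\n", err);
	return 1;
}
```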
Signed-off-by: Venkatesh Yadav Abbarapu <venkatesh.abbarapu@xilinx.com>
Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Appana Durga Kedareswara Rao <appana.durga.rao@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Johan Hovold [Thu, 28 Nov 2019 18:26:03 +0000 (19:26 +0100)]
can: ucan: fix non-atomic allocation in completion handler
USB completion handlers are called in atomic context and must
specifically not allocate memory using GFP_KERNEL.
Fixes: 9f2d3eae88d2 ("can: ucan: add driver for Theobroma Systems UCAN devices")
Cc: stable <stable@vger.kernel.org> # 4.19
Cc: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Cc: Martin Elshuber <martin.elshuber@theobroma-systems.com>
Cc: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jouni Hogander [Wed, 27 Nov 2019 06:40:26 +0000 (08:40 +0200)]
can: slcan: Fix use-after-free Read in slcan_open
slcan_open() doesn't remove a device whose registration failed from the
slcan_devs device list. On the next open, this list is iterated and the
freed device is accessed. Fix this by calling slc_free_netdev() in the
error path.
drivers/net/can/slcan.c is derived from slip.c. A use-after-free error was
identified in slip_open() by syzbot, and the same bug exists in slcan.c.
Here is the trace from the syzbot slip report:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:634
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
sl_sync drivers/net/slip/slip.c:725 [inline]
slip_open+0xecd/0x11b7 drivers/net/slip/slip.c:801
tty_ldisc_open.isra.0+0xa3/0x110 drivers/tty/tty_ldisc.c:469
tty_set_ldisc+0x30e/0x6b0 drivers/tty/tty_ldisc.c:596
tiocsetd drivers/tty/tty_io.c:2334 [inline]
tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2594
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: ed50e1600b44 ("slcan: Fix memory leak in error path")
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: David Miller <davem@davemloft.net>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v5.4
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sriram Dash [Tue, 3 Dec 2019 04:29:09 +0000 (09:59 +0530)]
MAINTAINERS: add myself as maintainer of MCAN MMIO device driver
Since we are actively working on the MMIO MCAN device driver, and as
discussed with Marc, I am adding myself as a maintainer.
Signed-off-by: Sriram Dash <sriram.dash@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Appana Durga Kedareswara rao [Thu, 21 Nov 2019 08:39:24 +0000 (14:09 +0530)]
MAINTAINERS: add fragment for xilinx CAN driver
Add an entry for the xilinx CAN driver.
Signed-off-by: Appana Durga Kedareswara rao <appana.durga.rao@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Martin Varghese [Mon, 2 Dec 2019 05:19:51 +0000 (10:49 +0530)]
Fixed updating of ethertype in function skb_mpls_pop
skb_mpls_pop() did not update the ethertype of an Ethernet packet if the
packet was originally received from a non-ARPHRD_ETHER device.
In the OVS datapath flow below, since the device corresponding to port 7
is an L3 device (ARPHRD_NONE), skb_mpls_pop() does not update the
ethertype of the packet even though the preceding push_eth action has
added an Ethernet header to the packet.
recirc_id(0),in_port(7),eth_type(0x8847),
mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
pop_mpls(eth_type=0x800),4
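The decision described above can be illustrated with a simplified Python model. This is a sketch, not kernel code: `Packet`, `push_eth()`, `pop_mpls_by_device()` and `pop_mpls_by_header()` are hypothetical names, and the model only captures the idea that whether to rewrite the ethertype should depend on whether the packet currently carries an Ethernet header, not on the type of the device it arrived on.

```python
from dataclasses import dataclass
from typing import Optional

# Ethertype values used in the flow above.
ETH_P_MPLS_UC = 0x8847
ETH_P_IP = 0x0800


@dataclass
class Packet:
    """Hypothetical, simplified stand-in for a packet buffer (not struct sk_buff)."""
    ingress_is_ether: bool   # was the packet received on an ARPHRD_ETHER device?
    has_eth_header: bool     # does the packet carry an Ethernet header right now?
    eth_type: Optional[int]  # ethertype field of that header, if any
    protocol: int            # network protocol, analogous to skb->protocol


def push_eth(pkt: Packet) -> None:
    # Model of the push_eth action: add an Ethernet header to the packet.
    pkt.has_eth_header = True
    pkt.eth_type = pkt.protocol


def pop_mpls_by_device(pkt: Packet, next_proto: int) -> None:
    # Behaviour described as buggy: keyed on the ingress device type.
    pkt.protocol = next_proto
    if pkt.ingress_is_ether:
        pkt.eth_type = next_proto


def pop_mpls_by_header(pkt: Packet, next_proto: int) -> None:
    # Intended behaviour: keyed on whether an Ethernet header is present.
    pkt.protocol = next_proto
    if pkt.has_eth_header:
        pkt.eth_type = next_proto


# Packet from port 7 (an ARPHRD_NONE device), as in the flow above.
pkt = Packet(ingress_is_ether=False, has_eth_header=False,
             eth_type=None, protocol=ETH_P_MPLS_UC)
push_eth(pkt)
pop_mpls_by_header(pkt, ETH_P_IP)
print(hex(pkt.eth_type))
```

With the device-based check, the header pushed by push_eth keeps the stale MPLS ethertype (0x8847) after the pop.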
Fixes: ed246cee09b9 ("net: core: move pop MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Victorien Molle [Mon, 2 Dec 2019 14:11:38 +0000 (15:11 +0100)]
sch_cake: Add missing NLA policy entry TCA_CAKE_SPLIT_GSO
This field has never been checked since its introduction in the mainline kernel.
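Why a missing policy entry matters can be shown with a small Python model of policy-driven attribute validation. This is a sketch under stated assumptions: `validate_attrs()`, `get_u32()` and `ATTR_SPLIT_GSO` are hypothetical (the constant is not the real value of TCA_CAKE_SPLIT_GSO), and the kernel's netlink length check for a u32 attribute is modeled as a simple minimum-length check.

```python
import struct

NLA_U32 = "u32"
NLA_U32_LEN = 4

# Hypothetical attribute number; not the real value of TCA_CAKE_SPLIT_GSO.
ATTR_SPLIT_GSO = 1


def validate_attrs(attrs, policy):
    """Check every attribute that has a policy entry; skip the rest.

    Attributes missing from the policy are accepted without any length
    check, which is the gap the commit message describes.
    """
    for attr_type, payload in attrs.items():
        if policy.get(attr_type) == NLA_U32 and len(payload) < NLA_U32_LEN:
            raise ValueError(
                f"attribute {attr_type}: {len(payload)} bytes, need {NLA_U32_LEN}")


def get_u32(payload):
    # In C, nla_get_u32() reads 4 bytes without checking the length;
    # here struct.unpack() raises on a short buffer instead.
    return struct.unpack("=I", payload[:NLA_U32_LEN])[0]


short = {ATTR_SPLIT_GSO: b"\x01"}  # 1-byte payload where a u32 is expected
validate_attrs(short, policy={})  # accepted unchecked: no policy entry
try:
    validate_attrs(short, policy={ATTR_SPLIT_GSO: NLA_U32})
except ValueError as err:
    print("rejected:", err)
```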
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Fixes: 2db6dc2662ba ("sch_cake: Make gso-splitting configurable from userspace")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Dec 2019 18:50:29 +0000 (10:50 -0800)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2019-12-02
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 6 day(s) which contain
a total of 10 files changed, 60 insertions(+), 51 deletions(-).
The main changes are:
1) Fix vmlinux BTF generation for binutils pre v2.25, from Stanislav Fomichev.
2) Fix libbpf global variable relocation to take symbol's st_value offset
into account, from Andrii Nakryiko.
3) Fix libbpf build on powerpc where check_abi target fails due to different
readelf output format, from Aurelien Jarno.
4) Don't set BPF insns RO for the case when they are JITed in order to avoid
fragmenting the direct map, from Daniel Borkmann.
5) Fix static checker warning in btf_distill_func_proto() as well as a build
error due to empty enum when BPF is compiled out, from Alexei Starovoitov.
6) Fix up generation of bpf_helper_defs.h for perf, from Arnaldo Carvalho de Melo.
====================
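Item 2 above can be illustrated with a simplified Python model of the offset computation. It is a sketch, not libbpf's code: `Symbol`, `data_offset_buggy()` and `data_offset_fixed()` are hypothetical, and it assumes that a reference to a static variable goes through the section symbol (st_value 0) with the offset in the instruction's immediate, while a reference to a global variable goes through the variable's own symbol, whose st_value holds its offset in the data section.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    st_value: int  # offset of the symbol within its data section


def data_offset_buggy(insn_imm: int, sym: Symbol) -> int:
    # Ignores the symbol's own offset: correct only when st_value is 0.
    return insn_imm


def data_offset_fixed(insn_imm: int, sym: Symbol) -> int:
    # Offset into the backing map = instruction immediate + symbol offset.
    return insn_imm + sym.st_value


section_sym = Symbol(".data", 0)       # static variable: offset carried by imm
global_sym = Symbol("my_global", 16)   # hypothetical global at offset 16
print(data_offset_fixed(8, section_sym), data_offset_fixed(0, global_sym))
```

For static variables both versions agree because st_value is 0; only globals expose the difference.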
Signed-off-by: David S. Miller <davem@davemloft.net>
Aurelien Jarno [Sun, 1 Dec 2019 19:57:28 +0000 (20:57 +0100)]
libbpf: Fix readelf output parsing on powerpc with recent binutils
On powerpc with recent versions of binutils, readelf outputs an extra
field when dumping the symbols of an object file. For example:
35: 0000000000000838 96 FUNC LOCAL DEFAULT [<localentry>: 8] 1 btf_is_struct
The extra "[<localentry>: 8]" prevents the GLOBAL_SYM_COUNT variable from
being computed correctly and causes the check_abi target to fail.
Fix that by looking for the symbol name in the last field instead of the
8th one. This way it should also cope with future extra fields.
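The difference between the two lookups can be sketched in Python. This is a model, not the Makefile's actual shell pipeline: `symbol_name()` and `count_global_syms()` are hypothetical, and the filter (GLOBAL, DEFAULT, not UND) is an assumption chosen for illustration. On a line with the extra "[<localentry>: 8]" field, the 8th whitespace-separated field is "8]" rather than the symbol name.

```python
def symbol_name(readelf_line, last_field=True):
    """Return the symbol name from one `readelf -s` symbol-table line."""
    fields = readelf_line.split()
    # Fixed position: Num: Value Size Type Bind Vis Ndx Name -> index 7.
    return fields[-1] if last_field else fields[7]


def count_global_syms(lines, last_field=True):
    """Count distinct defined GLOBAL/DEFAULT symbols (hypothetical filter)."""
    names = set()
    for line in lines:
        fields = line.split()
        if "GLOBAL" not in fields or "DEFAULT" not in fields or "UND" in fields:
            continue
        names.add(symbol_name(line, last_field))
    return len(names)


lines = [
    "35: 0000000000000838 96 FUNC LOCAL DEFAULT [<localentry>: 8] 1 btf_is_struct",
    "40: 0000000000001000 64 FUNC GLOBAL DEFAULT [<localentry>: 8] 1 btf__new",
    "41: 0000000000001040 64 FUNC GLOBAL DEFAULT [<localentry>: 8] 1 btf__free",
    "42: 0000000000000000 0 FUNC GLOBAL DEFAULT UND memcpy",
]
print(count_global_syms(lines), count_global_syms(lines, last_field=False))
```

With the fixed-position lookup every powerpc line yields "8]", so distinct global symbols collapse into one and the count is wrong.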
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/bpf/20191201195728.4161537-1-aurelien@aurel32.net
Linus Torvalds [Mon, 2 Dec 2019 04:36:41 +0000 (20:36 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
"Incoming:
- a small number of updates to scripts/, ocfs2 and fs/buffer.c
- most of MM
I still have quite a lot of material (mostly not MM) staged after
linux-next due to -next dependencies. I'll send those across next week
as the prerequisites get merged up"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
mm/page_io.c: annotate refault stalls from swap_readpage
mm/Kconfig: fix trivial help text punctuation
mm/Kconfig: fix indentation
mm/memory_hotplug.c: remove __online_page_set_limits()
mm: fix typos in comments when calling __SetPageUptodate()
mm: fix struct member name in function comments
mm/shmem.c: cast the type of unmap_start to u64
mm: shmem: use proper gfp flags for shmem_writepage()
mm/shmem.c: make array 'values' static const, makes object smaller
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
userfaultfd: wrap the common dst_vma check into an inlined function
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: use vma_pagesize for all huge page size calculation
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
mm/madvise.c: replace with page_size() in madvise_inject_error()
mm/mmap.c: make vma_merge() comment more easy to understand
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
autonuma: reduce cache footprint when scanning page tables
autonuma: fix watermark checking in migrate_balanced_pgdat()
...
Linus Torvalds [Mon, 2 Dec 2019 04:35:03 +0000 (20:35 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix several scatter gather list issues in kTLS code, from Jakub
Kicinski.
2) macb driver device remove has to kill the hresp_err_tasklet. From
Chuhong Yuan.
3) Several memory leak and reference count bug fixes in tipc, from Tung
Nguyen.
4) Fix mlx5 build error w/o ipv6, from Yue Haibing.
5) Fix jumbo frame and other regressions in r8169, from Heiner
Kallweit.
6) Undo some BUG_ON()'s and replace them with WARN_ON_ONCE and proper
error propagation/handling. From Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (24 commits)
openvswitch: remove another BUG_ON()
openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
net: phy: realtek: fix using paged operations with RTL8105e / RTL8208
r8169: fix resume on cable plug-in
r8169: fix jumbo configuration for RTL8168evl
net: emulex: benet: indent a Kconfig depends continuation line
selftests: forwarding: fix race between packet receive and tc check
net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
net: ethernet: ti: ale: ensure vlan/mdb deleted when no members
net/mlx5e: Fix build error without IPV6
selftests: pmtu: use -oneline for ip route list cache
tipc: fix duplicate SYN messages under link congestion
tipc: fix wrong timeout input for tipc_wait_for_cond()
tipc: fix wrong socket reference counter after tipc_sk_timeout() returns
tipc: fix potential memory leak in __tipc_sendmsg()
net: macb: add missed tasklet_kill
selftests: bpf: correct perror strings
selftests: bpf: test_sockmap: handle file creation failures gracefully
net/tls: use sg_next() to walk sg entries
net/tls: remove the dead inplace_crypto code
...
Linus Torvalds [Mon, 2 Dec 2019 03:05:07 +0000 (19:05 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Various fixes:
- Fix the PAT performance regression that downgraded write-combining
device memory regions to uncached.
- There have been a number of bugs in 32-bit double fault handling,
hopefully all fixed now.
- Fix an LDT crash
- Fix an FPU over-optimization that broke with GCC9 code
optimizations.
- Misc cleanups"
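The summary does not spell out how the PAT regression arose. A minimal Python sketch, assuming the off-by-one came from passing exclusive end addresses to an interval search that treats ends as inclusive, shows how two merely adjacent regions would be reported as overlapping, the kind of false conflict that could push a mapping to a more conservative cache type. `overlaps()` and the addresses are hypothetical, not the kernel's memtype code.

```python
def overlaps(a_start, a_last, b_start, b_last):
    """Overlap test for closed intervals [start, last], both ends inclusive."""
    return a_start <= b_last and b_start <= a_last


# Two adjacent regions described as half-open [start, end) ranges.
wc_region = (0x1000, 0x2000)
new_region = (0x2000, 0x3000)

# Off-by-one: exclusive ends passed where inclusive "last" addresses are expected.
buggy = overlaps(wc_region[0], wc_region[1], new_region[0], new_region[1])

# Correct: convert each exclusive end to an inclusive last address first.
fixed = overlaps(wc_region[0], wc_region[1] - 1, new_region[0], new_region[1] - 1)

print(buggy, fixed)  # True False
```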
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/pat: Fix off-by-one bugs in interval tree search
x86/ioperm: Save an indentation level in tss_update_io_bitmap()
x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
x86/entry/32: Remove unused 'restore_all_notrace' local label
x86/ptrace: Document FSBASE and GSBASE ABI oddities
x86/ptrace: Remove set_segment_reg() implementations for current
x86/traps: die() instead of panicking on a double fault
x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
x86/doublefault/32: Rename doublefault.c to doublefault_32.c
x86/traps: Disentangle the 32-bit and 64-bit doublefault code
lkdtm: Add a DOUBLE_FAULT crash type on x86
selftests/x86/single_step_syscall: Check SYSENTER directly
x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
Linus Torvalds [Mon, 2 Dec 2019 02:49:57 +0000 (18:49 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
- Make /sys/devices/cpu/rdpmc based RDPMC enforcement more
instantaneous
- decoder: Update the Intel opcode map
- Various tooling fixes, including a few late optimizations and
cleanups.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
perf script: Fix invalid LBR/binary mismatch error
perf script: Fix brstackinsn for AUXTRACE
perf affinity: Add infrastructure to save/restore affinity
perf pmu: Use file system cache to optimize sysfs access
perf regs: Make perf_reg_name() return "unknown" instead of NULL
perf diff: Use llabs() with 64-bit values
perf diff: Use llabs() with 64-bit values
perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0
perf tools: Allow to link with libbpf dynamicaly
perf tests: Rename tests/map_groups.c to tests/maps.c
perf tests: Rename thread-mg-share to thread-maps-share
perf maps: Rename map_groups.h to maps.h
perf maps: Rename 'mg' variables to 'maps'
perf map_symbol: Rename ms->mg to ms->maps
perf addr_location: Rename al->mg to al->maps
perf thread: Rename thread->mg to thread->maps
perf maps: Merge 'struct maps' with 'struct map_groups'
x86/insn: perf tools: Add some more instructions to the new instructions test
x86/insn: Add some more Intel instructions to the opcode map
perf map: Remove unused functions
...
Linus Torvalds [Mon, 2 Dec 2019 02:45:29 +0000 (18:45 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- updates to Ilitech driver to support ILI2117
- face lift of st1232 driver to support MT-B protocol
- a new driver for i.MX system controller keys
- mpr121 driver now supports polling mode
- various input drivers have been switched away from input_polled_dev
to use polled mode of regular input devices
- other assorted cleanups and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (70 commits)
Input: synaptics-rmi4 - fix various V4L2 compliance problems in F54
Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
Input: fix Kconfig indentation
Input: imx_sc_key - correct SCU message structure to avoid stack corruption
Input: ili210x - optionally show calibrate sysfs attribute
Input: ili210x - add resolution to chip operations structure
Input: ili210x - do not retrieve/print chip firmware version
Input: mms114 - use device_get_match_data
Input: ili210x - remove unneeded suspend and resume handlers
Input: ili210x - do not unconditionally mark touchscreen as wakeup source
Input: ili210x - define and use chip operations structure
Input: ili210x - do not set parent device explicitly
Input: ili210x - handle errors from input_mt_init_slots()
Input: ili210x - switch to using threaded IRQ
Input: ili210x - add ILI2117 support
dt-bindings: input: touchscreen: ad7879: generic node names in example
Input: ar1021 - fix typo in preprocessor macro name
Input: synaptics-rmi4 - simplify data read in rmi_f54_work
Input: kxtj9 - switch to using polled mode of input devices
Input: kxtj9 - switch to using managed resources
...
Linus Torvalds [Mon, 2 Dec 2019 02:43:25 +0000 (18:43 -0800)]
Merge tag 'libnvdimm-for-5.5' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The highlight this cycle is continuing integration fixes for PowerPC
and some resulting optimizations.
Summary:
- Updates to better support vmalloc space restrictions on PowerPC
platforms.
- Cleanups to move common sysfs attributes to core 'struct
device_type' objects.
- Export the 'target_node' attribute (the effective numa node if pmem
is marked online) for regions and namespaces.
- Miscellaneous fixups and optimizations"
* tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
MAINTAINERS: Remove Keith from NVDIMM maintainers
libnvdimm: Export the target_node attribute for regions and namespaces
dax: Add numa_node to the default device-dax attributes
libnvdimm: Simplify root read-only definition for the 'resource' attribute
dax: Simplify root read-only definition for the 'resource' attribute
dax: Create a dax device_type
libnvdimm: Move nvdimm_bus_attribute_group to device_type
libnvdimm: Move nvdimm_attribute_group to device_type
libnvdimm: Move nd_mapping_attribute_group to device_type
libnvdimm: Move nd_region_attribute_group to device_type
libnvdimm: Move nd_numa_attribute_group to device_type
libnvdimm: Move nd_device_attribute_group to device_type
libnvdimm: Move region attribute group definition
libnvdimm: Move attribute groups to device type
libnvdimm: Remove prototypes for nonexistent functions
libnvdimm/btt: fix variable 'rc' set but not used
libnvdimm/pmem: Delete include of nd-core.h
libnvdimm/namespace: Differentiate between probe mapping and runtime mapping
libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe
libnvdimm: Trivial comment fix
...
Linus Torvalds [Mon, 2 Dec 2019 02:42:02 +0000 (18:42 -0800)]
Merge tag 'mailbox-v5.5' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
- omap : misc - catch error returned from pm_runtime_put_sync
- hisi : misc - drop .owner from platform_driver
- stm : change how wakeup is handled
- imx : fix - bailout on error and nuke correct irq
- imx : add support for imx7ulp platform
* tag 'mailbox-v5.5' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: imx: add support for imx v1 mu
dt-bindings: mailbox: imx-mu: add imx7ulp MU support
mailbox: imx: Clear the right interrupts at shutdown
mailbox: imx: Fix Tx doorbell shutdown path
mailbox: stm32-ipcc: Update wakeup management
mailbox: no need to set .owner platform_driver_register
mailbox/omap: Handle if CONFIG_PM is disabled
Linus Torvalds [Mon, 2 Dec 2019 02:40:28 +0000 (18:40 -0800)]
Merge tag 'hwlock-v5.5' of git://git./linux/kernel/git/andersson/remoteproc
Pull hwspinlock updates from Bjorn Andersson:
"This contains a number of cleanups to the core and several drivers, in
particular removing the requirement for drivers to implement
pm_runtime.
It also updates the location of the git tree in MAINTAINERS"
* tag 'hwlock-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
hwspinlock: u8500_hsem: Remove redundant PM runtime implementation
hwspinlock: sprd: Remove redundant PM runtime implementation
hwspinlock: Let the PM runtime can be optional
hwspinlock: Remove BUG_ON() from the hwspinlock core
hwspinlock: sprd: Use devm_hwspin_lock_register() to register hwlock controller
hwspinlock: sprd: Use devm_add_action_or_reset() for calls to clk_disable_unprepare()
hwspinlock: sprd: Check the return value of clk_prepare_enable()
hwspinlock: sprd: Change to use devm_platform_ioremap_resource()
hwspinlock: u8500_hsem: Use devm_hwspin_lock_register() to register hwlock controller
hwspinlock: u8500_hsem: Use devm_kzalloc() to allocate memory
hwspinlock: u8500_hsem: Change to use devm_platform_ioremap_resource()
MAINTAINERS: hwspinlock: update git tree location
Linus Torvalds [Mon, 2 Dec 2019 02:39:24 +0000 (18:39 -0800)]
Merge tag 'rpmsg-v5.5' of git://git./linux/kernel/git/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This contains a number of bug fixes to the GLINK transport driver, an
off-by-one in the GLINK smem driver and a memory leak fix in the rpmsg
char driver"
* tag 'rpmsg-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
rpmsg: Fix Kconfig indentation
rpmsg: char: Simplify 'rpmsg_eptdev_release()'
rpmsg: glink: Free pending deferred work on remove
rpmsg: glink: Don't send pending rx_done during remove
rpmsg: glink: Fix rpmsg_register_device err handling
rpmsg: glink: Put an extra reference during cleanup
rpmsg: glink: Fix use after free in open_ack TIMEOUT case
rpmsg: glink: Fix reuse intents memory leak issue
rpmsg: glink: Set tail pointer to 0 at end of FIFO
rpmsg: char: release allocated memory
Linus Torvalds [Mon, 2 Dec 2019 02:35:47 +0000 (18:35 -0800)]
Merge tag 'rproc-v5.5' of git://git./linux/kernel/git/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"This adds support for booting the modem processor on Qualcomm MSM8998
and carries some cleanup and bug fixes to the framework and the
stm32 driver"
* tag 'rproc-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
Revert "dt-bindings: remoteproc: stm32: add wakeup-source"
remoteproc: stm32: fix probe error case
remoteproc: stm32: wakeup the system by wdg irq
dt-bindings: remoteproc: stm32: add wakeup-source
remoteproc: Fix wrong rvring index computation
remoteproc: stm32: use workqueue to treat mailbox callback
remoteproc: fix argument 2 of rproc_mem_entry_init
remoteproc: qcom_q6v5_mss: Add support for MSM8998
dt-bindings: remoteproc: qcom: Add Q6v5 Modem PIL binding for MSM8998
remoteproc: debug: Remove unneeded NULL check
remoteproc: remove useless typedef
Linus Torvalds [Mon, 2 Dec 2019 02:29:36 +0000 (18:29 -0800)]
Merge branch 'i2c/for-5.5' of git://git./linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"I2C has mostly driver updates this time.
The few noteworthy changes are: the core now has support for analog
and digital filters with at91 being the first user, a core addition to
replace the NULL-returning i2c_new_probed_device() with an ERR_PTR
variant, and the pxa driver has finally been moved to use the generic
I2C slave interface. We have quite a significant number of reviews per
patch this time, so thank you to all involved!"
* 'i2c/for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (37 commits)
video: fbdev: matrox: convert to i2c_new_scanned_device
i2c: icy: convert to i2c_new_scanned_device
i2c: replace i2c_new_probed_device with an ERR_PTR variant
i2c: Fix Kconfig indentation
i2c: smbus: Don't filter out duplicate alerts
i2c: i801: Correct Intel Jasper Lake SOC naming
i2c: i2c-stm32f7: fix 10-bits check in slave free id search loop
i2c: iproc: Add i2c repeated start capability
i2c: remove helpers for ref-counting clients
i2c: tegra: Use dma_request_chan() directly for channel request
i2c: sh_mobile: Use dma_request_chan() directly for channel request
i2c: qup: Use dma_request_chan() directly for channel request
i2c: at91: Use dma_request_chan() directly for channel request
i2c: rcar: Remove superfluous call to clk_get_rate()
i2c: pxa: remove unused i2c-slave APIs
i2c: pxa: migrate to new i2c_slave APIs
i2c: cros-ec-tunnel: Make the device acpi compatible
i2c: stm32f7: report dma error during probe
i2c: icy: no need to populate address for scanned device
i2c: xiic: Fix kerneldoc warnings
...
Linus Torvalds [Mon, 2 Dec 2019 02:26:56 +0000 (18:26 -0800)]
Merge tag 'for-linus-20191129' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"I wasn't going to send this one off so soon, but unfortunately one of
the fixes from the previous pull broke the build on some archs. So I'm
sending this sooner rather than later. This contains:
- Add highmem.h include for io_uring, because of the kmap() additions
from last round. For some reason the build bot didn't spot this
even though it sat for days.
- Three minor ';' removals
- Add support for the Beurer CD-on-a-chip device
- Make io_uring work on MMU-less archs"
* tag 'for-linus-20191129' of git://git.kernel.dk/linux-block:
io_uring: fix missing kmap() declaration on powerpc
ataflop: Remove unneeded semicolon
block: sunvdc: Remove unneeded semicolon
drbd: Remove unneeded semicolon
io_uring: add mapping support for NOMMU archs
sr_vendor: support Beurer GL50 evo CD-on-a-chip devices.
cdrom: respect device capabilities during opening action
Linus Torvalds [Mon, 2 Dec 2019 02:24:25 +0000 (18:24 -0800)]
Merge tag 'platform-drivers-x86-v5.5-1' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver updates from Andy Shevchenko:
- New bootctl driver for Mellanox BlueField SoC.
- New driver to support System76 laptops.
- Temperature monitoring and fan control on Acer Aspire 7551 is now
supported.
- Previously the Huawei driver handled only hotkeys. After the
conversion to WMI it has been expanded to support newer laptop
models.
- Big refactoring of intel-speed-select tools allows it to be used on
Intel CascadeLake-N systems.
- Touchscreen support for ezpad 6 m4 and Schneider SCT101CTM tablets
- Miscellaneous clean ups and fixes here and there.
* tag 'platform-drivers-x86-v5.5-1' of git://git.infradead.org/linux-platform-drivers-x86: (59 commits)
platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
platform/x86: intel_pmc_core: Add Comet Lake (CML) platform support to intel_pmc_core driver
platform/x86: intel_pmc_core: Fix the SoC naming inconsistency
platform/mellanox: Fix Kconfig indentation
tools/power/x86/intel-speed-select: Display TRL buckets for just base config level
tools/power/x86/intel-speed-select: Ignore missing config level
platform/x86: touchscreen_dmi: Add info for the ezpad 6 m4 tablet
tools/power/x86/intel-speed-select: Increment version
tools/power/x86/intel-speed-select: Use core count for base-freq mask
tools/power/x86/intel-speed-select: Support platform with limited Intel(R) Speed Select
tools/power/x86/intel-speed-select: Use Frequency weight for CLOS
tools/power/x86/intel-speed-select: Make CLOS frequency in MHz
tools/power/x86/intel-speed-select: Use mailbox for CLOS_PM_QOS_CONFIG
tools/power/x86/intel-speed-select: Auto mode for CLX
tools/power/x86/intel-speed-select: Correct CLX-N frequency units
tools/power/x86/intel-speed-select: Change display of "avx" to "avx2"
tools/power/x86/intel-speed-select: Extend command set for perf-profile
Add touchscreen platform data for the Schneider SCT101CTM tablet
platform/x86: intel_int0002_vgpio: Pass irqchip when adding gpiochip
...
Linus Torvalds [Mon, 2 Dec 2019 02:20:54 +0000 (18:20 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- Support for Logitech G15 (Hans de Goede)
- HID parser improvements, improving support for some devices; e.g.
Windows Precision Touchpad, products from Primax, etc. (Blaž
Hrastnik, Candle Sun)
- robustification of tablet mode support in google-whiskers driver
(Dmitry Torokhov)
- assorted small fixes, device-specific quirks and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (23 commits)
HID: rmi: Check that the RMI_STARTED bit is set before unregistering the RMI transport device
HID: quirks: remove hid-led devices from hid_have_special_driver
HID: Improve Windows Precision Touchpad detection.
HID: i2c-hid: Reset ALPS touchpads on resume
HID: i2c-hid: fix no irq after reset on raydium 3118
HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
HID: i2c-hid: remove orphaned member sleep_delay
HID: quirks: Add quirk for HP MSU1465 PIXART OEM mouse
HID: core: check whether Usage Page item is after Usage ID items
HID: intel-ish-hid: Spelling s/diconnect/disconnect/
HID: google: Detect base folded usage instead of hard-coding whiskers
HID: logitech: Add depends on LEDS_CLASS to Logitech Kconfig entry
HID: lg-g15: Add support for the G510's M1-M3 and MR LEDs
HID: lg-g15: Add support for controlling the G510's RGB backlight
HID: lg-g15: Add support for the G510 keyboards' gaming keys
HID: lg-g15: Add support for the M1-M3 and MR LEDs
HID: lg-g15: Add keyboard and LCD backlight control
HID: Add driver for Logitech gaming keyboards (G15, G15 v2)
Input: Add event-codes for macro keys found on various keyboards
HID: hidraw: replace printk() with corresponding pr_xx() variant
...
Linus Torvalds [Mon, 2 Dec 2019 02:01:03 +0000 (18:01 -0800)]
Merge tag 'linux-watchdog-5.5-rc1' of git://linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- support for NCT6116D
- several small fixes and improvements
* tag 'linux-watchdog-5.5-rc1' of git://www.linux-watchdog.org/linux-watchdog: (24 commits)
watchdog: jz4740: Drop dependency on MACH_JZ47xx
watchdog: jz4740: Use regmap provided by TCU driver
watchdog: jz4740: Use WDT clock provided by TCU driver
dt-bindings: watchdog: sama5d4_wdt: add microchip,sam9x60-wdt compatible
watchdog: sama5d4_wdt: cleanup the bit definitions
watchdog: sprd: Fix the incorrect pointer getting from driver data
watchdog: aspeed: Fix clock behaviour for ast2600
watchdog: imx7ulp: Fix reboot hang
watchdog: make nowayout sysfs file writable
watchdog: prevent deferral of watchdogd wakeup on RT
watchdog: imx7ulp: Use definitions instead of magic values
watchdog: imx7ulp: Remove inline annotations
watchdog: imx7ulp: Remove unused structure member
watchdog: imx7ulp: Pass the wdog instance inimx7ulp_wdt_enable()
watchdog: wdat_wdt: Spelling s/configrable/configurable/
watchdog: bd70528: Trivial function documentation fix
watchdog: cadence: Do not show error in case of deferred probe
watchdog: Fix the race between the release of watchdog_core_data and cdev
watchdog: sbc7240_wdt: Fix yet another -Wimplicit-fallthrough warning
watchdog: intel-mid_wdt: Add WATCHDOG_NOWAYOUT support
...
Linus Torvalds [Mon, 2 Dec 2019 01:56:50 +0000 (17:56 -0800)]
Merge tag 'gpio-v5.5-1' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v5.5 kernel cycle
Core changes:
- Expose pull up/down flags for the GPIO character device to
userspace.
After clear input from the RaspberryPi and Beagle communities, it
has been established that prototyping, industrial automation and
maker communities strongly need this feature, and as we want people
to use the character device, we have implemented the simple pull
up/down interface for GPIO lines.
This means we can specify that a (chip-specific) pull up/down
resistor can be enabled, but does not offer fine-grained control
such as cases where the resistance of the same pull resistor can be
controlled (yet).
- Introduce devm_fwnode_gpiod_get_index() and start to phase out the
old symbol devm_fwnode_get_index_gpiod_from_child().
- A bit of documentation clean-up work.
- Introduce a define for GPIO line directions and deploy it in all
GPIO drivers in the drivers/gpio directory.
- Add a special callback to populate pin ranges when cooperating with
the pin control subsystem and registering ranges as part of adding
a gpiolib driver and a gpio_irq_chip driver at the same time. This
is also deployed in the Intel Merrifield driver.
New drivers:
- RDA Micro GPIO controller.
- XGS-iproc GPIO driver.
Driver improvements:
- Wake event and debounce support on the Tegra 186 driver.
- Finalize the Aspeed SGPIO driver.
- MPC8xxx uses a normal IRQ handler rather than a chained handler"
* tag 'gpio-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (64 commits)
gpio: Add TODO item for regmap helper
Documentation: gpio: driver.rst: Fix warnings
gpio: of: Fix bogus reference to gpiod_get_count()
gpiolib: Grammar s/manager/managed/
gpio: lynxpoint: Setup correct IRQ handlers
MAINTAINERS: Replace my email by one @kernel.org
gpiolib: acpi: Make acpi_gpiochip_alloc_event always return AE_OK
gpio/mpc8xxx: fix qoriq GPIO reading
gpio: mpc8xxx: Don't overwrite default irq_set_type callback
gpiolib: acpi: Print pin number on acpi_gpiochip_alloc_event errors
gpiolib: fix coding style in gpiod_hog()
drm/bridge: ti-tfp410: switch to using fwnode_gpiod_get_index()
gpio: merrifield: Pass irqchip when adding gpiochip
gpio: merrifield: Add GPIO <-> pin mapping ranges via callback
gpiolib: Introduce ->add_pin_ranges() callback
gpio: mmio: remove untrue leftover comment
gpio: em: Use platform_get_irq() to obtain interrupts
gpio: tegra186: Add debounce support
gpio: tegra186: Program interrupt route mapping
gpio: tegra186: Derive register offsets from bank/port
...
Linus Torvalds [Mon, 2 Dec 2019 00:16:31 +0000 (16:16 -0800)]
Merge tag 'mfd-next-5.5' of git://git./linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Core Frameworks:
- Add support for a "resource managed strongly uncachable ioremap"
call
- Provide a collection of MFD helper macros
- Remove mfd_clone_cell() from MFD core
- Add NULL de-reference protection in MFD core
- Remove superfluous function mfd_platform_add_cell() from MFD core
- Honour Device Tree's request to disable a device
New Drivers:
- Add support for MediaTek MT6323 PMIC
New Device Support:
- Add support for Gemini Lake to Intel LPSS PCI
- Add support for Cherry Trail Crystal Cover PMIC to Intel SoC PMIC
CRC
- Add support for PM{I}8950 to Qualcomm SPMI PMIC
- Add support for U8420 to ST-Ericsson DB8500
- Add support for Comet Lake PCH-H to Intel LPSS PCI
New Functionality:
- Add support for requested supply clocks; madera-core
Fix-ups:
- Lower interrupt priority; rk808
- Use provided helpers (macros, group functions, defines); rk808,
ipaq-micro, ab8500-core, db8500-prcmu, mt6397-core, cs5535-mfd
- Only allocate IRQs on request; max77620
- Use simplified API; arizona-core
- Remove redundant and/or duplicated code; wm8998-tables, arizona,
syscon
- Device Tree binding fix-ups; madera, max77650, max77693
- Remove mfd_cell->id abuse hack; cs5535-mfd
- Remove only user of mfd_clone_cell(); cs5535-mfd
- Make resources static; rohm-bd70528
Bug Fixes:
- Fix product ID for RK818; rk808
- Fix Power Key; rk808
- Fix booting on the BananaPi; mt6397-core
- Endian fix-ups; twl.h
- Fix static error checker warnings; ti_am335x_tscadc"
* tag 'mfd-next-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (47 commits)
Revert "mfd: syscon: Set name of regmap_config"
mfd: ti_am335x_tscadc: Fix static checker warning
mfd: bd70528: Staticize bit value definitions
mfd: mfd-core: Honour Device Tree's request to disable a child-device
dt-bindings: mfd: max77693: Fix missing curly brace
mfd: intel-lpss: Add Intel Comet Lake PCH-H PCI IDs
mfd: db8500-prcmu: Support U8420-sysclk firmware
dt-bindings: mfd: max77650: Convert the binding document to yaml
mfd: mfd-core: Move pdev->mfd_cell creation back into mfd_add_device()
mfd: mfd-core: Remove usage counting for .{en,dis}able() call-backs
x86: olpc-xo1-sci: Remove invocation of MFD's .enable()/.disable() call-backs
x86: olpc-xo1-pm: Remove invocation of MFD's .enable()/.disable() call-backs
mfd: mfd-core: Remove mfd_clone_cell()
mfd: mfd-core: Protect against NULL call-back function pointer
mfd: cs5535-mfd: Register clients using their own dedicated MFD cell entries
mfd: cs5535-mfd: Request shared IO regions centrally
mfd: cs5535-mfd: Remove mfd_cell->id hack
mfd: cs5535-mfd: Use PLATFORM_DEVID_* defines and tidy error message
mfd: intel_soc_pmic_crc: Add "cht_crystal_cove_pmic" cell to CHT cells
mfd: madera: Add support for requesting the supply clocks
...
Linus Torvalds [Mon, 2 Dec 2019 00:13:39 +0000 (16:13 -0800)]
Merge tag 'backlight-next-5.5' of git://git./linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"New Functionality:
- Add support for an enable GPIO; lm3630a_bl
- Add support for short circuit handling; qcom-wled
- Add support for automatic string detection; qcom-wled
Fix-ups:
- Update Device Tree bindings; lm3630a-backlight, led-backlight,
qcom-wled
- Constify; ipaq_micro_bl
- Optimise for CPU cycles; pwm_bl
- Coding style fix-ups; pwm_bl
- Trivial fix-ups (white space, comments, renaming); pwm_bl,
gpio_backlight, qcom-wled
- Kconfig dependency hacking; LCD_HP700
- Rename, refactor and add peripherals; pm8941-wled => qcom-wled
- Make use of GPIO look-up tables; tosa_bl, tosa_lcd
- Remove superfluous code; gpio_backlight
- Adapt GPIO direction handling; gpio_backlight
- Remove legacy use of platform data; gpio_backlight
Bug Fixes:
- Provide modules aliases; lm3630a_bl"
* tag 'backlight-next-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: (32 commits)
backlight: qcom-wled: Fix spelling mistake "trigged" -> "triggered"
backlight: gpio: Pull gpio_backlight_initial_power_state() into probe
backlight: gpio: Use a helper variable for &pdev->dev
backlight: gpio: Remove unused fields from platform data
sh: ecovec24: don't set unused fields in platform data
backlight: gpio: Simplify the platform data handling
sh: ecovec24: add additional properties to the backlight device
backlight: gpio: Explicitly set the direction of the GPIO
backlight: gpio: Remove stray newline
backlight: gpio: Remove unneeded include
video: backlight: tosa: Use GPIO lookup table
backlight: qcom-wled: Add auto string detection logic
backlight: qcom-wled: Add support for short circuit handling
backlight: qcom-wled: Add support for WLED4 peripheral
backlight: qcom-wled: Restructure the driver for WLED3
backlight: qcom-wled: Rename PM8941* to WLED3
backlight: qcom-wled: Add new properties for PMI8998
backlight: qcom-wled: Restructure the qcom-wled bindings
backlight: qcom-wled: Rename pm8941-wled.c to qcom-wled.c
dt-bindings: backlight: lm3630a: Fix missing include
...
Linus Torvalds [Mon, 2 Dec 2019 00:12:21 +0000 (16:12 -0800)]
Merge tag 'pinctrl-v5.5-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pinctrl fix from Linus Walleij:
"A oneliner fix adding the license to the new Intel pin controller,
avoiding a build-time warning"
* tag 'pinctrl-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: Fix warning by adding missing MODULE_LICENSE
Linus Torvalds [Mon, 2 Dec 2019 00:09:28 +0000 (16:09 -0800)]
Merge tag 'leds-5.5-rc1' of git://git./linux/kernel/git/pavel/linux-leds
Pull LED updates from Pavel Machek:
"This contains usual small updates to drivers, and removal of PAGE_SIZE
limits on /sys/class/leds/<led>/trigger.
We should not be really having that many triggers; but with cpu
activity triggers we do, and we'll eventually need to fix it, but...
remove the limit for now"
* tag 'leds-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: (26 commits)
leds: trigger: netdev: fix handling on interface rename
leds: an30259a: add a check for devm_regmap_init_i2c
leds: mlxreg: Fix possible buffer overflow
leds: pca953x: Use of_device_get_match_data()
leds: core: Fix leds.h structure documentation
leds: core: Fix devm_classdev_match to reference correct structure
leds: core: Remove extern from header
leds: lm3601x: Convert class registration to device managed
leds: flash: Add devm_* functions to the flash class
leds: flash: Remove extern from the header file
leds: flash: Convert non extended registration to inline
leds: Kconfig: Be consistent with the usage of "LED"
leds: remove PAGE_SIZE limit of /sys/class/leds/<led>/trigger
leds: tlc591xx: update the maximum brightness
leds: lm3692x: Use flags from LM3692X_BRT_CTRL
leds: lm3692x: Use flags from LM3692X_BOOST_CTRL
leds: lm3692x: Handle failure to probe the regulator
leds: lm3692x: Don't overwrite return value in error path
leds: lm3692x: Print error value on dev_err
leds: tlc591xx: use devm_led_classdev_register_ext()
...
Linus Torvalds [Mon, 2 Dec 2019 00:06:02 +0000 (16:06 -0800)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"This merge window we have one small clk provider API in the core
framework and then a bunch of driver updates and a handful of new
drivers. In terms of diffstat the Qualcomm and Amlogic drivers are
high up there because of all the clk data introduced by new drivers.
The Nvidia Tegra driver had a lot of work done this cycle too to
support suspend/resume and memory controllers. And the OMAP clk driver
got proper clk and reset handling in place.
Rounding out the patches are various updates to remove unused data,
mark things static, correct incorrect data in drivers, etc. All the
little things that improve drivers and maintain code health. I will
point out that there's a patch in here for the GPIO clk driver, that
almost nobody uses, which changes behavior and causes clk_set_rate()
to try to change the GPIO gate clk's parent. Other than that things
are fairly well SoC specific here.
Core:
- Add a clk provider API to get current parent index
- Plug a memory leak in clk_unregister() path
New Drivers:
- CGU in Ingenic X1000
- Bitmain BM1880 clks
- Qualcomm MSM8998 GPU clk controllers
- Qualcomm SC7180 GCC and RPMH clk controllers
- Qualcomm QCS404 Q6SSTOP clk controllers
- Add support for the Renesas R-Car M3-W+ (r8a77961) SoC
- Add support for the Renesas RZ/G2N (r8a774b1) SoC
- Add Tegra20/30 External Memory Clock (EMC) support
Updates:
- Make gpio gate clks propagate rate setting up to parent
- Prepare Armada 3700 for suspend to RAM by moving PCIe
suspend/resume priority
- Drop unused variables, enums, etc. in various clk drivers
- Convert various drivers to use devm_platform_ioremap_resource()
- Use struct_size() some more in various clk drivers
- Improve Rockchip px30 clk tree
- Add suspend/resume support to Tegra210 clk driver
- Reimplement SOR clks on earlier Tegra SoCs, helping HDMI and DP
- Allwinner DT exports and H6 clk tree fixes
- Proper clk and reset handling for OMAP SoCs
- Revamped TI divider clk to clamp max divider
- Make 1443X/1416X PLL clock structure common for reusing among i.MX8
SoCs
- Drop IMX7ULP_CLK_MIPI_PLL clock, it shouldn't be used
- Add VIDEO2_PLL clock for imx8mq
- Add missing gate clock for pll1/2 fixed dividers on i.MX8 SoCs
- Add sm1 support in the Amlogic audio clock controller
- Switch some clocks on R-Car Gen2/3 to .determine_rate()
- Remove Renesas R-Car Gen2 legacy DT clock support
- Improve arithmetic divisions on Renesas R-Car Gen2 and Gen3
- Improve Renesas R-Car Gen3 SD clock handling
- Add rate table for Samsung exynos542x GPU and VPLL clks
- Fix potential CPU performance degradation after system
suspend/resume cycle on exynos542x SoCs"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (160 commits)
clk: aspeed: Add RMII RCLK gates for both AST2500 MACs
MAINTAINERS: Add entry for BM1880 SoC clock driver
clk: Add common clock driver for BM1880 SoC
dt-bindings: clock: Add devicetree binding for BM1880 SoC
clk: Add clk_hw_unregister_composite helper function definition
clk: Zero init clk_init_data in helpers
clk: ingenic: Allow drivers to be built with COMPILE_TEST
MAINTAINERS: Update section for Ux500 clock drivers
clk: mark clk_disable_unused() as __init
clk: Fix memory leak in clk_unregister()
clk: Ingenic: Add CGU driver for X1000.
dt-bindings: clock: Add X1000 bindings.
clk: tegra: Use match_string() helper to simplify the code
clk: pxa: fix one of the pxa RTC clocks
clk: sprd: Use IS_ERR() to validate the return value of syscon_regmap_lookup_by_phandle()
clk: armada-xp: remove unused code
clk: tegra: Fix build error without CONFIG_PM_SLEEP
clk: tegra: Add missing stubs for the case of !CONFIG_PM_SLEEP
clk: tegra: Optimize PLLX restore on Tegra20/30
clk: tegra: Add suspend and resume support on Tegra210
...
Linus Torvalds [Sun, 1 Dec 2019 22:00:59 +0000 (14:00 -0800)]
Merge tag 'y2038-cleanups-5.5' of git://git./linux/kernel/git/arnd/playground
Pull y2038 cleanups from Arnd Bergmann:
"y2038 syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended for
namespace cleaning: the kernel defines the traditional time_t, timeval
and timespec types that often lead to y2038-unsafe code. Even though
the unsafe usage is mostly gone from the kernel, having the types and
associated functions around means that we can still grow new users,
and that we may be missing conversions to safe types that actually
matter.
There are still a number of driver specific patches needed to get the
last users of these types removed, those have been submitted to the
respective maintainers"
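As a rough illustration of why the traditional 32-bit time types are y2038-unsafe (a user-space sketch, not code from this series; the helper names are hypothetical), a signed 32-bit seconds counter cannot represent anything later than 2038-01-19 03:14:07 UTC:

```c
#include <stdint.h>

/* Last second a signed 32-bit time_t can hold: 2038-01-19 03:14:07 UTC. */
static const int64_t Y2038_LAST_SECOND = INT32_MAX;

/* Returns 1 if an epoch timestamp does not fit in a signed 32-bit time_t. */
static int overflows_time32(int64_t t)
{
	return t > INT32_MAX || t < INT32_MIN;
}

/* What a 32-bit counter ends up holding after truncation. Converting an
 * out-of-range value to int32_t is implementation-defined in C; GCC and
 * Clang wrap it modulo 2^32. */
static int32_t truncate_to_time32(int64_t t)
{
	return (int32_t)t;
}
```

One second past the limit, the truncated value wraps to INT32_MIN, i.e. a date in December 1901, which is the class of bug the __kernel_old_* renames and timespec64 conversions are meant to make easier to spot.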
Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
y2038: alarm: fix half-second cut-off
y2038: ipc: fix x32 ABI breakage
y2038: fix typo in powerpc vdso "LOPART"
y2038: allow disabling time32 system calls
y2038: itimer: change implementation to timespec64
y2038: move itimer reset into itimer.c
y2038: use compat_{get,set}_itimer on alpha
y2038: itimer: compat handling to itimer.c
y2038: time: avoid timespec usage in settimeofday()
y2038: timerfd: Use timespec64 internally
y2038: elfcore: Use __kernel_old_timeval for process times
y2038: make ns_to_compat_timeval use __kernel_old_timeval
y2038: socket: use __kernel_old_timespec instead of timespec
y2038: socket: remove timespec reference in timestamping
y2038: syscalls: change remaining timeval to __kernel_old_timeval
y2038: rusage: use __kernel_old_timeval
y2038: uapi: change __kernel_time_t to __kernel_old_time_t
y2038: stat: avoid 'time_t' in 'struct stat'
y2038: ipc: remove __kernel_time_t reference from headers
y2038: vdso: powerpc: avoid timespec references
...
Linus Torvalds [Sun, 1 Dec 2019 21:46:15 +0000 (13:46 -0800)]
Merge tag 'compat-ioctl-5.5' of git://git./linux/kernel/git/arnd/playground
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.
In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"
* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...
Linus Torvalds [Sun, 1 Dec 2019 21:26:18 +0000 (13:26 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull sysctl system call removal from Eric Biederman:
"As far as I can tell we have reached the point where no one enables
the sysctl system call anymore. It still is enabled in a few
defconfigs but they are mostly the rarely used ones and in asking
people about that it was more cut & paste enabled than anything else.
This is a single commit that just deletes code, leaving just enough code
so that the deprecated sysctl warning continues to be printed. If my
analysis turns out to be wrong and someone actually cares it will be
easy to revert this commit and have the system call again.
There was one new xtensa defconfig in linux-next that enabled the
system call this cycle and when asked about it the maintainer of the
code replied that it was not enabled on purpose. As of today's
linux-next tree that defconfig no longer enables the system call.
What we saw in the review discussion was that if we go a step farther
than my patch and mess with uapi headers there are pieces of code that
won't compile, but nothing minds the system call actually disappearing
from the kernel"
Link: https://lore.kernel.org/lkml/201910011140.EA0181F13@keescook/
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sysctl: Remove the sysctl system call
David S. Miller [Sun, 1 Dec 2019 21:21:24 +0000 (13:21 -0800)]
Merge branch 'openvswitch-remove-a-couple-of-BUG_ON'
Paolo Abeni says:
====================
openvswitch: remove a couple of BUG_ON()
The openvswitch kernel datapath includes some BUG_ON() statements to check
for exceptional/unexpected failures. These patches drop a couple of them,
where we can do that without introducing other side effects.
v1 -> v2:
- avoid memory leaks on error path
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Sun, 1 Dec 2019 17:41:25 +0000 (18:41 +0100)]
openvswitch: remove another BUG_ON()
If we can't build the flow del notification, we can simply delete
the flow, no need to crash the kernel. Still keep a WARN_ON to
preserve debuggability.
Note: the BUG_ON() predates the Fixes tag, but this change
can be applied only after the mentioned commit.
v1 -> v2:
- do not leak an skb on error
Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Sun, 1 Dec 2019 17:41:24 +0000 (18:41 +0100)]
openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
All the callers of ovs_flow_cmd_build_info() already deal with
error return codes correctly, so we can handle the error condition
in a more graceful way. Still dump a warning to preserve
debuggability.
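The general "warn and return an error instead of BUG_ON()" pattern described here can be sketched in user-space C. This is not the openvswitch code: WARN_ON_SKETCH(), build_info() and fill_reply() are hypothetical stand-ins, and freeing the partially built skb is only noted in a comment.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's WARN_ON(): log the condition and
 * evaluate to it, instead of halting the machine as BUG_ON() would. */
#define WARN_ON_SKETCH(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s\n", #cond), 1) : 0)

/* Hypothetical message builder: returns 0 or a negative errno. */
static int build_info(int simulate_failure)
{
	return simulate_failure ? -EMSGSIZE : 0;
}

/* Caller pattern: warn for debuggability, release the reply buffer
 * (not modelled here) and propagate the error to the caller. */
static int fill_reply(int simulate_failure)
{
	int err = build_info(simulate_failure);

	if (WARN_ON_SKETCH(err < 0))
		return err;
	return 0;
}
```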
v1 -> v2:
- clarify the commit message
- clean the skb and report the error (DaveM)
Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 1 Dec 2019 09:51:47 +0000 (10:51 +0100)]
net: phy: realtek: fix using paged operations with RTL8105e / RTL8208
It was reported [0] that since the referenced commit a warning is
triggered in phylib that complains about paged operations being used
with a PHY driver that doesn't support this. The commit isn't wrong;
it's just that no dedicated PHY driver exists yet for one chip version
(RTL8105e). So add the missing PHY driver.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=202103
Fixes: 3a129e3f9ac4 ("r8169: switch to phylib functions in more places")
Reported-by: jhdskag3 <jhdskag3@tutanota.com>
Tested-by: jhdskag3 <jhdskag3@tutanota.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 1 Dec 2019 09:39:56 +0000 (10:39 +0100)]
r8169: fix resume on cable plug-in
It was reported [0] that network doesn't wake up on cable plug-in with
certain chip versions. The reason is that on these chip versions the PHY
doesn't detect cable plug-in when in power-down mode. So prevent
the PHY from powering down if WoL is enabled.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=202103
Fixes: 95fb8bb3181b ("net: phy: force phy suspend when calling phy_stop")
Reported-by: jhdskag3 <jhdskag3@tutanota.com>
Tested-by: jhdskag3 <jhdskag3@tutanota.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 1 Dec 2019 09:27:14 +0000 (10:27 +0100)]
r8169: fix jumbo configuration for RTL8168evl
Alan reported [0] that network is broken since the referenced commit
when using jumbo frames. This commit isn't wrong, it just revealed
another issue that already existed. According to the vendor
driver, the RTL8168e-specific jumbo config doesn't apply to RTL8168evl.
[0] https://lkml.org/lkml/2019/11/30/119
Fixes: 4ebcb113edcc ("r8169: fix jumbo packet handling on resume from suspend")
Reported-by: Alan J. Wylie <alan@wylie.me.uk>
Tested-by: Alan J. Wylie <alan@wylie.me.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minchan Kim [Sun, 1 Dec 2019 01:58:29 +0000 (17:58 -0800)]
mm/page_io.c: annotate refault stalls from swap_readpage
If a block device supports the rw_page operation, it doesn't submit bios,
so the annotation in submit_bio() for refault stalls doesn't work. This
happens with zram on Android, especially on the swap read path, which
can consume CPU cycles for decompression. It is also a problem for
zswap, which uses frontswap.
Annotate swap_readpage() to account the synchronous IO overhead and
prevent under-reporting of memory pressure.
[akpm@linux-foundation.org: add comment, per Johannes]
Link: http://lkml.kernel.org/r/20191010152134.38545-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Sun, 1 Dec 2019 01:58:26 +0000 (17:58 -0800)]
mm/Kconfig: fix trivial help text punctuation
End a Kconfig help text sentence with a period (aka full stop).
Link: http://lkml.kernel.org/r/c17f2c75-dc2a-42a4-2229-bb6b489addf2@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Kozlowski [Sun, 1 Dec 2019 01:58:23 +0000 (17:58 -0800)]
mm/Kconfig: fix indentation
Adjust indentation from spaces to tab (+optional two spaces) as in
the coding style, using a command like:
$ sed -e 's/^        /\t/' -i */Kconfig
Link: http://lkml.kernel.org/r/1574306437-28837-1-git-send-email-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Souptick Joarder [Sun, 1 Dec 2019 01:58:20 +0000 (17:58 -0800)]
mm/memory_hotplug.c: remove __online_page_set_limits()
__online_page_set_limits() is a dummy function - remove it and all
callers.
Link: http://lkml.kernel.org/r/8e1bc9d3b492f6bde16e95ebc1dee11d6aefabd7.1567889743.git.jrdr.linux@gmail.com
Link: http://lkml.kernel.org/r/854db2cf8145d9635249c95584d9a91fd774a229.1567889743.git.jrdr.linux@gmail.com
Link: http://lkml.kernel.org/r/9afe6c5a18158f3884a6b302ac2c772f3da49ccc.1567889743.git.jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:58:17 +0000 (17:58 -0800)]
mm: fix typos in comments when calling __SetPageUptodate()
Several places emphasise the effect of __SetPageUptodate(), but the
comment seems to have a typo in two of them.
Link: http://lkml.kernel.org/r/20190926023705.7226-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hao Lee [Sun, 1 Dec 2019 01:58:14 +0000 (17:58 -0800)]
mm: fix struct member name in function comments
The member in struct zonelist is _zonerefs instead of zones.
Link: http://lkml.kernel.org/r/20190927144049.GA29622@haolee.github.io
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Jun [Sun, 1 Dec 2019 01:58:11 +0000 (17:58 -0800)]
mm/shmem.c: cast the type of unmap_start to u64
On a 64-bit system, sb->s_maxbytes of the shmem filesystem is
MAX_LFS_FILESIZE, which equals LLONG_MAX.
If offset > LLONG_MAX - PAGE_SIZE, offset + len can still be less than
LLONG_MAX, so the request passes the check in vfs_fallocate():
	/* Check for wrap through zero too */
	if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
		return -EFBIG;
But then loff_t unmap_start = round_up(offset, PAGE_SIZE) in
shmem_fallocate() overflows.
Syzkaller reports an overflow problem in mm/shmem:
UBSAN: Undefined behaviour in mm/shmem.c:2014:10
signed integer overflow: '9223372036854775807 + 1' cannot be represented in type 'long long int'
CPU: 0 PID:17076 Comm: syz-executor0 Not tainted 4.1.46+ #1
Hardware name: linux, dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2c8 arch/arm64/kernel/traps.c:100
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:238
__dump_stack lib/dump_stack.c:15 [inline]
ubsan_epilogue+0x18/0x70 lib/ubsan.c:164
handle_overflow+0x158/0x1b0 lib/ubsan.c:195
shmem_fallocate+0x6d0/0x820 mm/shmem.c:2104
vfs_fallocate+0x238/0x428 fs/open.c:312
SYSC_fallocate fs/open.c:335 [inline]
SyS_fallocate+0x54/0xc8 fs/open.c:239
When shmem_falloc.start is calculated:
	shmem_falloc.start = unmap_start >> PAGE_SHIFT;
the overflowed (negative) unmap_start is shifted right with sign
extension, so the sign bit 1 is propagated into the result.
Fix it by casting unmap_start to u64 before the right shift.
This bug was found in LTS Linux 4.1. It also seems to exist in mainline.
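The difference between the signed and the u64 shift can be sketched in user-space C. Because signed overflow is itself undefined behaviour, the sketch models the already-wrapped unmap_start as INT64_MIN (what LLONG_MAX + 1 wraps to on two's-complement hardware) and assumes 4 KiB pages; the helper names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Old pattern: right-shifting a negative signed value copies the sign bit
 * into the vacated high bits (arithmetic shift on GCC/Clang; the C
 * standard leaves it implementation-defined), so the index is negative. */
static int64_t start_index_signed(int64_t unmap_start)
{
	return unmap_start >> SKETCH_PAGE_SHIFT;
}

/* Fixed pattern: cast to an unsigned 64-bit type first, so the shift
 * fills the high bits with zeros. */
static uint64_t start_index_u64(int64_t unmap_start)
{
	return (uint64_t)unmap_start >> SKETCH_PAGE_SHIFT;
}
```

With INT64_MIN as input, the signed shift yields -2^51 while the u64 shift yields 2^51, a page index consistent with the unsigned offset.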
Link: http://lkml.kernel.org/r/1573867464-5107-1-git-send-email-chenjun102@huawei.com
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Sun, 1 Dec 2019 01:58:07 +0000 (17:58 -0800)]
mm: shmem: use proper gfp flags for shmem_writepage()
shmem_writepage() uses GFP_ATOMIC to allocate swap cache. GFP_ATOMIC
used to mean __GFP_HIGH, but now it means __GFP_HIGH | __GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM. However, shmem_writepage() should write out to swap
only in response to memory pressure, so __GFP_KSWAPD_RECLAIM looks useless,
since the caller may be kswapd itself or already in direct reclaim.
In addition, XArray node allocations from PF_MEMALLOC contexts could
completely exhaust the page allocator; __GFP_NOMEMALLOC stops emergency
reserves from being allocated.
So just copy the gfp flags used by add_to_swap().
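A minimal sketch of the flag arithmetic, assuming that add_to_swap() passes __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN. The bit values below are hypothetical and chosen only for illustration; the real __GFP_* values live in include/linux/gfp.h and differ between kernel versions.

```c
/* Hypothetical bit positions, NOT the kernel's real __GFP_* values. */
enum {
	SK_BIT_HIGH           = 1u << 0,
	SK_BIT_ATOMIC         = 1u << 1,
	SK_BIT_KSWAPD_RECLAIM = 1u << 2,
	SK_BIT_NOMEMALLOC     = 1u << 3,
	SK_BIT_NOWARN         = 1u << 4,
};

/* GFP_ATOMIC as the commit message describes it. */
static const unsigned int SK_GFP_ATOMIC =
	SK_BIT_HIGH | SK_BIT_ATOMIC | SK_BIT_KSWAPD_RECLAIM;

/* Assumed add_to_swap() flags, now also used by shmem_writepage(). */
static const unsigned int SK_SHMEM_SWAP_GFP =
	SK_BIT_HIGH | SK_BIT_NOMEMALLOC | SK_BIT_NOWARN;

/* Would an allocation with these flags wake kswapd? */
static int wakes_kswapd(unsigned int gfp)
{
	return (gfp & SK_BIT_KSWAPD_RECLAIM) != 0;
}

/* Do these flags keep the allocation out of the emergency reserves? */
static int forbids_emergency_reserves(unsigned int gfp)
{
	return (gfp & SK_BIT_NOMEMALLOC) != 0;
}
```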
Hugh:
"a cleanup to make the two calls look the same when they don't need to
be different (whereas the call from __read_swap_cache_async() rightly
uses a lower priority gfp)".
Link: http://lkml.kernel.org/r/1572991351-86061-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Sun, 1 Dec 2019 01:58:04 +0000 (17:58 -0800)]
mm/shmem.c: make array 'values' static const, makes object smaller
Don't populate the array 'values' on the stack but instead make it static
const. Makes the object code smaller by 111 bytes.
Before:
   text    data     bss     dec     hex filename
 108612   11169     512  120293   1d5e5 mm/shmem.o
After:
   text    data     bss     dec     hex filename
 108437   11233     512  120182   1d576 mm/shmem.o
(gcc version 9.2.1, amd64)
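The change follows a common C pattern, sketched below with hypothetical array contents (the real array in mm/shmem.c holds different values). Without `static`, the compiler may rebuild the array on the stack on every call; with `static const`, a single copy lives in read-only data. Whether the stack copy is actually emitted depends on the compiler and optimisation level, so the byte savings above are a measured result for that toolchain, not a guarantee.

```c
#include <stddef.h>

/* Before: the initializer may be materialised on the stack per call. */
static int lookup_on_stack(size_t i)
{
	const int values[] = { 10, 20, 30, 40 }; /* hypothetical contents */
	return values[i];
}

/* After: one read-only copy shared by every call. */
static int lookup_static(size_t i)
{
	static const int values[] = { 10, 20, 30, 40 }; /* hypothetical contents */
	return values[i];
}
```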
Link: http://lkml.kernel.org/r/20190906143012.28698-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Sun, 1 Dec 2019 01:58:01 +0000 (17:58 -0800)]
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
A while ago Andy noticed
(http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
security implications.
As the first step of the solution, the following patch limits the
availability of UFFD_FEATURE_EVENT_FORK to those having CAP_SYS_PTRACE.
The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.
Yet, if there are other users of non-cooperative userfaultfd that run
without CAP_SYS_PTRACE, they would be broken :(
Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
descriptor table from the read() implementation of uffd, which may have
security implications for unprivileged use of the userfaultfd.
Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
CAP_SYS_PTRACE.
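A minimal sketch of the kind of check this adds to the UFFDIO_API handshake, written as user-space C. The feature-bit value is an assumption mirroring include/uapi/linux/userfaultfd.h, and the capability is passed in as a boolean rather than queried through the kernel's capable(CAP_SYS_PTRACE).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; see include/uapi/linux/userfaultfd.h for the real one. */
#define SK_FEATURE_EVENT_FORK (UINT64_C(1) << 1)

/* Reject the fork-event feature unless the caller holds CAP_SYS_PTRACE
 * (modelled here by has_cap_sys_ptrace). Other features are unaffected. */
static int check_uffd_features(uint64_t requested, bool has_cap_sys_ptrace)
{
	if ((requested & SK_FEATURE_EVENT_FORK) && !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}
```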
Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Nosh Minwalla <nosh@google.com>
Cc: Pavel Emelyanov <ovzxemul@gmail.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Sun, 1 Dec 2019 01:57:58 +0000 (17:57 -0800)]
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP, they
need to be cleared. Currently setting UFFDIO_REGISTER_MODE_WP returns
-EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
is applied.
Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:57:55 +0000 (17:57 -0800)]
userfaultfd: wrap the common dst_vma check into an inlined function
When doing UFFDIO_COPY, it is necessary to find the correct destination
vma and make sure the fault range is in it.
Since two places need to do the same task, wrap the common check into
an inlined function.
Link: http://lkml.kernel.org/r/20190927070032.2129-3-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:57:52 +0000 (17:57 -0800)]
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
The warnings here make sure that the address (dst_addr) and the length
(len - copied) are huge-page-size aligned.
But this is already ensured because:
	dst_start and len are huge-page-size aligned
	dst_addr starts at dst_start and increases by the huge page size each time
	copied increases by the huge page size each time
This means these warnings will never be triggered.
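The invariant argument above can be checked with a small user-space model of the copy loop. It assumes a 2 MiB huge page size and omits the actual copy; warn_would_fire() is a hypothetical helper that reports whether either removed WARN_ON() condition could become true.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_HPAGE_SIZE ((size_t)2 << 20) /* assumed 2 MiB huge pages */

static bool hpage_aligned(size_t x)
{
	return (x & (SK_HPAGE_SIZE - 1)) == 0;
}

/* Walk the loop in huge-page steps, testing both removed conditions on
 * every iteration. The loop bound assumes len is a huge-page multiple,
 * which is the precondition the commit message relies on. */
static bool warn_would_fire(size_t dst_start, size_t len)
{
	size_t dst_addr = dst_start;
	size_t copied = 0;

	while (copied < len) {
		if (!hpage_aligned(dst_addr) || !hpage_aligned(len - copied))
			return true;
		/* ... copy one huge page (omitted) ... */
		dst_addr += SK_HPAGE_SIZE;
		copied += SK_HPAGE_SIZE;
	}
	return false;
}
```

With aligned inputs the conditions never trigger; they can only fire if the entry preconditions are already violated.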
Link: http://lkml.kernel.org/r/20190927070032.2129-2-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:57:49 +0000 (17:57 -0800)]
userfaultfd: use vma_pagesize for all huge page size calculation
In __mcopy_atomic_hugetlb() we use two variables to deal with the huge
page size: vma_hpagesize and huge_page_size.
Since they are the same, it is not necessary to use two different
mechanisms. This patch makes it consistent by using vma_hpagesize
everywhere.
Link: http://lkml.kernel.org/r/20190927070032.2129-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:57:46 +0000 (17:57 -0800)]
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
Improve readability, no functional change.
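A user-space sketch of madvise-style range checking with PAGE_ALIGNED()/PAGE_ALIGN() semantics. The SK_* macros are local copies written for illustration (a 4 KiB page size is assumed), and check_range() is a hypothetical helper, not the exact madvise code.

```c
#include <errno.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL /* assumed page size */

/* Local copies of the helpers' semantics: round up to a page boundary,
 * and test whether an address is on one. */
#define SK_PAGE_ALIGN(x)   (((x) + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1))
#define SK_PAGE_ALIGNED(x) (((unsigned long)(x) & (SK_PAGE_SIZE - 1)) == 0)

/* start must be page aligned; len is rounded up to whole pages; a huge
 * len_in that wraps to zero when rounded is rejected. */
static int check_range(unsigned long start, size_t len_in, size_t *len_out)
{
	size_t len;

	if (!SK_PAGE_ALIGNED(start))
		return -EINVAL;
	len = SK_PAGE_ALIGN(len_in);
	if (len_in && !len)
		return -EINVAL;
	*len_out = len;
	return 0;
}
```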
Link: http://lkml.kernel.org/r/20191118032857.22683-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yunfeng Ye [Sun, 1 Dec 2019 01:57:42 +0000 (17:57 -0800)]
mm/madvise.c: replace with page_size() in madvise_inject_error()
page_size() has been available since commit
a50b854e073c ("mm: introduce page_size()").
Use page_size() in madvise_inject_error() for readability.
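page_size() returns the size of a possibly compound page, i.e. PAGE_SIZE shifted left by the compound order. The user-space sketch below mirrors that computation with a hypothetical minimal page descriptor and an assumed 4 KiB base page.

```c
#define SK_PAGE_SIZE 4096UL /* assumed base page size */

/* Hypothetical minimal descriptor: only the compound order matters here. */
struct sk_page {
	unsigned int compound_order; /* 0 for an ordinary base page */
};

/* Same computation as page_size(): PAGE_SIZE << compound_order(page). */
static unsigned long sk_page_size(const struct sk_page *page)
{
	return SK_PAGE_SIZE << page->compound_order;
}
```

For an order-9 compound page this gives 2 MiB, so the caller no longer needs to open-code the shift.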
[akpm@linux-foundation.org: use ulong for `size', per David]
Link: http://lkml.kernel.org/r/29dce60c-38d6-0220-f292-e298f0c78c4d@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:57:39 +0000 (17:57 -0800)]
mm/mmap.c: make vma_merge() comment more easy to understand
Cases 1/6, 2/7 and 3/8 have the same pattern and are handled by the
same logic.
Rearrange the comment to make it a little easier for readers to
understand.
Link: http://lkml.kernel.org/r/20191030012445.16944-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhong jiang [Sun, 1 Dec 2019 01:57:35 +0000 (17:57 -0800)]
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
It is clearer to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file
operations than DEFINE_SIMPLE_ATTRIBUTE.
Link: http://lkml.kernel.org/r/1572403660-44718-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Ying [Sun, 1 Dec 2019 01:57:32 +0000 (17:57 -0800)]
autonuma: reduce cache footprint when scanning page tables
In auto NUMA balancing page table scanning, if pte_protnone() is true,
the PTE does not need to be changed because it is already in the target
state. So any further checks on the corresponding struct page are
unnecessary too.
So, if we check pte_protnone() first for each PTE, we can avoid
unnecessary struct page accesses and thereby reduce the cache footprint
of NUMA balancing page table scanning.
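To illustrate the reordering, here is a minimal userspace model. The fake_pte and fake_page types, the ksm flag and the page_reads counter are hypothetical stand-ins invented for this sketch, not kernel structures. The only point it makes is that testing the cheap per-PTE flag first means a PTE that is already PROT_NONE never causes a struct page access.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and a PTE; not kernel types. */
struct fake_page {
	bool ksm;                 /* e.g. a page the scanner must skip */
};

struct fake_pte {
	bool protnone;            /* already PROT_NONE: target state reached */
	struct fake_page *page;   /* backing page */
};

/* Number of struct page accesses: a proxy for cache footprint. */
static size_t page_reads;

static bool page_is_skippable(const struct fake_page *page)
{
	page_reads++;
	return page->ksm;
}

/* Mark PTEs PROT_NONE; returns how many were changed. */
static size_t scan_ptes(struct fake_pte *ptes, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		/* Cheap check first: a PROT_NONE PTE needs no change,
		 * so its struct page is never touched. */
		if (ptes[i].protnone)
			continue;
		if (page_is_skippable(ptes[i].page))
			continue;
		ptes[i].protnone = true;
		changed++;
	}
	return changed;
}
```

With three of four PTEs already PROT_NONE, only one struct page is read.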
In a performance test using the pmbench memory access benchmark with an
80:20 read/write ratio and a normal access address distribution on a
2-socket Intel server with Optane DC Persistent Memory, perf profiling
shows that the autonuma page table scanning time drops from 1.23% to
0.97% (a 21% reduction) with the patch.
Link: http://lkml.kernel.org/r/20191101075727.26683-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Ying [Sun, 1 Dec 2019 01:57:28 +0000 (17:57 -0800)]
autonuma: fix watermark checking in migrate_balanced_pgdat()
When zone_watermark_ok() is called in migrate_balanced_pgdat() to check
the migration target node, the classzone_idx parameter (the requested
zone) is specified as 0 (ZONE_DMA). But when memory is allocated for
autonuma in alloc_misplaced_dst_page(), the requested zone derived from
the GFP flags is ZONE_MOVABLE. That is, the two requested zones differ.
Because the size of lowmem_reserve depends on the requested zone, this
may cause problems.
For example, consider the following zoneinfo from a test machine:
  Node 0, zone   DMA32
    pages free     61592
          min      29
          low      454
          high     879
          spanned  1044480
          present  442306
          managed  425921
          protection: (0, 0, 62457, 62457, 62457)
The number of free pages in ZONE_DMA32 is greater than "high watermark +
lowmem_reserve[ZONE_DMA]" (879 + 0 = 879), but less than "high watermark +
lowmem_reserve[ZONE_MOVABLE]" (879 + 62457 = 63336). Because
__alloc_pages_node() in alloc_misplaced_dst_page() requests ZONE_MOVABLE,
the allocation is checked against the much larger reserve, yet
zone_watermark_ok() on ZONE_DMA32 in migrate_balanced_pgdat() may keep
returning true. So autonuma may not stop migrating even when memory
pressure on node 0 is heavy.
To fix the issue, pass ZONE_MOVABLE as the classzone_idx argument of
zone_watermark_ok() in migrate_balanced_pgdat(). This matches the
requested zone in alloc_misplaced_dst_page(), so migrate_balanced_pgdat()
returns false when memory pressure is heavy.
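The arithmetic behind the zoneinfo example can be checked with a simplified watermark test. This sketch assumes the five protection entries correspond to ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE and ZONE_DEVICE, and it reduces zone_watermark_ok() to its lowmem_reserve comparison (ok only if free > mark + lowmem_reserve[classzone_idx]); the real function also takes the allocation order and flags into account.

```c
#include <stdbool.h>

/* Assumed zone order for the five "protection" entries in the zoneinfo. */
enum zone_idx { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, ZONE_DEVICE, NR_ZONES };

/* Values taken from the DMA32 zoneinfo quoted in the message. */
static const long dma32_free = 61592;
static const long dma32_high = 879;
static const long dma32_lowmem_reserve[NR_ZONES] = { 0, 0, 62457, 62457, 62457 };

/* Simplified: only the lowmem_reserve part of the watermark check. */
static bool dma32_watermark_ok(long mark, enum zone_idx classzone_idx)
{
	return dma32_free > mark + dma32_lowmem_reserve[classzone_idx];
}
```

With ZONE_DMA the check passes (61592 > 879); with ZONE_MOVABLE it fails (61592 < 63336), which is the mismatch the patch removes.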
Link: http://lkml.kernel.org/r/20191101075727.26683-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhong jiang [Sun, 1 Dec 2019 01:57:25 +0000 (17:57 -0800)]
mm/cma_debug.c: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
It is clearer to use DEFINE_DEBUGFS_ATTRIBUTE than DEFINE_SIMPLE_ATTRIBUTE
to define debugfs file operations.
Link: http://lkml.kernel.org/r/1572348687-9951-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Yue Hu <huyue2@yulong.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yunfeng Ye [Sun, 1 Dec 2019 01:57:22 +0000 (17:57 -0800)]
mm/cma.c: switch to bitmap_zalloc() for cma bitmap allocation
kzalloc() is used for the cma bitmap allocation in cma_activate_area();
switch to bitmap_zalloc() for clarity.
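The kernel's bitmap_zalloc(nbits, flags) takes a bit count and returns a zeroed array of unsigned long, so the caller no longer computes BITS_TO_LONGS(nbits) * sizeof(long) for kzalloc() itself. The userspace analogue below is only a sketch: bitmap_zalloc_user() and the two bit helpers are hypothetical names, and calloc() stands in for the kernel allocator.

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical analogue of bitmap_zalloc(): a zeroed array of unsigned
 * longs just large enough to hold nbits bits. */
static unsigned long *bitmap_zalloc_user(size_t nbits)
{
	return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

static void bitmap_set_bit(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static int bitmap_test_bit(const unsigned long *map, size_t bit)
{
	return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}
```

The caller states the size in bits, which is the unit the CMA bitmap is naturally described in, instead of converting to bytes by hand.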
Link: http://lkml.kernel.org/r/895d4627-f115-c77a-d454-c0a196116426@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Yue Hu <huyue2@yulong.com>
Cc: Peng Fan <peng.fan@nxp.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ryohei Suzuki <ryh.szk.cmnty@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Song Liu [Sun, 1 Dec 2019 01:57:19 +0000 (17:57 -0800)]
mm/thp: flush file for !is_shmem PageDirty() case in collapse_file()
For non-shmem file THPs, khugepaged only collapses read-only .text
mappings (VM_DENYWRITE). These pages should not be dirty, except in the
case where the file has not been flushed since it was first written.
Call filemap_flush() in collapse_file() to accelerate writeback in such
cases.
Link: http://lkml.kernel.org/r/20191106060930.2571389-3-songliubraving@fb.com
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Sun, 1 Dec 2019 01:57:15 +0000 (17:57 -0800)]
mm, thp: do not queue fully unmapped pages for deferred split
Adding fully unmapped pages to the deferred split queue is not productive:
these pages are either about to be freed, or they are pinned and cannot be
split anyway.
Link: http://lkml.kernel.org/r/20190913091849.11151-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Sun, 1 Dec 2019 01:57:12 +0000 (17:57 -0800)]
mm/migrate.c: handle freed page at the first place
When migration encounters a freed page, we simply return without
migrating it, since migrating a freed page is pointless. However, the
current code allocates the target page unconditionally before handling
the freed page, so if the page has been freed, the newly allocated page
is just freed again. This makes little sense and wastes time, although
migrating a freed page is rare.
So, handle the freed page before the allocation to avoid an unnecessary
page allocation and free.
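A minimal model of the reordering, assuming (as the migration code does) that a page whose reference count has dropped to 1 was freed by its owner and is now held only by the migration path. model_page, alloc_target_page() and migrate_one() are hypothetical names for this sketch, and the copy and remap steps are reduced to a single assignment.

```c
#include <stdlib.h>

/* Hypothetical page model: refcount == 1 means only migration holds it. */
struct model_page {
	int refcount;
};

static int target_allocs;   /* number of target pages allocated */

static struct model_page *alloc_target_page(void)
{
	target_allocs++;
	return calloc(1, sizeof(struct model_page));
}

/*
 * Returns the new page, or NULL if the page was freed under us (or the
 * allocation failed). The freed-page check now comes before allocation.
 */
static struct model_page *migrate_one(struct model_page *page)
{
	struct model_page *newpage;

	if (page->refcount == 1)   /* freed: nothing to migrate, nothing allocated */
		return NULL;

	newpage = alloc_target_page();
	if (newpage)
		newpage->refcount = page->refcount; /* stands in for copy + remap */
	return newpage;
}
```

A freed page now costs no allocation at all; only live pages reach alloc_target_page().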
Link: http://lkml.kernel.org/r/1573755869-106954-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhong jiang [Sun, 1 Dec 2019 01:57:09 +0000 (17:57 -0800)]
mm/huge_memory.c: split_huge_pages_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE
split_huge_pages_fops is used for a debugfs file; hence, it is clearer
to use DEFINE_DEBUGFS_ATTRIBUTE.
Link: http://lkml.kernel.org/r/1572347674-8111-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhigang Lu [Sun, 1 Dec 2019 01:57:06 +0000 (17:57 -0800)]
mm/hugetlb: avoid looping to the same hugepage if !pages and !vmas
When mmapping an existing hugetlbfs file with MAP_POPULATE, we find that
it is very time-consuming. For example, mmapping a 128GB file takes about
50 milliseconds. Sampling with perf events shows that 99% of the time is
spent in the same_page loop in follow_hugetlb_page().
    samples: 205 of event 'cycles', Event count (approx.): 136686374
    - 99.04% test_mmap_huget [kernel.kallsyms] [k] follow_hugetlb_page
        follow_hugetlb_page
        __get_user_pages
        __mlock_vma_pages_range
        __mm_populate
        vm_mmap_pgoff
        sys_mmap_pgoff
        sys_mmap
        system_call_fastpath
        __mmap64
follow_hugetlb_page() is called with pages=NULL and vmas=NULL, so for
each hugepage we run the same_page loop pages_per_huge_page() times
while doing nothing. With this change, it takes less than 1
millisecond to mmap a 128GB file in hugetlbfs.
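Assuming 2 MB huge pages and 4 KB base pages, a 128GB file spans 65,536 huge pages, and the same_page loop runs 512 times for each of them: 33,554,432 iterations that record nothing. The userspace model below sketches only the skip-ahead idea. walk_huge_range() and its parameters are hypothetical, and the kernel change must also respect constraints this model ignores, such as the end of the VMA.

```c
#include <stddef.h>

/*
 * Hypothetical model: walk nr_pages base pages, backed by huge pages of
 * pages_per_huge base pages each, starting on a huge page boundary.
 * pages may be NULL, like the pages == NULL && vmas == NULL case.
 * *iterations counts loop passes; the return value is pages walked.
 */
static size_t walk_huge_range(size_t nr_pages, size_t pages_per_huge,
			      unsigned long *pages, size_t *iterations)
{
	size_t i = 0, remainder = nr_pages;

	*iterations = 0;
	while (remainder) {
		(*iterations)++;
		/* Nothing to record: skip the whole huge page at once. */
		if (!pages && i % pages_per_huge == 0 &&
		    remainder >= pages_per_huge) {
			i += pages_per_huge;
			remainder -= pages_per_huge;
			continue;
		}
		/* Otherwise: one pass per base page (the same_page loop). */
		if (pages)
			pages[i] = i;   /* stands in for recording the subpage */
		i++;
		remainder--;
	}
	return i;
}
```

Walking 4 huge pages of 8 base pages each takes 4 passes when pages is NULL, versus 32 when every subpage has to be recorded.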
Link: http://lkml.kernel.org/r/1567581712-5992-1-git-send-email-totty.lu@gmail.com
Signed-off-by: Zhigang Lu <tonnylu@tencent.com>
Reviewed-by: Haozhong Zhang <hzhongzhang@tencent.com>
Reviewed-by: Zongming Zhang <knightzhang@tencent.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sun, 1 Dec 2019 01:57:02 +0000 (17:57 -0800)]
hugetlb: remove unused hstate in hugetlb_fault_mutex_hash()
The first parameter, hstate, of hugetlb_fault_mutex_hash() is no longer
used.
This patch removes it.
[akpm@linux-foundation.org: various build fixes]
[cai@lca.pw: fix a GCC compilation warning]
Link: http://lkml.kernel.org/r/1570544108-32331-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20191005003302.785-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Sun, 1 Dec 2019 01:56:59 +0000 (17:56 -0800)]
hugetlb: remove duplicated code
Remove duplicated code between region_chg and region_add, and refactor
it into a common function, add_reservation_in_range. This is done mostly
because a follow-up change in another series disables region coalescing
in region_add, and I want to make that change in one place only. The
refactoring should also improve maintainability on its own.
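A simplified model of one helper serving both callers: with count_only set it answers region_chg's question (how many pages of [f, t) still need a reservation); without it, it performs region_add's insertion, coalescing overlapping or adjacent regions. resv_model, MAX_REGIONS and add_reservation_model() are hypothetical. This is not the kernel's add_reservation_in_range(), and it ignores locking and the region cache.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 16

/* Hypothetical reservation map: sorted, non-overlapping [from, to) ranges. */
struct resv_model {
	long from[MAX_REGIONS];
	long to[MAX_REGIONS];
	size_t nr;
};

/*
 * Returns the number of pages in [f, t) not yet reserved (-1 if the model
 * is full). Unless count_only is set, also records [f, t), merging it with
 * every region it overlaps or touches.
 */
static long add_reservation_model(struct resv_model *m, long f, long t,
				  bool count_only)
{
	long covered = 0, nf = f, nt = t;
	size_t i, j, first = m->nr, last = 0;

	for (i = 0; i < m->nr; i++) {
		if (m->to[i] < f || m->from[i] > t)
			continue;                       /* neither overlaps nor touches */
		long lo = m->from[i] > f ? m->from[i] : f;
		long hi = m->to[i] < t ? m->to[i] : t;
		covered += hi - lo;                     /* 0 when only touching */
		if (m->from[i] < nf)
			nf = m->from[i];
		if (m->to[i] > nt)
			nt = m->to[i];
		if (first == m->nr)
			first = i;
		last = i;
	}
	if (count_only)                                 /* region_chg-style query */
		return (t - f) - covered;

	if (first == m->nr) {                           /* no neighbour: insert */
		if (m->nr == MAX_REGIONS)
			return -1;
		for (i = 0; i < m->nr && m->from[i] < f; i++)
			;
		for (j = m->nr; j > i; j--) {
			m->from[j] = m->from[j - 1];
			m->to[j] = m->to[j - 1];
		}
		m->from[i] = f;
		m->to[i] = t;
		m->nr++;
	} else {                                        /* coalesce first..last */
		size_t removed = last - first;

		m->from[first] = nf;
		m->to[first] = nt;
		for (j = last + 1; j < m->nr; j++) {
			m->from[j - removed] = m->from[j];
			m->to[j - removed] = m->to[j];
		}
		m->nr -= removed;
	}
	return (t - f) - covered;
}
```

Because both paths share the same scan, a later change to the coalescing policy only has to touch the final branch.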
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20190919200428.188797-3-almasrymina@google.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Sun, 1 Dec 2019 01:56:54 +0000 (17:56 -0800)]
hugetlb: region_chg provides only cache entry
Current behavior is that region_chg provides both a cache entry in
resv->region_cache, AND a placeholder entry in resv->regions.
region_add first tries to use the placeholder, and if it finds that the
placeholder has been deleted by a racing region_del call, it uses the
cache entry.
This behavior is completely unnecessary and is removed in this patch for
a couple of reasons:
1. region_add needs to either find a cached file_region entry in
resv->region_cache, or find an entry in resv->regions to expand. It
does not need both.
2. region_chg adding a placeholder entry in resv->regions opens up
a possible race with region_del, where region_chg adds a placeholder
region in resv->regions, and this region is deleted by a racing call
to region_del during region_chg execution or before region_add is
called. Removing the race makes the code easier to reason about and
maintain.
In addition, a follow-up patch in another series disables region
coalescing, and that change would be further complicated if the race
with region_del still existed.
Link: http://lkml.kernel.org/r/20190919200428.188797-2-almasrymina@google.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Waiman Long [Sun, 1 Dec 2019 01:56:49 +0000 (17:56 -0800)]
hugetlbfs: take read_lock on i_mmap for PMD sharing
A customer with large SMP systems (up to 16 sockets), running an
application that uses a large amount of static hugepages (~500-1500GB),
is experiencing random multi-second delays. These delays were caused by
the long time it took to scan the VMA interval tree with mmap_sem held.
Sharing a huge PMD does not require any changes to the i_mmap at all.
Therefore, we can just take the read lock and let other threads
searching for the right VMA share it in parallel. Once the right VMA is
found, either the PMD lock (2M huge page for x86-64) or
mm->page_table_lock will be acquired to perform the actual PMD sharing.
Lock contention, if present, will happen on the spinlock. That is much
better than contention on the rwsem, where the time needed to scan the
interval tree is indeterminate.
With this patch applied, the customer is seeing significant performance
improvement over the unpatched kernel.
Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Piotr Sarna [Sun, 1 Dec 2019 01:56:43 +0000 (17:56 -0800)]
hugetlbfs: add O_TMPFILE support
With hugetlbfs, a common pattern for mapping anonymous huge pages is to
create a temporary file first. Currently libraries like libhugetlbfs
and seastar create these with a standard mkstemp+unlink trick, but it
would be more robust to be able to simply pass the O_TMPFILE flag to
open(). O_TMPFILE is already supported by several file systems like
ext4 and xfs. The implementation simply uses the existing d_tmpfile
utility function to instantiate the dcache entry for the file.
Tested manually by creating a temporary file via open() with
(O_TMPFILE|O_RDWR) on a mounted hugetlbfs and successfully mapping 2M
huge pages with it. Without the patch, trying to open a file with
O_TMPFILE fails with -EOPNOTSUPP.
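From userspace, the two approaches look roughly like this. open_unnamed_tmpfile() is a hypothetical helper: it tries O_TMPFILE first and falls back to the mkstemp()+unlink() trick when open(2) reports that the filesystem (EOPNOTSUPP) or the kernel (EISDIR) lacks O_TMPFILE support. Either way the result is a descriptor for a file with no name in the directory, which could then be sized with ftruncate() and mapped with mmap(). In the use case above, dir would be a hugetlbfs mount point.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fd for an unnamed temporary file in dir, or -1. */
static int open_unnamed_tmpfile(const char *dir)
{
	char path[PATH_MAX];
	int fd;

#ifdef O_TMPFILE
	/* The file never gets a name; it disappears when the last fd closes. */
	fd = open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
	if (fd >= 0)
		return fd;
	/* Per open(2): EOPNOTSUPP = fs lacks support, EISDIR = kernel lacks it. */
	if (errno != EOPNOTSUPP && errno != EISDIR)
		return -1;
#endif

	/* Fallback: the classic mkstemp()+unlink() trick. */
	if (snprintf(path, sizeof(path), "%s/tmpXXXXXX", dir) >= (int)sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd >= 0)
		unlink(path);
	return fd;
}
```

The O_TMPFILE path never exposes a name in the directory, whereas the fallback briefly creates one between mkstemp() and unlink().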
Link: http://lkml.kernel.org/r/bc9383eff6e1374d79f3a92257ae829ba1e6ae60.1573285189.git.p.sarna@tlen.pl
Signed-off-by: Piotr Sarna <p.sarna@tlen.pl>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>