Eric Dumazet [Fri, 15 Apr 2016 17:47:52 +0000 (10:47 -0700)]
net: bcmgenet: device stats are unsigned long
On 64bit kernels, device stats are 64bit wide, not 32bit.
Fixes: 1c1008c793fa4 ("net: bcmgenet: add main driver file")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 16 Apr 2016 23:07:11 +0000 (19:07 -0400)]
Merge branch 'dsa-mv88e6xxx-fix-cross-chip-bridging'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: fix hardware cross-chip bridging
In order to accelerate cross-chip switching of frames with the hardware,
the DSA Tag ports, used to interconnect switch devices, must learn SA
and DA addresses, and share the same FDB with the user ports.
The two first patches restore address learning on DSA links. This fixes
hardware cross-chip bridging in a VLAN filtering enabled system, which
implements a bridge group as a 802.1Q VLAN and thus share an isolated
address database between DSA and user ports.
The third patch changes the distinct default databases used for each
port, to the same address database. This fixes the hardware cross-chip
bridging in a VLAN filtering disabled system, where a bridge group gets
implemented only as a port-based VLAN.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 14 Apr 2016 18:42:09 +0000 (14:42 -0400)]
net: dsa: mv88e6xxx: share the same default FDB
For hardware cross-chip bridging to work, user ports *and* DSA ports
need to share a common address database, in order to switch a frame to
the correct interconnected device.
This is currently working for VLAN filtering aware systems, since Linux
will implement a bridge group as a 802.1Q VLAN, which has its own FDB,
including DSA and CPU links as members.
However when the system doesn't support VLAN filtering, Linux only
relies on the port-based VLAN to implement a bridge group.
To fix hardware cross-chip bridging for such systems, set the same
default address database 0 for user and DSA ports, instead of giving
them all a different default database.
Note that the bridging code prevents frames to egress between unbridged
ports, and flushes FDB entries of a port when changing its STP state.
Also note that the FID 0 is special and means "all" for ATU operations,
but it's OK since it is used as a default forwarding address database.
Fixes: 2db9ce1fd9a3 ("net: dsa: mv88e6xxx: assign default FDB to ports")
Fixes: 466dfa077022 ("net: dsa: mv88e6xxx: assign dynamic FDB to bridges")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 14 Apr 2016 18:42:08 +0000 (14:42 -0400)]
net: dsa: mv88e6xxx: enable SA learning on DSA ports
In multi-chip systems, DSA Tag ports must learn SA addresses in order to
correctly switch frames between interconnected chips.
This fixes cross-chip hardware bridging in a VLAN filtering aware
system, because a bridge group gets implemented as an hardware 802.1Q
VLAN and thus DSA and user ports share the same FDB.
Fixes: 4c7ea3c0791e ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 14 Apr 2016 18:42:07 +0000 (14:42 -0400)]
net: dsa: mv88e6xxx: unlock DSA and CPU ports
Locking a port generates an hardware interrupt when a new SA address is
received. This enables CPU directed learning, which is needed for 802.1X
MAC authentication.
To disable automatic learning on a port, the only configuration needed
is to set its Port Association Vector to all zero.
Clear PAV when SA learning should be disabled instead of locking a port.
Fixes: 4c7ea3c0791e ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Thu, 14 Apr 2016 17:43:27 +0000 (10:43 -0700)]
RDS: Fix the atomicity for congestion map update
Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Lets use atomics to avoid such an issue.
Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
to offending code with spin lock based fix.
Reviewed-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qing Huang [Thu, 14 Apr 2016 17:43:26 +0000 (10:43 -0700)]
RDS: fix endianness for dp_ack_seq
dp->dp_ack_seq is used in big endian format. We need to do the
big endianness conversion when we assign a value in host format
to it.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 16 Apr 2016 00:27:58 +0000 (02:27 +0200)]
vlan: pull on __vlan_insert_tag error path and fix csum correction
When __vlan_insert_tag() fails from skb_vlan_push() path due to the
skb_cow_head(), we need to undo the __skb_push() in the error path
as well that was done earlier to move skb->data pointer to mac header.
Moreover, I noticed that when in the non-error path the __skb_pull()
is done and the original offset to mac header was non-zero, we fixup
from a wrong skb->data offset in the checksum complete processing.
So the skb_postpush_rcsum() really needs to be done before __skb_pull()
where skb->data still points to the mac header start and thus operates
under the same conditions as in __vlan_insert_tag().
Fixes: 93515d53b133 ("net: move vlan pop/push functions into common code")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Goodbody [Wed, 13 Apr 2016 14:36:48 +0000 (15:36 +0100)]
cpsw: Prevent NUll pointer dereference with two PHYs
Adding a 2nd PHY to cpsw results in a NULL pointer dereference
as below. Fix by maintaining a reference to each PHY node in slave
struct instead of a single reference in the priv struct which was
overwritten by the 2nd PHY.
[ 17.870933] Unable to handle kernel NULL pointer dereference at virtual address
00000180
[ 17.879557] pgd =
dc8bc000
[ 17.882514] [
00000180] *pgd=
9c882831, *pte=
00000000, *ppte=
00000000
[ 17.889213] Internal error: Oops: 17 [#1] ARM
[ 17.893838] Modules linked in:
[ 17.897102] CPU: 0 PID: 1657 Comm: connmand Not tainted
4.5.0-ge463dfb-dirty #11
[ 17.904947] Hardware name: Cambrionix whippet
[ 17.909576] task:
dc859240 ti:
dc968000 task.ti:
dc968000
[ 17.915339] PC is at phy_attached_print+0x18/0x8c
[ 17.920339] LR is at phy_attached_info+0x14/0x18
[ 17.925247] pc : [<
c042baec>] lr : [<
c042bb74>] psr:
600f0113
[ 17.925247] sp :
dc969cf8 ip :
dc969d28 fp :
dc969d18
[ 17.937425] r10:
dda7a400 r9 :
00000000 r8 :
00000000
[ 17.942971] r7 :
00000001 r6 :
ddb00480 r5 :
ddb8cb34 r4 :
00000000
[ 17.949898] r3 :
c0954cc0 r2 :
c09562b0 r1 :
00000000 r0 :
00000000
[ 17.956829] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 17.964401] Control:
10c5387d Table:
9c8bc019 DAC:
00000051
[ 17.970500] Process connmand (pid: 1657, stack limit = 0xdc968210)
[ 17.977059] Stack: (0xdc969cf8 to 0xdc96a000)
[ 17.981692] 9ce0:
dc969d28 dc969d08
[ 17.990386] 9d00:
c038f9bc c038f6b4 ddb00480 dc969d34 dc969d28 c042bb74 c042bae4 00000000
[ 17.999080] 9d20:
c09562b0 c0954cc0 dc969d5c dc969d38 c043ebfc c042bb6c 00000007 00000003
[ 18.007773] 9d40:
ddb00000 ddb8cb58 ddb00480 00000001 dc969dec dc969d60 c0441614 c043ea68
[ 18.016465] 9d60:
00000000 00000003 00000000 fffffff4 dc969df4 0000000d 00000000 00000000
[ 18.025159] 9d80:
dc969db4 dc969d90 c005dc08 c05839e0 dc969df4 0000000d ddb00000 00001002
[ 18.033851] 9da0:
00000000 00000000 dc969dcc dc969db8 c005ddf4 c005dbc8 00000000 00000118
[ 18.042544] 9dc0:
dc969dec dc969dd0 ddb00000 c06db27c ffff9003 00001002 00000000 00000000
[ 18.051237] 9de0:
dc969e0c dc969df0 c057c88c c04410dc dc969e0c ddb00000 ddb00000 00000001
[ 18.059930] 9e00:
dc969e34 dc969e10 c057cb44 c057c7d8 ddb00000 ddb00138 00001002 beaeda20
[ 18.068622] 9e20:
00000000 00000000 dc969e5c dc969e38 c057cc28 c057cac0 00000000 dc969e80
[ 18.077315] 9e40:
dda7a40c beaeda20 00000000 00000000 dc969ecc dc969e60 c05e36d0 c057cc14
[ 18.086007] 9e60:
dc969e84 00000051 beaeda20 00000000 dda7a40c 00000014 ddb00000 00008914
[ 18.094699] 9e80:
30687465 00000000 00000000 00000000 00009003 00000000 00000000 00000000
[ 18.103391] 9ea0:
00001002 00008914 dd257ae0 beaeda20 c098a428 beaeda20 00000011 00000000
[ 18.112084] 9ec0:
dc969edc dc969ed0 c05e4e54 c05e3030 dc969efc dc969ee0 c055f5ac c05e4cc4
[ 18.120777] 9ee0:
beaeda20 dd257ae0 dc8ab4c0 00008914 dc969f7c dc969f00 c010b388 c055f45c
[ 18.129471] 9f00:
c071ca40 dd257ac0 c00165e8 dc968000 dc969f3c dc969f20 dc969f64 dc969f28
[ 18.138164] 9f20:
c0115708 c0683ec8 dd257ac0 dd257ac0 dc969f74 dc969f40 c055f350 c00fc66c
[ 18.146857] 9f40:
dd82e4d0 00000011 00000000 00080000 dd257ac0 00000000 dc8ab4c0 dc8ab4c0
[ 18.155550] 9f60:
00008914 beaeda20 00000011 00000000 dc969fa4 dc969f80 c010bc34 c010b2fc
[ 18.164242] 9f80:
00000000 00000011 00000002 00000036 c00165e8 dc968000 00000000 dc969fa8
[ 18.172935] 9fa0:
c00163e0 c010bbcc 00000000 00000011 00000011 00008914 beaeda20 00009003
[ 18.181628] 9fc0:
00000000 00000011 00000002 00000036 00081018 00000001 00000000 beaedc10
[ 18.190320] 9fe0:
00083188 beaeda1c 00043a5d b6d29c0c 600b0010 00000011 00000000 00000000
[ 18.198989] Backtrace:
[ 18.201621] [<
c042bad8>] (phy_attached_print) from [<
c042bb74>] (phy_attached_info+0x14/0x18)
[ 18.210664] r3:
c0954cc0 r2:
c09562b0 r1:
00000000
[ 18.215588] r4:
ddb00480
[ 18.218322] [<
c042bb60>] (phy_attached_info) from [<
c043ebfc>] (cpsw_slave_open+0x1a0/0x280)
[ 18.227293] [<
c043ea5c>] (cpsw_slave_open) from [<
c0441614>] (cpsw_ndo_open+0x544/0x674)
[ 18.235874] r7:
00000001 r6:
ddb00480 r5:
ddb8cb58 r4:
ddb00000
[ 18.241944] [<
c04410d0>] (cpsw_ndo_open) from [<
c057c88c>] (__dev_open+0xc0/0x128)
[ 18.249972] r9:
00000000 r8:
00000000 r7:
00001002 r6:
ffff9003 r5:
c06db27c r4:
ddb00000
[ 18.258255] [<
c057c7cc>] (__dev_open) from [<
c057cb44>] (__dev_change_flags+0x90/0x154)
[ 18.266745] r5:
00000001 r4:
ddb00000
[ 18.270575] [<
c057cab4>] (__dev_change_flags) from [<
c057cc28>] (dev_change_flags+0x20/0x50)
[ 18.279523] r9:
00000000 r8:
00000000 r7:
beaeda20 r6:
00001002 r5:
ddb00138 r4:
ddb00000
[ 18.287811] [<
c057cc08>] (dev_change_flags) from [<
c05e36d0>] (devinet_ioctl+0x6ac/0x76c)
[ 18.296483] r9:
00000000 r8:
00000000 r7:
beaeda20 r6:
dda7a40c r5:
dc969e80 r4:
00000000
[ 18.304762] [<
c05e3024>] (devinet_ioctl) from [<
c05e4e54>] (inet_ioctl+0x19c/0x1c8)
[ 18.312882] r10:
00000000 r9:
00000011 r8:
beaeda20 r7:
c098a428 r6:
beaeda20 r5:
dd257ae0
[ 18.321235] r4:
00008914
[ 18.323956] [<
c05e4cb8>] (inet_ioctl) from [<
c055f5ac>] (sock_ioctl+0x15c/0x2d8)
[ 18.331829] [<
c055f450>] (sock_ioctl) from [<
c010b388>] (do_vfs_ioctl+0x98/0x8d0)
[ 18.339765] r7:
00008914 r6:
dc8ab4c0 r5:
dd257ae0 r4:
beaeda20
[ 18.345822] [<
c010b2f0>] (do_vfs_ioctl) from [<
c010bc34>] (SyS_ioctl+0x74/0x84)
[ 18.353573] r10:
00000000 r9:
00000011 r8:
beaeda20 r7:
00008914 r6:
dc8ab4c0 r5:
dc8ab4c0
[ 18.361924] r4:
00000000
[ 18.364653] [<
c010bbc0>] (SyS_ioctl) from [<
c00163e0>] (ret_fast_syscall+0x0/0x3c)
[ 18.372682] r9:
dc968000 r8:
c00165e8 r7:
00000036 r6:
00000002 r5:
00000011 r4:
00000000
[ 18.380960] Code:
e92dd810 e24cb010 e24dd010 e59b4004 (
e5902180)
[ 18.387580] ---[ end trace
c80529466223f3f3 ]---
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Tue, 12 Apr 2016 16:27:29 +0000 (18:27 +0200)]
bgmac: fix MAC soft-reset bit for corerev > 4
Only core revisions older than 4 use BGMAC_CMDCFG_SR_REV0. This mainly
fixes support for BCM4708A0KF SoCs with Ethernet core rev 5 (it means
only some devices as most of BCM4708A0KF-s got core rev 4).
This was tested for regressions on BCM47094 which doesn't seem to care
which bit gets used.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2016 01:14:04 +0000 (21:14 -0400)]
Merge branch 'soreuseport-mixed-v4-v6-fixes'
Craig Gallek says:
====================
Fixes for SO_REUSEPORT and mixed v4/v6 sockets
Recent changes to the datastructures associated with SO_REUSEPORT broke
an existing behavior when equivalent SO_REUSEPORT sockets are created
using both AF_INET and AF_INET6. This patch series restores the previous
behavior and includes a test to validate it.
This series should be a trivial merge to stable kernels (if deemed
necessary), but will have conflicts in net-next. The following patches
recently replaced the use of hlist_nulls with hlists for UDP and TCP
socket lists:
ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
If this series is accepted, I will send an RFC for the net-next change
to assist with the merge.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek [Tue, 12 Apr 2016 17:11:26 +0000 (13:11 -0400)]
soreuseport: test mixed v4/v6 sockets
Test to validate the behavior of SO_REUSEPORT sockets that are
created with both AF_INET and AF_INET6. See the commit prior to this
for a description of this behavior.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Craig Gallek [Tue, 12 Apr 2016 17:11:25 +0000 (13:11 -0400)]
soreuseport: fix ordering for mixed v4/v6 sockets
With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.
Prior to the commits referenced below, an incoming IPv4 packet would
always be routed to a socket of type AF_INET when this mixed-mode was used.
After those changes, the same packet would be routed to the most recently
bound socket (if this happened to be an AF_INET6 socket, it would
have an IPv4 mapped IPv6 address).
The change in behavior occurred because the recent SO_REUSEPORT optimizations
short-circuit the socket scoring logic as soon as they find a match. They
did not take into account the scoring logic that favors AF_INET sockets
over AF_INET6 sockets in the event of a tie.
To fix this problem, this patch changes the insertion order of AF_INET
and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the
list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
the tail of the list. This will force AF_INET sockets to always be
considered first.
Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Tue, 12 Apr 2016 14:11:12 +0000 (16:11 +0200)]
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
We now have a positive report of another Huawei device needing
this quirk: The ME906s-158 (12d1:15c1). This is an m.2 form
factor modem with no obvious relationship to the E3372 (12d1:157d)
we already have a quirk entry for. This is reason enough to
believe the quirk might be necessary for any number of current
and future Huawei devices.
Applying the quirk to all Huawei devices, since it is crucial
to any device affected by the firmware bug, while the impact
on non-affected devices is negligible.
The quirk can if necessary be disabled per-device by writing
N to /sys/class/net/<iface>/cdc_ncm/ndp_to_end
Reported-by: Andreas Fett <andreas.fett@secunet.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Tue, 12 Apr 2016 11:30:45 +0000 (13:30 +0200)]
bgmac: reset & enable Ethernet core before using it
This fixes Ethernet on D-Link DIR-885L with BCM47094 SoC. Felix reported
similar fix was needed for his BCM4709 device (Buffalo WXR-1900DHP?).
I tested this for regressions on BCM4706, BCM4708A0 and BCM47081A0.
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Apr 2016 20:29:54 +0000 (16:29 -0400)]
Merge branch 'ipv6-dgram-dst-cache'
Martin KaFai Lau says:
====================
ipv6: datagram: Update dst cache of a connected udp sk during pmtu update
v2:
~ Protect __sk_dst_get() operations with rcu_read_lock in
release_cb() because another thread may do ip6_dst_store()
for a udp sk without taking the sk lock (e.g. in sendmsg).
~ Do a ipv6_addr_v4mapped(&sk->sk_v6_daddr) check before
calling ip6_datagram_dst_update() in patch 3 and 4. It is
similar to how __ip6_datagram_connect handles it.
~ One fix in ip6_datagram_dst_update() in patch 2. It needs
to check (np->flow_label & IPV6_FLOWLABEL_MASK) before
doing fl6_sock_lookup. I was confused with the naming
of IPV6_FLOWLABEL_MASK and IPV6_FLOWINFO_MASK.
~ Check dst->obsolete just on the safe side, although I think it
should at least have DST_OBSOLETE_FORCE_CHK by now.
~ Add Fixes tag to patch 3 and 4
~ Add some points from the previous discussion about holding
sk lock to the commit message in patch 3.
v1:
There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
logic to update the sk->sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
Patch 1 and 2 are the prep work.
Patch 3 and 4 are the fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Mon, 11 Apr 2016 22:29:37 +0000 (15:29 -0700)]
ipv6: udp: Do a route lookup and update during release_cb
This patch adds a release_cb for UDPv6. It does a route lookup
and updates sk->sk_dst_cache if it is needed. It picks up the
left-over job from ip6_sk_update_pmtu() if the sk was owned
by user during the pmtu update.
It takes a rcu_read_lock to protect the __sk_dst_get() operations
because another thread may do ip6_dst_store() without taking the
sk lock (e.g. sendmsg).
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Mon, 11 Apr 2016 22:29:36 +0000 (15:29 -0700)]
ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update
There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
logic to update the sk->sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
This patch updates the sk->sk_dst_cache for a connected datagram sk
during pmtu-update code path.
Note that the sk->sk_v6_daddr is used to do the route lookup
instead of skb->data (i.e. iph). It is because a UDP socket can become
connected after sending out some datagrams in un-connected state. or
It can be connected multiple times to different destinations. Hence,
iph may not be related to where sk is currently connected to.
It is done under '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect() (i.e changing
the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
code path.
For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk->sk_dst_cache.
Test:
Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0 proto kernel metric 256 pref medium
2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium
A simple python code to create a connected UDP socket:
import socket
import errno
HOST = '2fac::1'
PORT = 8080
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
try:
data = s.recv(1024)
except socket.error as se:
if se.errno == errno.EMSGSIZE:
pmtu = s.getsockopt(41, 24)
print("PMTU:%d" % pmtu)
break
s.close()
Python program output after getting a ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300
Cache routes after recieving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0 metric 0
cache expires 463sec mtu 1300 pref medium
Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message. Here is the scapy script I have
used:
>>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
>>> sendp(p, iface='qemubr0')
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Wei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Mon, 11 Apr 2016 22:29:35 +0000 (15:29 -0700)]
ipv6: datagram: Refactor dst lookup and update codes to a new function
This patch moves the route lookup and update codes for connected
datagram sk to a newly created function ip6_datagram_dst_update()
It will be reused during the pmtu update in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Mon, 11 Apr 2016 22:29:34 +0000 (15:29 -0700)]
ipv6: datagram: Refactor flowi6 init codes to a new function
Move flowi6 init codes for connected datagram sk to a newly created
function ip6_datagram_flow_key_init().
Notes:
1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect
2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of
(addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init()
This new function will be reused during pmtu update in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Tue, 12 Apr 2016 06:35:18 +0000 (08:35 +0200)]
net: mediatek: update the IRQ part of the binding document
The current binding document only describes a single interrupt. Update the
document by adding the 2 other interrupts.
The driver currently only uses a single interrupt. The HW is however able
to using IRQ grouping to split TX and RX onto separate GIC irqs.
Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: devicetree@vger.kernel.org
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Apr 2016 16:00:59 +0000 (12:00 -0400)]
Merge tag 'mac80211-for-davem-2016-04-14' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
This has just the single fix from Dmitry Ivanov, adding the missing
netlink notifier family check to avoid the socket close DoS problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Tue, 12 Apr 2016 17:26:19 +0000 (10:26 -0700)]
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
verifier must check for reserved size bits in instruction opcode and
reject BPF_LD | BPF_ABS | BPF_DW and BPF_LD | BPF_IND | BPF_DW instructions,
otherwise interpreter will WARN_RATELIMIT on them during execution.
Fixes: ddd872bc3098 ("bpf: verifier: add checks for BPF_ABS | BPF_IND instructions")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lars Persson [Tue, 12 Apr 2016 06:45:52 +0000 (08:45 +0200)]
net: sched: do not requeue a NULL skb
A failure in validate_xmit_skb_list() triggered an unconditional call
to dev_requeue_skb with skb=NULL. This slowly grows the queue
discipline's qlen count until all traffic through the queue stops.
We take the optimistic approach and continue running the queue after a
failure since it is unknown if later packets also will fail in the
validate path.
Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
Signed-off-by: Lars Persson <larper@axis.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Sun, 10 Apr 2016 10:52:28 +0000 (12:52 +0200)]
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.
Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Apr 2016 04:31:45 +0000 (00:31 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-04-13
This series contains updates to i40e, i40evf and fm10k.
Alex fixes a bug introduced earlier based on his interpretation of the
XL710 datasheet. The actual limit for fragments with TSO and a skbuff
that has payload data in the header portion of the buffer is actually
only 7 fragments and the skb-data portion counts as 2 buffers, one for
the TSO header, and the one for a segment payload buffer.
Jacob fixes a bug where in a previous refactor of the code broke
multi-bit updates for VFs. The problem occurs because a multi-bit
request has a non-zero length, and the PF would simply drop any
request with the upper 16 bits set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Friesen [Fri, 8 Apr 2016 21:21:30 +0000 (15:21 -0600)]
route: do not cache fib route info on local routes with oif
For local routes that require a particular output interface we do not want
to cache the result. Caching the result causes incorrect behaviour when
there are multiple source addresses on the interface. The end result
being that if the intended recipient is waiting on that interface for the
packet he won't receive it because it will be delivered on the loopback
interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
interface as well.
This can be tested by running a program such as "dhcp_release" which
attempts to inject a packet on a particular interface so that it is
received by another program on the same board. The receiving process
should see an IP_PKTINFO ipi_ifndex value of the source interface
(e.g., eth1) instead of the loopback interface (e.g., lo). The packet
will still appear on the loopback interface in tcpdump but the important
aspect is that the CMSG info is correct.
Sample dhcp_release command line:
dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed off-by: Chris Friesen <chris.friesen@windriver.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Thu, 31 Mar 2016 16:52:30 +0000 (09:52 -0700)]
fm10k: fix multi-bit VLAN update requests from VF
The VF uses a multi-bit update request to clear unused VLANs whenever it
resets. However, an accident in a previous refector broke multi-bit
updates for VFs, due to misreading a comment in fm10k_vf.c and
attempting to reduce code duplication. The problem occurs because
a multi-bit request has a non-zero length, and the PF would simply drop
any request with the upper 16 bits set.
We can't simply remove the check of the upper 16 bits and the call to
fm10k_iov_select vid, because this would remove the checks for default
VID and for ensuring no other VLANs can be enabled except pf_vid when it
has been set. To resolve that issue, this revision uses the
iov_select_vid when we have a single-bit update, and denies any
multi-bit update when the VLAN was administratively set by the PF. This
should be ok since the PF properly updates VLAN_TABLE when it assigns
the PF vid. This ensures that requests to add or remove the PF vid work
as expected, but a rogue VF could not use the multi-bit update as
a loophole to attempt receiving traffic on other VLANs.
Reported-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Daney [Fri, 8 Apr 2016 20:37:27 +0000 (13:37 -0700)]
net: thunderx: Fix broken of_node_put() code.
commit
b7d3e3d3d21a ("net: thunderx: Don't leak phy device references
on -EPROBE_DEFER condition.") incorrectly moved the call to
of_node_put() outside of the loop. Under normal loop exit, the node
has already had of_node_put() called, so the extra call results in:
[ 8.228020] ERROR: Bad of_node_put() on /soc@0/pci@
848000000000/mrml-bridge0@1,0/bgx0/xlaui00
[ 8.239433] CPU: 16 PID: 608 Comm: systemd-udevd Not tainted 4.6.0-rc1-numa+ #157
[ 8.247380] Hardware name: www.cavium.com
EBB8800/
EBB8800, BIOS 0.3 Mar 2 2016
[ 8.273541] Call trace:
[ 8.273550] [<
fffffc0008097364>] dump_backtrace+0x0/0x210
[ 8.273557] [<
fffffc0008097598>] show_stack+0x24/0x2c
[ 8.273560] [<
fffffc0008399ed0>] dump_stack+0x8c/0xb4
[ 8.273566] [<
fffffc00085aa828>] of_node_release+0xa8/0xac
[ 8.273570] [<
fffffc000839cad8>] kobject_cleanup+0x8c/0x194
[ 8.273573] [<
fffffc000839c97c>] kobject_put+0x44/0x6c
[ 8.273576] [<
fffffc00085a9ab0>] of_node_put+0x24/0x30
[ 8.273587] [<
fffffc0000bd0f74>] bgx_probe+0x17c/0xcd8 [thunder_bgx]
[ 8.273591] [<
fffffc00083ed220>] pci_device_probe+0xa0/0x114
[ 8.273596] [<
fffffc0008473fbc>] driver_probe_device+0x178/0x418
[ 8.273599] [<
fffffc000847435c>] __driver_attach+0x100/0x118
[ 8.273602] [<
fffffc0008471b58>] bus_for_each_dev+0x6c/0xac
[ 8.273605] [<
fffffc0008473884>] driver_attach+0x30/0x38
[ 8.273608] [<
fffffc00084732f4>] bus_add_driver+0x1f8/0x29c
[ 8.273611] [<
fffffc0008475028>] driver_register+0x70/0x110
[ 8.273617] [<
fffffc00083ebf08>] __pci_register_driver+0x60/0x6c
[ 8.273623] [<
fffffc0000bf0040>] bgx_init_module+0x40/0x48 [thunder_bgx]
[ 8.273626] [<
fffffc0008090d04>] do_one_initcall+0xcc/0x1c0
[ 8.273631] [<
fffffc0008198abc>] do_init_module+0x68/0x1c8
[ 8.273635] [<
fffffc0008125668>] load_module+0xf44/0x11f4
[ 8.273638] [<
fffffc0008125b64>] SyS_finit_module+0xb8/0xe0
[ 8.273641] [<
fffffc0008093b30>] el0_svc_naked+0x24/0x28
Go back to the previous (correct) code that only did the extra
of_node_put() call on early exit from the loop.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emrah Demir [Fri, 8 Apr 2016 19:16:11 +0000 (22:16 +0300)]
mISDN: Fixing missing validation in base_sock_bind()
Add validation code into mISDN/socket.c
Signed-off-by: Emrah Demir <ed@abdsec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 30 Mar 2016 23:15:37 +0000 (16:15 -0700)]
i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet
This patch addresses a bug introduced based on my interpretation of the
XL710 datasheet. Specifically section 8.4.1 states that "A single transmit
packet may span up to 8 buffers (up to 8 data descriptors per packet
including both the header and payload buffers)." It then later goes on to
say that each segment for a TSO obeys the previous rule, however it then
refers to TSO header and the segment payload buffers.
I believe the actual limit for fragments with TSO and a skbuff that has
payload data in the header portion of the buffer is actually only 7
fragments as the skb->data portion counts as 2 buffers, one for the TSO
header, and one for a segment payload buffer.
Fixes: 2d37490b82af ("i40e/i40evf: Rewrite logic for 8 descriptor per packet check")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David Ahern [Fri, 8 Apr 2016 19:01:21 +0000 (12:01 -0700)]
net: ipv6: Do not keep linklocal and loopback addresses
f1705ec197e7 added the option to retain user configured addresses on an
admin down. A comment to one of the later revisions suggested using the
IFA_F_PERMANENT flag rather than adding a user_managed boolean to the
ifaddr struct. A side effect of this change is that link local and
loopback addresses are also retained which is not part of the objective
of
f1705ec197e7. Add check to drop those addresses.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wolfram Sang [Fri, 8 Apr 2016 11:28:42 +0000 (13:28 +0200)]
net: ethernet: renesas: ravb_main: test clock rate to avoid division by 0
The clk API may return 0 on clk_get_rate, so we should check the result before
using it as a divisor.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Apr 2016 01:49:03 +0000 (21:49 -0400)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree. More
specifically, they are:
1) Fix missing filter table per-netns registration in arptables, from
Florian Westphal.
2) Resolve out of bound access when parsing TCP options in
nf_conntrack_tcp, patch from Jozsef Kadlecsik.
3) Prefer NFPROTO_BRIDGE extensions over NFPROTO_UNSPEC in ebtables,
this resolves conflict between xt_limit and ebt_limit, from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Apr 2016 21:55:39 +0000 (17:55 -0400)]
Merge tag 'wireless-drivers-for-davem-2016-04-13' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.6
b43
* fix memory leaks when removing the device
bcma
* fix building without OF_IRQ
rtlwifi
* fix gcc-6 indentation warning
iwlwifi
* lower the debug level of a benign print
* fix a memory leak
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Mon, 11 Apr 2016 23:31:14 +0000 (01:31 +0200)]
netfilter: ebtables: Fix extension lookup with identical name
If a requested extension exists as module and is not loaded,
ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same
name and fail.
Reproduced with limit match: Given xt_limit and ebt_limit both built as
module, the following would fail:
modprobe xt_limit
ebtables -I INPUT --limit 1/s -j ACCEPT
The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC
extension and retry after requesting an appropriate module.
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Dmitry Ivanov [Wed, 6 Apr 2016 14:23:18 +0000 (17:23 +0300)]
nl80211: check netlink protocol in socket release notification
A non-privileged user can create a netlink socket with the same port_id as
used by an existing open nl80211 netlink socket (e.g. as used by a hostapd
process) with a different protocol number.
Closing this socket will then lead to the notification going to nl80211's
socket release notification handler, and possibly cause an action such as
removing a virtual interface.
Fix this issue by checking that the netlink protocol is NETLINK_GENERIC.
Since generic netlink has no notifier chain of its own, we can't fix the
problem more generically.
Fixes: 026331c4d9b5 ("cfg80211/mac80211: allow registering for and sending action frames")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Ivanov <dima@ubnt.com>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
stephen hemminger [Mon, 11 Apr 2016 20:39:08 +0000 (13:39 -0700)]
devlink: add missing install of header
The new devlink.h in uapi was not being installed by
make headers_install
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 7 Apr 2016 18:10:41 +0000 (11:10 -0700)]
net: vrf: Fix dev refcnt leak due to IPv6 prefix route
ifupdown2 found a kernel bug with IPv6 routes and movement from the main
table to the VRF table. Sequence of events:
Create the interface and add addresses:
ip link add dev eth4.105 link eth4 type vlan id 105
ip addr add dev eth4.105 8.105.105.10/24
ip -6 addr add dev eth4.105 2008:105:105::10/64
At this point IPv6 has inserted a prefix route in the main table even
though the interface is 'down'. From there the VRF device is created:
ip link add dev vrf105 type vrf table 105
ip addr add dev vrf105 9.9.105.10/32
ip -6 addr add dev vrf105 2000:9:105::10/128
ip link set vrf105 up
Then the interface is enslaved, while still in the 'down' state:
ip link set dev eth4.105 master vrf105
Since the device is down the VRF driver cycling the device does not
send the NETDEV_UP and NETDEV_DOWN but rather the NETDEV_CHANGE event
which does not flush the routes inserted prior.
When the link is brought up
ip link set dev eth4.105 up
the prefix route is added in the VRF table, but does not remove
the route from the main table.
Fix by handling the NETDEV_CHANGEUPPER event similar what was implemented
for IPv4 in
7f49e7a38b77 ("net: Flush local routes when device changes vrf
association")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 7 Apr 2016 18:10:06 +0000 (11:10 -0700)]
net: vrf: Fix dst reference counting
Vivek reported a kernel exception deleting a VRF with an active
connection through it. The root cause is that the socket has a cached
reference to a dst that is destroyed. Converting the dst_destroy to
dst_release and letting proper reference counting kick in does not
work as the dst has a reference to the device which needs to be released
as well.
I talked to Hannes about this at netdev and he pointed out the ipv4 and
ipv6 dst handling has dst_ifdown for just this scenario. Rather than
continuing with the reinvented dst wheel in VRF just remove it and
leverage the ipv4 and ipv6 versions.
Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Apr 2016 19:22:21 +0000 (15:22 -0400)]
Merge branch 'tipc-fixes'
Jon Maloy says:
====================
tipc: name distributor pernet queue
Commit #1 fixes a potential issue with deferred binding table
updates being pushed to the wrong namespace.
Commit #2 solves a problem with deferred binding table updates
remaining in the the defer queue after the issuing node has gone
down.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 7 Apr 2016 14:40:44 +0000 (10:40 -0400)]
tipc: purge deferred updates from dead nodes
If a peer node becomes unavailable, in addition to removing the
nametable entries from this node we also need to purge all deferred
updates associated with this node.
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 7 Apr 2016 14:40:43 +0000 (10:40 -0400)]
tipc: make dist queue pernet
Nametable updates received from the network that cannot be applied
immediately are placed on a defer queue. This queue is global to the
TIPC module, which might cause problems when using TIPC in containers.
To prevent nametable updates from escaping into the wrong namespace,
we make the queue pernet instead.
Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 11 Apr 2016 05:37:58 +0000 (11:07 +0530)]
cxgb4: Stop Rx Queues before freeing it up
Stop all Ethernet RX Queues before freeing up various Ingress/Egress
Queues, etc. We were seeing cases of Ingress Queues not getting serviced
during the shutdown process leading to Ingress Paths jamming up through
the chip and blocking the shutdown effort itself.
One such case involved the Firmware sending a "Flush Token" through the
ULP-TX -> ULP-RX path for an Ethernet TX Queue being freed in order to
make sure there weren't any remaining TX Work Requests in the pipeline.
But the return path was stalled by Ingress Data unable to be delivered to
the Host because those Ingress Queues were no longer being serviced.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Reid [Thu, 7 Apr 2016 07:55:35 +0000 (15:55 +0800)]
net: stmmac: socfgpa: Ensure emac bit set in System Manger for PTP
When using the PTP fpga to hps clock source for the stmmac module
the appropriate bit in the System Manager FPGA Interface Group register
needs to be set. This is not set by the bootloader setup when the
HPS emac pins are being for this emac module.
This allows the PTP clock to be sourced from the FPGA and also connects
the PTP pps and ext trig signals to the stmmac PTP hardware.
Patch proposed by Phil Collins.
Signed-off-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Ivanov [Thu, 7 Apr 2016 07:31:38 +0000 (09:31 +0200)]
netlink: don't send NETLINK_URELEASE for unbound sockets
All existing users of NETLINK_URELEASE use it to clean up resources that
were previously allocated to a socket via some command. As a result, no
users require getting this notification for unbound sockets.
Sending it for unbound sockets, however, is a problem because any user
(including unprivileged users) can create a socket that uses the same ID
as an existing socket. Binding this new socket will fail, but if the
NETLINK_URELEASE notification is generated for such sockets, the users
thereof will be tricked into thinking the socket that they allocated the
resources for is closed.
In the nl80211 case, this will cause destruction of virtual interfaces
that still belong to an existing hostapd process; this is the case that
Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
of netlink log/queue it will cause them to stop reporting events, as if
NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.
Fix this problem by checking that the socket is bound before generating
the NETLINK_URELEASE notification.
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Ivanov <dima@ubnt.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Apr 2016 03:01:30 +0000 (23:01 -0400)]
decnet: Do not build routes to devices without decnet private data.
In particular, make sure we check for decnet private presence
for loopback devices.
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Wed, 6 Apr 2016 18:15:19 +0000 (15:15 -0300)]
sctp: avoid refreshing heartbeat timer too often
Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.
As suggested by David Laight this patch now removes this timer update
from hot path by leaving the timer on and re-evaluating upon its
expiration if the heartbeat is still needed or not, similarly to what is
done for TCP. If it's not needed anymore the timer is re-scheduled to
the new timeout, considering the time already elapsed.
For this, we now record the last tx timestamp per transport, updated in
the same spots as hb timer was restarted on tx. Also split up
sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
heartbeat one.
On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:
Samples: 103K of event 'cpu-clock', Event count (approx.):
25833000000
Overhead Command Shared Object Symbol
+ 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
- _raw_write_unlock_irqrestore
- 96,54% _raw_spin_unlock_irqrestore
- 36,14% mod_timer
+ 97,24% sctp_transport_reset_timers
+ 2,76% sctp_do_sm
+ 33,65% __wake_up_sync_key
+ 28,77% sctp_ulpq_tail_event
+ 1,40% del_timer
- 1,84% mod_timer
+ 99,03% sctp_transport_reset_timers
+ 0,97% sctp_do_sm
+ 1,50% sctp_ulpq_tail_event
And after this patch, now with netperf -l 60:
Samples: 230K of event 'cpu-clock', Event count (approx.):
57707250000
Overhead Command Shared Object Symbol
+ 5,65% netperf [kernel.vmlinux] [k] memcpy_erms
+ 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
- _raw_spin_unlock_irqrestore
+ 49,89% __wake_up_sync_key
+ 45,68% sctp_ulpq_tail_event
- 2,85% mod_timer
+ 76,51% sctp_transport_reset_t3_rtx
+ 23,49% sctp_do_sm
+ 1,55% del_timer
+ 2,50% netperf [sctp] [k] sctp_datamsg_from_user
+ 2,26% netperf [sctp] [k] sctp_sendmsg
Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
~3.7%.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 9 Apr 2016 23:28:41 +0000 (02:28 +0300)]
sh_eth: re-enable-E-MAC interrupts in sh_eth_set_ringparam()
The E-MAC interrupts are left disabled when the ring parameters are changed
via 'ethtool'. In order to fix this, it's enough to call sh_eth_dev_init()
with 'true' instead of 'false' for the second argument (which conveniently
allows us to remove the following code re-enabling E-DMAC interrupts and
reception).
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 9 Apr 2016 19:32:44 +0000 (12:32 -0700)]
Merge tag 'tty-4.6-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are two tty fixes for issues found.
One was due to a merge error in 4.6-rc1, and the other a regression
fix for UML consoles that broke in 4.6-rc1.
Both have been in linux-next for a while"
* tag 'tty-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix merge of "tty: Refactor tty_open()"
tty: Fix UML console breakage
Linus Torvalds [Sat, 9 Apr 2016 19:23:02 +0000 (12:23 -0700)]
Merge tag 'usb-4.6-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 4.6-rc3.
Nothing major, the normal USB gadget fixes and usb-serial driver ids,
along with some other fixes mixed in. All except the USB serial ids
have been tested in linux-next, the id additions should be fine as
they are 'trivial'"
* tag 'usb-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
USB: option: add "D-Link DWM-221 B1" device id
USB: serial: cp210x: Adding GE Healthcare Device ID
USB: serial: ftdi_sio: Add support for ICP DAS I-756xU devices
usb: dwc3: keystone: drop dma_mask configuration
usb: gadget: udc-core: remove manual dma configuration
usb: dwc3: pci: add ID for one more Intel Broxton platform
usb: renesas_usbhs: fix to avoid using a disabled ep in usbhsg_queue_done()
usb: dwc2: do not override forced dr_mode in gadget setup
usb: gadget: f_midi: unlock on error
USB: digi_acceleport: do sanity checking for the number of ports
USB: cypress_m8: add endpoint sanity check
USB: mct_u232: add sanity checking in probe
usb: fix regression in SuperSpeed endpoint descriptor parsing
USB: usbip: fix potential out-of-bounds write
usb: renesas_usbhs: disable TX IRQ before starting TX DMAC transfer
usb: renesas_usbhs: avoid NULL pointer derefernce in usbhsf_pkt_handler()
usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
usb: phy: qcom-8x16: fix regulator API abuse
usb: ch9: Fix SSP Device Cap wFunctionalitySupport type
usb: gadget: composite: Access SSP Dev Cap fields properly
...
Linus Torvalds [Sat, 9 Apr 2016 19:09:37 +0000 (12:09 -0700)]
Merge tag 'staging-4.6-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here are some IIO driver fixes, along with two staging driver fixes
for 4.6-rc3.
One staging driver patch reverts the deletion of a driver that
happened in 4.6-rc1. We thought that laptop.org was dead, but it's
still alive and kicking, and has users that were mad we broke their
hardware by deleting a driver for their machines. So that driver is
added back and everyone is happy again.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Revert "Staging: olpc_dcon: Remove obsolete driver"
staging/rdma/hfi1: select CRC32
iio: gyro: bmg160: fix buffer read values
iio: gyro: bmg160: fix endianness when reading axes
iio: accel: bmc150: fix endianness when reading axes
iio: st_magn: always define ST_MAGN_TRIGGER_SET_STATE
iio: fix config watermark initial value
iio: health: max30100: correct FIFO check condition
iio: imu: Fix inv_mpu6050 dependencies
iio: adc: Fix build error of missing devm_ioremap_resource on UM
iio: light: apds9960: correct FIFO check condition
iio: adc: max1363: correct reference voltage
iio: adc: max1363: add missing adc to max1363_id
Linus Torvalds [Sat, 9 Apr 2016 19:00:42 +0000 (12:00 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of eight fixes.
Two are trivial gcc-6 updates (brace additions and unused variable
removal). There's a couple of cxlflash regressions, a correction for
sd being overly chatty on revalidation (causing excess log increases).
A VPD issue which could crash USB devices because they seem very
intolerant to VPD inquiries, an ALUA deadlock fix and a mpt3sas buffer
overrun fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Do not attach VPD to devices that don't support it
sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes
scsi_dh_alua: Fix a recently introduced deadlock
scsi: Declare local symbols static
cxlflash: Move to exponential back-off when cmd_room is not available
cxlflash: Fix regression issue with re-ordering patch
mpt3sas: Don't overreach ioc->reply_post[] during initialization
aacraid: add missing curly braces
Linus Torvalds [Sat, 9 Apr 2016 18:23:27 +0000 (11:23 -0700)]
Merge tag 'md/4.6-rc2-fix' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This update mainly fixes bugs:
- fix error handling (Guoqing)
- fix a crash when a disk is hotremoved (me)
- fix a dead loop (Wei Fang)"
* tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/bitmap: clear bitmap if bitmap_create failed
MD: add rdev reference for super write
md: fix a trivial typo in comments
md:raid1: fix a dead loop when read from a WriteMostly disk
Linus Torvalds [Sat, 9 Apr 2016 18:03:48 +0000 (11:03 -0700)]
Merge tag 'pm+acpi-4.6-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for some issues discovered after recent changes and for some
that have just been found lately regardless of those changes
(intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
support for some new CPU models (intel_idle, Intel RAPL driver,
turbostat) and documentation updates (intel_pstate, PM core).
Specifics:
- intel_pstate fixes for two issues exposed by the recent switch over
from using timers and for one issue introduced during the 4.4 cycle
plus new comments describing data structures used by the driver
(Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that may
cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model in
the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs to
be done during device removal more precisely (Krzysztof Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X and
Kabylake processors (Len Brown)"
* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
...
Linus Torvalds [Sat, 9 Apr 2016 17:50:44 +0000 (10:50 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP,
from Haishuang Yan.
2) Fix multicast frame handling in mac80211 AP code, from Felix
Fietkau.
3) mac80211 station hashtable insert errors not handled properly, fix
from Johannes Berg.
4) Fix TX descriptor count limit handling in e1000, from Alexander
Duyck.
5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.
6) Must assign rtnl_link_ops of the device before registering it, fix
in ip6_tunnel from Thadeu Lima de Souza Cascardo.
7) Memory leak fix in tc action net exit, from WANG Cong.
8) Add missing AF_KCM entries to name tables, from Dexuan Cui.
9) Fix regression in GRE handling of csums wrt. FOU, from Alexander
Duyck.
10) Fix memory allocation alignment and congestion map corruption in
RDS, from Shamir Rabinovitch.
11) Fix default qdisc regression in tuntap driver, from Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
bridge, netem: mark mailing lists as moderated
tuntap: restore default qdisc
mpls: find_outdev: check for err ptr in addition to NULL check
ipv6: Count in extension headers in skb->network_header
RDS: fix congestion map corruption for PAGE_SIZE > 4k
RDS: memory allocated must be align to 8
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
net: add the AF_KCM entries to family name tables
MAINTAINERS: intel-wired-lan list is moderated
lib/test_bpf: Add additional BPF_ADD tests
lib/test_bpf: Add test to check for result of 32-bit add that overflows
lib/test_bpf: Add tests for unsigned BPF_JGT
lib/test_bpf: Fix JMP_JSET tests
VSOCK: Detach QP check should filter out non matching QPs.
stmmac: fix adjust link call in case of a switch is attached
af_packet: tone down the Tx-ring unsupported spew.
net_sched: fix a memory leak in tc action
samples/bpf: Enable powerpc support
samples/bpf: Use llc in PATH, rather than a hardcoded value
samples/bpf: Fix build breakage with map_perf_test_user.c
...
Linus Torvalds [Sat, 9 Apr 2016 17:41:34 +0000 (10:41 -0700)]
Merge branch 'for-linus-4.6' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"These are bug fixes, including a really old fsync bug, and a few trace
points to help us track down problems in the quota code"
* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file/data loss caused by fsync after rename and new inode
btrfs: Reset IO error counters before start of device replacing
btrfs: Add qgroup tracing
Btrfs: don't use src fd for printk
btrfs: fallback to vmalloc in btrfs_compare_tree
btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
btrfs: Output more info for enospc_debug mount option
Btrfs: fix invalid reference in replace_path
Btrfs: Improve FL_KEEP_SIZE handling in fallocate
Linus Torvalds [Sat, 9 Apr 2016 17:33:58 +0000 (10:33 -0700)]
Merge tag 'for-linus-4.6-ofs1' of git://git./linux/kernel/git/hubcap/linux
Pull orangefs fixes from Mike Marshall:
"Orangefs cleanups and a strncpy vulnerability fix.
Cleanups:
- remove an unused variable from orangefs_readdir.
- clean up printk wrapper used for ofs "gossip" debugging.
- clean up truncate ctime and mtime setting in inode.c
- remove a useless null check found by coccinelle.
- optimize some memcpy/memset boilerplate code.
- remove some useless sanity checks from xattr.c
Fix:
- fix a potential strncpy vulnerability"
* tag 'for-linus-4.6-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: remove unused variable
orangefs: Add KERN_<LEVEL> to gossip_<level> macros
orangefs: strncpy -> strscpy
orangefs: clean up truncate ctime and mtime setting
Orangefs: fix ifnullfree.cocci warnings
Orangefs: optimize boilerplate code.
Orangefs: xattr.c cleanup
Linus Torvalds [Sat, 9 Apr 2016 17:23:45 +0000 (10:23 -0700)]
Merge tag 'iommu-fixes-v4.6-rc2' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- compile-time fixes (warnings and failures)
- a bug in iommu core code which could cause the group->domain pointer
to be falsly cleared
- fix in scatterlist handling of the ARM common DMA-API code
- stall detection fix for the Rockchip IOMMU driver
* tag 'iommu-fixes-v4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Silence an uninitialized variable warning
iommu/rockchip: Fix "is stall active" check
iommu: Don't overwrite domain pointer when there is no default_domain
iommu/dma: Restore scatterlist offsets correctly
iommu: provide of_xlate pointer unconditionally
Greg Kroah-Hartman [Fri, 8 Apr 2016 22:41:58 +0000 (15:41 -0700)]
Merge tag 'usb-serial-4.6-rc3' of git://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.6-rc3
Here are some new device ids.
Signed-off-by: Johan Hovold <johan@kernel.org>
David S. Miller [Fri, 8 Apr 2016 20:41:28 +0000 (16:41 -0400)]
Merge tag 'mac80211-for-davem-2016-04-06' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
For the current RC series, we have the following fixes:
* TDLS fixes from Arik and Ilan
* rhashtable fixes from Ben and myself
* documentation fixes from Luis
* U-APSD fixes from Emmanuel
* a TXQ fix from Felix
* and a compiler warning suppression from Jeff
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 5 Apr 2016 20:43:53 +0000 (13:43 -0700)]
bridge, netem: mark mailing lists as moderated
I moderate these (lightly loaded) lists to block spam.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 8 Apr 2016 05:26:48 +0000 (13:26 +0800)]
tuntap: restore default qdisc
After commit
f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using
alloc_netdev"), default qdisc was changed to noqueue because
tuntap does not set tx_queue_len during .setup(). This patch restores
default qdisc by setting tx_queue_len in tun_setup().
Fixes: f84bb1eac027 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Brandenburg [Mon, 4 Apr 2016 20:26:38 +0000 (16:26 -0400)]
orangefs: remove unused variable
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Rafael J. Wysocki [Fri, 8 Apr 2016 19:46:56 +0000 (21:46 +0200)]
Merge branches 'pm-core', 'powercap' and 'pm-tools'
* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal
* powercap:
powercap: intel_rapl: Add missing Haswell model
* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
Rafael J. Wysocki [Fri, 8 Apr 2016 19:46:05 +0000 (21:46 +0200)]
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'acpi-cppc'
* pm-cpufreq:
cpufreq: dt: Drop stale comment
cpufreq: intel_pstate: Documenation for structures
cpufreq: intel_pstate: fix inconsistency in setting policy limits
intel_pstate: Avoid extra invocation of intel_pstate_sample()
intel_pstate: Do not set utilization update hook too early
* pm-cpuidle:
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
* acpi-cppc:
mailbox: pcc: Don't access an unmapped memory address space
Joe Perches [Sun, 27 Mar 2016 21:34:52 +0000 (14:34 -0700)]
orangefs: Add KERN_<LEVEL> to gossip_<level> macros
Emit the logging messages at the appropriate levels.
Miscellanea:
o Change format to fmt
o Use the more common ##__VA_ARGS__
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Martin Brandenburg [Fri, 8 Apr 2016 17:33:21 +0000 (13:33 -0400)]
orangefs: strncpy -> strscpy
It would have been possible for a rogue client-core to send in a symlink
target which is not NUL terminated. This returns EIO if the client-core
gives us corrupt data.
Leave debugfs and superblock code as is for now.
Other dcache.c and namei.c strncpy instances are safe because
ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
name plus a NUL byte.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Martin Brandenburg [Mon, 4 Apr 2016 20:26:36 +0000 (16:26 -0400)]
orangefs: clean up truncate ctime and mtime setting
The ctime and mtime are always updated on a successful ftruncate and
only updated on a successful truncate where the size changed.
We handle the ``if the size changed'' bit.
This matches FUSE's behavior.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
kbuild test robot [Sat, 26 Mar 2016 18:54:23 +0000 (02:54 +0800)]
Orangefs: fix ifnullfree.cocci warnings
fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
NULL check before some freeing functions is not needed.
Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.
Generated by: scripts/coccinelle/free/ifnullfree.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Mike Marshall [Wed, 6 Apr 2016 15:19:37 +0000 (11:19 -0400)]
Orangefs: optimize boilerplate code.
Suggested by David Binderman <dcb314@hotmail.com>
The former can potentially be a performance win over the latter.
memcpy(d, s, len);
memset(d+len, c, size-len);
memset(d, c, size);
memcpy(d, s, len);
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Mike Marshall [Wed, 6 Apr 2016 14:52:38 +0000 (10:52 -0400)]
Orangefs: xattr.c cleanup
1. It is nonsense to test for negative size_t, suggested by
David Binderman <dcb314@hotmail.com>
2. By the time Orangefs gets called, the vfs has ensured that
name != NULL, and that buffer and size are sane.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Roopa Prabhu [Fri, 8 Apr 2016 04:28:38 +0000 (21:28 -0700)]
mpls: find_outdev: check for err ptr in addition to NULL check
find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to
find the output device. In case of an error, inet{,6}_fib_lookup_dev()
returns error pointer and dev_get_by_index() returns NULL. But the function
only checks for NULL and thus can end up calling dev_put on an ERR_PTR.
This patch adds an additional check for err ptr after the NULL check.
Before: Trying to add an mpls route with no oif from user, no available
path to 10.1.1.8 and no default route:
$ip -f mpls route add 100 as 200 via inet 10.1.1.8
[ 822.337195] BUG: unable to handle kernel NULL pointer dereference at
00000000000003a3
[ 822.340033] IP: [<
ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
[ 822.340033] PGD
1db38067 PUD
1de9e067 PMD 0
[ 822.340033] Oops: 0000 [#1] SMP
[ 822.340033] Modules linked in:
[ 822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54
[ 822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
04/01/2014
[ 822.340033] task:
ffff88001db82580 ti:
ffff88001dad4000 task.ti:
ffff88001dad4000
[ 822.340033] RIP: 0010:[<
ffffffff8148781e>] [<
ffffffff8148781e>]
mpls_nh_assign_dev+0x10b/0x182
[ 822.340033] RSP: 0018:
ffff88001dad7a88 EFLAGS:
00010282
[ 822.340033] RAX:
ffffffffffffff9b RBX:
ffffffffffffff9b RCX:
0000000000000002
[ 822.340033] RDX:
00000000ffffff9b RSI:
0000000000000008 RDI:
0000000000000000
[ 822.340033] RBP:
ffff88001ddc9ea0 R08:
ffff88001e9f1768 R09:
0000000000000000
[ 822.340033] R10:
ffff88001d9c1100 R11:
ffff88001e3c89f0 R12:
ffffffff8187e0c0
[ 822.340033] R13:
ffffffff8187e0c0 R14:
ffff88001ddc9e80 R15:
0000000000000004
[ 822.340033] FS:
00007ff9ed798700(0000) GS:
ffff88001fc00000(0000)
knlGS:
0000000000000000
[ 822.340033] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 822.340033] CR2:
00000000000003a3 CR3:
000000001de89000 CR4:
00000000000006f0
[ 822.340033] Stack:
[ 822.340033]
0000000000000000 0000000100000000 0000000000000000
0000000000000000
[ 822.340033]
0000000000000000 0801010a00000000 0000000000000000
0000000000000000
[ 822.340033]
0000000000000004 ffffffff8148749b ffffffff8187e0c0
000000000000001c
[ 822.340033] Call Trace:
[ 822.340033] [<
ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e
[ 822.340033] [<
ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2
[ 822.340033] [<
ffffffff810e7bbc>] ? get_page+0x5/0xa
[ 822.340033] [<
ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191
[ 822.340033] [<
ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e
[ 822.340033] [<
ffffffff813c9393>] ?
rht_key_hashfn.isra.20.constprop.57+0x14/0x1f
[ 822.340033] [<
ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc
[ 822.340033] [<
ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82
[ 822.340033] [<
ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28
[ 822.340033] [<
ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189
[ 822.340033] [<
ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8
[ 822.340033] [<
ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b
[ 822.340033] [<
ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3
[ 822.340033] [<
ffffffff810e4f35>] ?
__alloc_pages_nodemask+0x11c/0x1e4
[ 822.340033] [<
ffffffff8110619c>] ? PageAnon+0x5/0xd
[ 822.340033] [<
ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52
[ 822.340033] [<
ffffffff810e7bbc>] ? get_page+0x5/0xa
[ 822.340033] [<
ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a
[ 822.340033] [<
ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30
[ 822.340033] [<
ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a
[ 822.340033] [<
ffffffff8148f597>] ?
entry_SYSCALL_64_fastpath+0x12/0x6a
[ 822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff
eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db
74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07
[ 822.340033] RIP [<
ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
[ 822.340033] RSP <
ffff88001dad7a88>
[ 822.340033] CR2:
00000000000003a3
[ 822.435363] ---[ end trace
98cc65e6f6b8bf11 ]---
After patch:
$ip -f mpls route add 100 as 200 via inet 10.1.1.8
RTNETLINK answers: Network is unreachable
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Sitnicki [Tue, 5 Apr 2016 16:41:08 +0000 (18:41 +0200)]
ipv6: Count in extension headers in skb->network_header
When sending a UDPv6 message longer than MTU, account for the length
of fragmentable IPv6 extension headers in skb->network_header offset.
Same as we do in alloc_new_skb path in __ip6_append_data().
This ensures that later on __ip6_make_skb() will make space in
headroom for fragmentable extension headers:
/* move skb->data to ip header from ext header */
if (skb->data < skb_network_header(skb))
__skb_pull(skb, skb_network_offset(skb));
Prevents a splat due to skb_under_panic:
skbuff: skb_under_panic: text:
ffffffff8143397b len:2126 put:14 \
head:
ffff880005bacf50 data:
ffff880005bacf4a tail:0x48 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] KASAN
CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65
[...]
Call Trace:
[<
ffffffff813eb7b9>] skb_push+0x79/0x80
[<
ffffffff8143397b>] eth_header+0x2b/0x100
[<
ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310
[<
ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0
[<
ffffffff814efe3a>] ip6_output+0x16a/0x280
[<
ffffffff815440c1>] ip6_local_out+0xb1/0xf0
[<
ffffffff814f1115>] ip6_send_skb+0x45/0xd0
[<
ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0
[<
ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090
[...]
Reported-by: Ji Jianwen <jiji@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bart Van Assche [Thu, 7 Apr 2016 22:55:04 +0000 (15:55 -0700)]
Revert "ib_srpt: Convert to percpu_ida tag allocation"
This reverts commit
0fd10721fe3664f7549e74af9d28a509c9a68719.
That patch causes the ib_srpt driver to crash as soon as the first SCSI
command is received:
kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439!
invalid opcode: 0000 [#1] SMP
Workqueue: target_completion target_complete_ok_work [target_core_mod]
RIP: srpt_queue_response+0x437/0x4a0 [ib_srpt]
Call Trace:
srpt_queue_data_in+0x9/0x10 [ib_srpt]
target_complete_ok_work+0x152/0x2b0 [target_core_mod]
process_one_work+0x197/0x480
worker_thread+0x49/0x490
kthread+0xea/0x100
ret_from_fork+0x22/0x40
Aside from the crash, the shortcomings of that patch are as follows:
- It makes the ib_srpt driver use I/O contexts allocated by
transport_alloc_session_tags() but it does not initialize these I/O
contexts properly. All the initializations performed by
srpt_alloc_ioctx() are skipped.
- It swaps the order of the send ioctx allocation and the transition to
RTR mode which is wrong.
- The amount of memory that is needed for I/O contexts is doubled.
- srpt_rdma_ch.free_list is no longer used but is not removed.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 8 Apr 2016 00:22:20 +0000 (17:22 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems. These have been
reviewed by Al and the respective file system mtinainers and are going
through the ext4 tree for convenience.
This also has a few ext4 encryption bug fixes that were discovered in
Android testing (yes, we will need to get these sync'ed up with the
fs/crypto code; I'll take care of that). It also has some bug fixes
and a change to ignore the legacy quota options to allow for xfstests
regression testing of ext4's internal quota feature and to be more
consistent with how xfs handles this case"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: ignore quota mount options if the quota feature is enabled
ext4 crypto: fix some error handling
ext4: avoid calling dquot_get_next_id() if quota is not enabled
ext4: retry block allocation for failed DIO and DAX writes
ext4: add lockdep annotations for i_data_sem
ext4: allow readdir()'s of large empty directories to be interrupted
btrfs: fix crash/invalid memory access on fsync when using overlayfs
ext4 crypto: use dget_parent() in ext4_d_revalidate()
ext4: use file_dentry()
ext4: use dget_parent() in ext4_file_open()
nfs: use file_dentry()
fs: add file_dentry()
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
Linus Torvalds [Thu, 7 Apr 2016 23:34:26 +0000 (16:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This just fixes a few remaining memory allocations in RBD to use
GFP_NOIO instead of GFP_ATOMIC"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: use GFP_NOIO consistently for request allocations
Linus Torvalds [Thu, 7 Apr 2016 23:20:40 +0000 (16:20 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio/qemu fixes from Michael S Tsirkin:
"A couple of fixes for virtio and for the new QEMU fw cfg driver"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: add VIRTIO_CONFIG_S_NEEDS_RESET device status bit
MAINTAINERS: add entry for QEMU
firmware: qemu_fw_cfg.c: hold ACPI global lock during device access
virtio: virtio 1.0 cs04 spec compliance for reset
qemu_fw_cfg: don't leak kobj on init error
Linus Torvalds [Thu, 7 Apr 2016 22:53:19 +0000 (15:53 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is mostly amdgpu/radeon fixes, and imx related fixes.
There is also one one TTM fix, one nouveau fix, and one hdlcd fix.
The AMD ones are some fixes for power management after suspend/resume
one some GPUs, and some vblank fixes.
The IMX ones are for more stricter plane checks and some cleanups.
I'm off until Monday, so therre might be some fixes early next week if
anyone missed me"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (34 commits)
drm/nouveau/tegra: acquire and enable reference clock if needed
drm/amdgpu: total vram size also reduces pin size
drm/amd/powerplay: add uvd/vce dpm enabling flag default.
drm/amd/powerplay: fix issue that resume back, dpm can't work on FIJI.
drm/amdgpu: save and restore the firwmware cache part when suspend resume
drm/amdgpu: save and restore UVD context with suspend and resume
drm/ttm: use phys_addr_t for ttm_bus_placement
drm: ARM HDLCD - fix an error code
drm: ARM HDLCD - get rid of devm_clk_put()
drm/radeon: Only call drm_vblank_on/off between drm_vblank_init/cleanup
drm/amdgpu: fence wait old rcu slot
drm/amdgpu: fix leaking fence in the pageflip code
drm/amdgpu: print vram type rather than just DDR
drm/amdgpu/gmc: use proper register for vram type on Fiji
drm/amdgpu/gmc: move vram type fetching into sw_init
drm/amdgpu: Set vblank_disable_allowed = true
drm/radeon: Set vblank_disable_allowed = true
drm/amd/powerplay: Need to change boot to performance state in resume.
drm/amd/powerplay: add new Fiji function for not setting same ps.
drm/amdgpu: check dpm state before pm system fs initialized.
...
shamir rabinovitch [Thu, 7 Apr 2016 11:57:36 +0000 (07:57 -0400)]
RDS: fix congestion map corruption for PAGE_SIZE > 4k
When PAGE_SIZE > 4k single page can contain 2 RDS fragments. If
'rds_ib_cong_recv' ignore the RDS fragment offset in to the page it
then read the data fragment as far congestion map update and lead to
corruption of the RDS connection far congestion map.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
shamir rabinovitch [Thu, 7 Apr 2016 11:57:35 +0000 (07:57 -0400)]
RDS: memory allocated must be align to 8
Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory
allocated by 'rds_page_remainder_alloc' using uint64_t pointer.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 5 Apr 2016 16:13:39 +0000 (09:13 -0700)]
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
This patch fixes an issue I found in which we were dropping frames if we
had enabled checksums on GRE headers that were encapsulated by either FOU
or GUE. Without this patch I was barely able to get 1 Gb/s of throughput.
With this patch applied I am now at least getting around 6 Gb/s.
The issue is due to the fact that with FOU or GUE applied we do not provide
a transport offset pointing to the GRE header, nor do we offload it in
software as the GRE header is completely skipped by GSO and treated like a
VXLAN or GENEVE type header. As such we need to prevent the stack from
generating it and also prevent GRE from generating it via any interface we
create.
Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Strashko, Grygorii [Wed, 6 Apr 2016 11:45:53 +0000 (14:45 +0300)]
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
Now wakeirq stops working for device if wakeup option for
this device will be reconfigured through sysfs, like:
echo disabled > /sys/devices/platform/extcon_usb1/power/wakeup
echo enabled > /sys/devices/platform/extcon_usb1/power/wakeup
Once above set of commands is executed the device's wakeup_source
opject will be recreated and dev->power.wakeup->wakeirq field will
contain NULL. As result, device_wakeup_arm_wake_irqs() will not arm
wakeirq for the affected device.
Hece, lets try to fix it in the following way:
check for dev->wakeirq field when device_wakeup_attach() is called
and if !NULL re-attach wakeirq to the device
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:16:00 +0000 (17:16 -0400)]
tools/power turbostat: work around RC6 counter wrap
Sometimes the rc6 sysfs counter spontaneously resets,
causing turbostat prints a very large number
as it tries to calcuate % = 100 * (old - new) / interval
When we see (old > new), print ***.**% instead
of a bogus huge number.
Note that this detection is not fool-proof, as the counter
could reset several times and still result in new > old.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:15:59 +0000 (17:15 -0400)]
tools/power turbostat: initial KBL support
KBL is similar to SKL
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:15:58 +0000 (17:15 -0400)]
tools/power turbostat: initial SKX support
SKX has a lot in common with HSX
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:15:57 +0000 (17:15 -0400)]
tools/power turbostat: decode BXT TSC frequency via CPUID
Hard-code BXT ART to 19200MHz, so turbostat --debug
can fully enumerate TSC:
CPUID(0x15): eax_crystal: 3 ebx_tsc: 186 ecx_crystal_hz: 0
TSC: 1190 MHz (
19200000 Hz * 186 / 3 /
1000000)
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:15:56 +0000 (17:15 -0400)]
tools/power turbostat: initial BXT support
Broxton has a lot in common with SKL
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:15:55 +0000 (17:15 -0400)]
tools/power turbostat: print IRTL MSRs
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:15:54 +0000 (17:15 -0400)]
tools/power turbostat: SGX state should print only if --debug
The CPUID.SGX bit was printed, even if --debug was used
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:00:59 +0000 (17:00 -0400)]
intel_idle: Add KBL support
KBL is similar to SKL
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Len Brown [Wed, 6 Apr 2016 21:00:58 +0000 (17:00 -0400)]
intel_idle: Add SKX support
SKX is similar to BDX
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:57 +0000 (17:00 -0400)]
intel_idle: Clean up all registered devices on exit.
This driver registers cpuidle devices when a CPU comes online, but it
leaves the registrations in place when a CPU goes offline. The module
exit code only unregisters the currently online CPUs, leaving the
devices for offline CPUs dangling.
This patch changes the driver to clean up all registrations on exit,
even those from CPUs that are offline.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:56 +0000 (17:00 -0400)]
intel_idle: Propagate hot plug errors.
If a cpuidle registration error occurs during the hot plug notifier
callback, we should really inform the hot plug machinery instead of
just ignoring the error. This patch changes the callback to properly
return on error.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:55 +0000 (17:00 -0400)]
intel_idle: Don't overreact to a cpuidle registration failure.
The helper function, intel_idle_cpu_init, registers one new device
with the cpuidle layer. If the registration should fail, that
function immediately calls intel_idle_cpuidle_devices_uninit() to
unregister every last CPU's device. However, it makes no sense to do
so, when called from the hot plug notifier callback.
This patch moves the call to intel_idle_cpuidle_devices_uninit()
outside of the helper function to the one call site that actually
needs to perform the de-registrations.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:54 +0000 (17:00 -0400)]
intel_idle: Setup the timer broadcast only on successful driver load.
This driver sets the broadcast tick quite early on during probe and does
not clean up again in cast of failure. This patch moves the setup call
after the registration, placing the on_each_cpu() calls within the global
CPU lock region.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:53 +0000 (17:00 -0400)]
intel_idle: Avoid a double free of the per-CPU data.
The helper function, intel_idle_cpuidle_devices_uninit, frees the
globally allocated per-CPU data. However, this function is invoked
from the hot plug notifier callback at a time when freeing that data
is not safe.
If the call to cpuidle_register_driver() should fail (say, due to lack
of memory), then the driver will free its per-CPU region. On the
*next* CPU_ONLINE event, the driver will happily use the region again
and even free it again if the failure repeats.
This patch fixes the issue by moving the call to free_percpu() outside
of the helper function at the two call sites that actually need to
free the per-CPU data.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:52 +0000 (17:00 -0400)]
intel_idle: Fix dangling registration on error path.
In the module_init() method, if the per-CPU allocation fails, then the
active cpuidle registration is not cleaned up. This patch fixes the
issue by attempting the allocation before registration, and then
cleaning it up again on registration failure.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:51 +0000 (17:00 -0400)]
intel_idle: Fix deallocation order on the driver exit path.
In the module_exit() method, this driver first frees its per-CPU
pointer, then unregisters a callback making use of the pointer.
Furthermore, the function, intel_idle_cpuidle_devices_uninit, is racy
against CPU hot plugging as it calls for_each_online_cpu().
This patch corrects the issues by unregistering first on the exit path
while holding the hot plug lock.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Richard Cochran [Wed, 6 Apr 2016 21:00:50 +0000 (17:00 -0400)]
intel_idle: Remove redundant initialization calls.
The function, intel_idle_cpuidle_driver_init, makes calls on each CPU
to auto_demotion_disable() and c1e_promotion_disable(). These calls
are redundant, as intel_idle_cpu_init() does the same calls just a bit
later on. They are also premature, as the driver registration may yet
fail.
This patch removes the redundant code.
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>