git.openwrt.org Git - openwrt/staging/blogic.git/log

net: sched: kill dead code in sch_choke.c

It looks like this has never been used at all.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: Delete an unnecessary check before the function call "irlmp_unregister_service"

The irlmp_unregister_service() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2015-11-03' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Another set of fixes:
* remove a warning on a check that can trigger without any
   errors having happened (Andrei)
* correctly handle deauth request while in the process of
   associating (Andrei)
* fix TDLS HT operation (Arik)
* allow changing AID/listen interval during client setup (Ayala)
* be more forgiving with WMM parameters to get HT/VHT in case of
   broken APs with bad WMM settings (Emmanuel, myself)
* a number of other fixes (some in documentation)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: include DSA ports in VLANs

DSA ports must be members of a VLAN in order to ensure frame bridging
between chained switch chips.

Thus tag them in addition to the CPU port when adding a VLAN, and skip
them when deleting a VLAN and reporting VLAN members.

Also use the UNMODIFIED egress policy, so that frames egress on these
ports as they ingress, tagged or untagged.

Fixes: 0d3b33e60206 ("net: dsa: mv88e6xxx: add VLAN Load support")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports

Frames with DSA headers passing to/from the CPU were taking place in the
MAC learning on these ports, resulting in incorrect ATU entries. Disable
learning on these ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core: fix for_each_netdev_feature

As pointed out by Nikolay and further explained by Geert, the initial
for_each_netdev_feature macro was broken, as feature would get set outside
of the block of code it was intended to run in, thus only ever working for
the first feature bit in the mask. While less pretty this way, this is
tested and confirmed functional with multiple feature bits set in
NETIF_F_UPPER_DISABLES.

[root@dell-per730-01 ~]# ethtool -K bond0 lro off
...
[  242.761394] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[  243.552178] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
[  244.353978] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[  245.147420] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71

[root@dell-per730-01 ~]# ethtool -K bond0 gro off
...
[  251.925645] bond0: Disabling feature 0x0000000000004000 on lower dev p5p2.
[  252.713693] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
[  253.499085] bond0: Disabling feature 0x0000000000004000 on lower dev p5p1.
[  254.290922] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71

Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: Invoke driver vlan hooks only if device is present

NIC drivers mark device as detached during error recovery.
It expects no manangement hooks to be invoked in this state.
Invoke driver vlan hooks only if device is present.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arcnet/com20020: add LEDS_CLASS dependency

The newly added led trigger support in the com20020-pci driver causes
build errors when CONFIG_LEDS_CLASS is disabled:

drivers/built-in.o: In function `com20020pci_probe':
(.text+0x185dc4): undefined reference to `devm_led_classdev_register'
(.text+0x185dd8): undefined reference to `devm_led_classdev_register'

This adds a Kconfig dependency to prevent the invalid configurations.
Other drivers appear to be split 50:50 between 'select' and 'depends on'
for this symbol, I picked 'depends on' as I could not find a common
policy and it generally causes fewer problems.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8890624a4e8c ("arcnet: com20020-pci: add led trigger support")
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, verifier: annotate verbose printer with __printf

The verbose() printer dumps the verifier state to user space, so let gcc
take care to check calls to verbose() for (future) errors. make with W=1
correctly suggests: function might be possible candidate for 'gnu_printf'
format attribute [-Wsuggest-attribute=format].

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dp83640-fixes'

Stefan Sørensen says:

====================
dp83640 driver fixes

This series fixes a number of minor bugs in the dp83640 driver.
====================

Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dp83640: Only wait for timestamps for packets with timestamping enabled.

In the packet timestamping function, check that the ptp version and
protocol of the packet matches what we have configured the hardware to
actually generate timestamps for, before looking/waiting for a timestamp.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: Change ptp_class to a proper bitmask

Change the definition of PTP_CLASS_L2 to not have any bits overlapping with
the other defined protocol values, allowing the PTP_CLASS_* definitions to
be for simple filtering on packet type.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dp83640: Prune rx timestamp list before reading from it

The list of rx timestamps are currently only pruned of old entries when a
new entry is inserted. If no new entries are added, old timestamps may
survive beyond their lifetime, possible causing them to be attached to
packets with the same sequence number after a rollover.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dp83640: Delay scheduled work.

Currently rx_timestamp_work reschedules itself as a regular workqueue item,
effectively causing it run constantly as long as there are packets left in
the queue. Fix by using delayed workqueue items, limiting it to run only
every two jiffies.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dp83640: Include hash in timestamp/packet matching

Only using the message type and sequence id for matching timestamps
with packets is error prone, as multiple clients may very well be
sending packets with the same messagetype and timestamp at the same
time. Fix by extending the check to include the hash of bytes 20-29
(source id in PTPv2) that is provided with the timestamps.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-fixes'

Or Gerlitz says:

====================
Mellanox mlx5e driver update, Nov 3 2015

This series contains bunch of small fixes to the mlx5e driver from Achiad.

Changes from V0:
- removed the driver patch that dealt with IRQ affinity changes during
NAPI poll, as this is a generic problem which needs generic solution.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Fix LSO vlan insertion

Consider vlan insertion impact on headers copy size also for LSO
packets.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Re-eanble client vlan TX acceleration

This reverts commit cd58c714acb9 "net/mlx5e: Disable client vlan TX acceleration".

Bring back client vlan insertion offload, the original
performance issue was found and fixed in the next patch.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Return error in case mlx5e_set_features() fails

In case mlx5e_set_features() fails, return the failure status rather
than 0.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Don't allow more than max supported channels

Consider MLX5E_MAX_NUM_CHANNELS @ethtool set/get_channels

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Use the the real irqn in eq->irqn

Instead of storing the msix array index in eq->irqn (vecidx),
store the real irq number.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Wait for RX buffers initialization in a more proper manner

Use jiffies rather than wait loop with msleep().

The wait loop didn't take into consideration time when the
process was not executing.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Avoid NULL pointer access in case of configuration failure

In case a configuration operation that involves closing and re-opening
resources (e.g RX/TX queue size change) fails at the re-opening stage
these resources will remain closed.
So when executing (following) configuration operations (e.g ifconfig
down) we cannot assume that these resources are available.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfg80211: allow AID/listen interval changes for unassociated station

Currently, cfg80211 rejects updates of AID and listen interval parameters
for existing entries. This information is known only at association stage
and as a result it's impossible to update entries that were added
unassociated.
Fix this by allowing updates of these properies for stations that the
driver (or mac80211) assigned unassociated state.

This then fixes mac80211's use of NL80211_FEATURE_FULL_AP_CLIENT_STATE.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: document sleep requirements for channel context ops

Channel context driver operations can sleep, so add might_sleep()
and document this.

Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: further improve "no supported rates" warning

Allow distinguishing the non-station case from the case of a
station without rates, by using -1 for the non-station case.
This value cannot be reached with a station since that many
legacy rates don't exist.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: treat bad WMM parameters more gracefully

As WMM is required for HT/VHT operation, treat bad WMM parameters
more gracefully by falling back to default parameters instead of
not using WMM assocation. This makes it possible to still use HT
or VHT, although potentially with reduced quality of service due
to unintended WMM parameters.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fixup AIFSN instead of disabling WMM

Disabling WMM has a huge impact these days. It implies that
HT and VHT will be disabled which means that the throughput
will be drammatically reduced.
Since the AIFSN is a transmission parameter, we can play a
bit and fix it up to make it compliant with the 802.11
specification which requires it to be at least 2.
Increasing it from 1 to 2 will slightly reduce the
likelyhood to get a transmission opportunity compared to
other clients that would accept to set AIFSN=1, but at
least it will allow HT and VHT which is a huge gain.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: make enable_qos parameter to ieee80211_set_wmm_default()

The function currently determines this value, for use in bss_info.qos,
based on the interface type itself. Make it a parameter instead and
set it with the same logic for now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211/mac80211: clarify RSSI CQM reporting requirements

The previous patch changed mac80211 to always report an event
after a CQM RSSI reconfiguration. Document that as expected
behaviour in both the cfg80211 and mac80211 API.

Currently, iwlmvm already implements that behaviour; the other
drivers implementing CQM RSSI events may have to be changed.

This behaviour lets userspace know what the current state is
without relying on querying the data which is racy.

Reviewed-by: Sharon, Sara <sara.sharon@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix crash on mesh local link ID generation with VIFs

llid_in_use needs to be limited to stations of the same VIF, otherwise it
will cause a NULL deref as the sta_info of non-mesh-VIFs don't have
sta->mesh set.

Steps to reproduce:

   modprobe mac80211_hwsim channels=2
   iw phy phy0 interface add ibss0 type ibss
   iw phy phy0 interface add mesh0 type mp
   iw phy phy1 interface add ibss1 type ibss
   iw phy phy1 interface add mesh1 type mp
   ip link set ibss0 up
   ip link set mesh0 up
   ip link set ibss1 up
   ip link set mesh1 up
   iw dev ibss0 ibss join foo 2412
   iw dev ibss1 ibss join foo 2412
   # Ensure that ibss0 and ibss1 are actually associated; I often need to
   # leave and join the cell on ibss1 a second time.
   iw dev mesh0 mesh join bar
   iw dev mesh1 mesh join bar # crash

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: TDLS: add proper HT-oper IE

When 11n peers performs a TDLS connection on a legacy BSS, the HT
operation IE must be specified according to IEEE802.11-2012 section
9.23.3.2. Otherwise HT-protection is compromised and the medium becomes
noisy for both the TDLS and the BSS links.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: don't reconfigure sched scan in case of wowlan

Scheduled scan has to be reconfigured only if wowlan wasn't
configured, since otherwise it should continue to run (with
the 'any' trigger) or be aborted.

The current code will end up asking the driver to start a new
scheduled scan without stopping the previous one, and leaking
some memory (from the previous request.)

Fix this by doing the abort/restart under the proper conditions.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: call drv_stop only if driver is started

If drv_start() fails during hw_restart, all the running
interfaces are being closed/stopped, which results in
drv_stop() being called, although the driver was never
started successfully.

This might cause drivers to perform operations on uninitialized
memory (as they assume it was initialized on drv_start)

Consider the local->started flag, and call the driver's stop()
op only if drv_start() succeeded before.

Move drv_start() and drv_stop() to driver-ops.c, as they are no
longer simple wrappers.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Remove WARN_ON_ONCE in ieee80211_recalc_smps

The recalc_smps work can run after the station disassociates.
At this stage we already released the channel, but the work
will be cancelled only when the interface stops.
In this scenario we can hit the warning in ieee80211_recalc_smps, so
just remove it.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: use freezable workqueue for restart work

Requesting hw restart during suspend might result
in the restart work being executed after mac80211
and the hw are suspended.

Solve the race by simply scheduling the restart
work on a freezable workqueue.

Note that there can be some cases of reconfiguration
on resume (besides the hardware restart):

* wowlan is not configured -
    All the interfaces removed were removed on suspend,
    and drv_stop() was called. At this point the driver
    shouldn't expect for hw_restart anyway, so we can
    simply cancel it (on resume).

* wowlan is configured, drv_resume() == 1
    There is no definitive expected behavior in this case,
    as each driver might have different expectations (e.g.
    setting some flags on suspend/restart vs. not handling
    spurious recovery).
    For now, simply let the hw_restart work run again after
    resume, and hope the driver will handle it well (or at
    least initiate another hw restart).

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Fix local deauth while associating

Local request to deauthenticate wasn't handled while associating, thus
the association could continue even when the user space required to
disconnect.

Cc: stable@vger.kernel.org
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: allow null chandef in tracing

In TDLS channel-switch operations the chandef can sometimes be NULL.
Avoid an oops in the trace code for these cases and just print a
chandef full of zeros.

Cc: stable@vger.kernel.org
Fixes: a7a6bdd0670fe ("mac80211: introduce TDLS channel switch ops")
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

nl80211: Fix potential memory leak from parse_acl_data

If parse_acl_data succeeds but the subsequent parsing of smps
attributes fails, there will be a memory leak due to early returns.
Fix that by moving the ACL parsing later.

Cc: stable@vger.kernel.org
Fixes: 18998c381b19b ("cfg80211: allow requesting SMPS mode on ap start")
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: fix divide by zero when NOA update

In case of one shot NOA the interval can be 0, catch that
instead of potentially (depending on the driver) crashing
like this:

divide error: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211]
[<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211]
[<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k]
[<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw]
[<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k]
[<ffffffff8107ad65>] tasklet_action+0xe5/0xf0
[...]

Cc: stable@vger.kernel.org [3.16+, due to d463af4a1c34 using it]
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

net/core: generic support for disabling netdev features down stack

There are some netdev features, which when disabled on an upper device,
such as a bonding master or a bridge, must be disabled and cannot be
re-enabled on underlying devices.

This is a rework of an earlier more heavy-handed appraoch, which simply
disables and prevents re-enabling of netdev features listed in a new
define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper
device that disables a flag in that feature mask, the disabling will
propagate down the stack, and any lower device that has any upper device
with one of those flags disabled should not be able to enable said flag.

Initially, only LRO is included for proof of concept, and because this
code effectively does the same thing as dev_disable_lro(), though it will
also activate from the ethtool path, which was one of the goals here.

[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -K bond0 lro off
[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: off
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: off

dmesg dump:

[ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83
[ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71

This has been successfully tested with bnx2x, qlcnic and netxen network
cards as slaves in a bond interface. Turning LRO on or off on the master
also turns it on or off on each of the slaves, new slaves are added with
LRO in the same state as the master, and LRO can't be toggled on the
slaves.

Also, this should largely remove the need for dev_disable_lro(), and most,
if not all, of its call sites can be replaced by simply making sure
NETIF_F_LRO isn't included in the relevant device's feature flags.

Note that this patch is driven by bug reports from users saying it was
confusing that bonds and slaves had different settings for the same
features, and while it won't be 100% in sync if a lower device doesn't
support a feature like LRO, I think this is a good step in the right
direction.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: fix typo in RX descriptor bit name

The correct name of the RX descriptor 0 bit 30 is RDLE (receive descriptor
list end), not RDEL.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bonding-actor-updates'

Mahesh Bandewar says:

====================
re-org actor admin/oper key updates

I was observing machines entering into weird LACP state when the
partner is in passive mode. This issue is not because of the partners
in passive state but probably because of some operational key update
which is pushing the state-machine is that weird state. This was
happening randomly on about 1% of the machine (when the sample size
is a large set of machines with a variety of NICs/ports bonded).

In this patch-series I'm attempting to unify the logic of actor-key
/ operational-key changes to one place to avoid possible errors in
update. Also this eliminates the need for the event-handler to decide
if the key needs update.

After this patch-set none of the machines (from same sample set) were
exhibiting LACP-weirdness that was observed earlier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: simplify / unify event handling code for 3ad mode.

Old logic of updating state-machine is not required since
ad_update_actor_keys() does it implicitly. The only loss is
the notification differentiation between speed vs. duplex
change. Now only one unified notification is printed.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: unify all places where actor-oper key needs to be updated.

actor_admin, and actor_oper key is changed at multiple locations in
the code. This patch brings all those updates into one location in
an attempt to avoid possible inconsistent updates causing LACP state
machine to go in weird state.

The unified place is ad_update_actor_key() with simple state-machine
logic -
(a) If port is "duplex" then only it can participate in LACP
(b) Speed change reinitializes the LACP state-machine.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Simplify __get_duplex function.

Eliminate 'else' clause by simply initializing variable

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-persistent'

Daniel Borkmann says:

====================
BPF updates

This set adds support for persistent maps/progs. Please see
individual patches for further details. A man-page update
to bpf(2) will be sent later on, also a iproute2 patch for
support in tc.

v1 -> v2:
- Reworked most of patch 4 and 5
- Rebased to latest net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add sample usages for persistent maps/progs

This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN
and BPF_OBJ_GET commands can be used.

Example with maps:

  # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
  bpf: map fd:3 (Success)
  bpf: pin ret:(0,Success)
  bpf: fd:3 u->(1:42) ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):42 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
  bpf: get fd:3 (Success)
  bpf: fd:3 u->(1:24) ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):24 ret:(0,Success)

  # ./fds_example -F /sys/fs/bpf/m2 -P -m
  bpf: map fd:3 (Success)
  bpf: pin ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):0 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -G -m
  bpf: get fd:3 (Success)

Example with progs:

  # ./fds_example -F /sys/fs/bpf/p -P -p
  bpf: prog fd:3 (Success)
  bpf: pin ret:(0,Success)
  bpf sock:4 <- fd:3 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p -G -p
  bpf: get fd:3 (Success)
  bpf: sock:4 <- fd:3 attached ret:(0,Success)

  # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
  bpf: prog fd:5 (Success)
  bpf: pin ret:(0,Success)
  bpf: sock:3 <- fd:5 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p2 -G -p
  bpf: get fd:3 (Success)
  bpf: sock:4 <- fd:3 attached ret:(0,Success)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add support for persistent maps/progs

This work adds support for "persistent" eBPF maps/programs. The term
"persistent" is to be understood that maps/programs have a facility
that lets them survive process termination. This is desired by various
eBPF subsystem users.

Just to name one example: tc classifier/action. Whenever tc parses
the ELF object, extracts and loads maps/progs into the kernel, these
file descriptors will be out of reach after the tc instance exits.
So a subsequent tc invocation won't be able to access/relocate on this
resource, and therefore maps cannot easily be shared, f.e. between the
ingress and egress networking data path.

The current workaround is that Unix domain sockets (UDS) need to be
instrumented in order to pass the created eBPF map/program file
descriptors to a third party management daemon through UDS' socket
passing facility. This makes it a bit complicated to deploy shared
eBPF maps or programs (programs f.e. for tail calls) among various
processes.

We've been brainstorming on how we could tackle this issue and various
approches have been tried out so far, which can be read up further in
the below reference.

The architecture we eventually ended up with is a minimal file system
that can hold map/prog objects. The file system is a per mount namespace
singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
mounts within a given namespace will point to the same instance. The
file system allows for creating a user-defined directory structure.
The objects for maps/progs are created/fetched through bpf(2) with
two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
along with a pathname is being passed to bpf(2) that in turn creates
(we call it eBPF object pinning) the file system nodes. Only the pathname
is being passed to bpf(2) for getting a new BPF file descriptor to an
existing node. The user can use that to access maps and progs later on,
through bpf(2). Removal of file system nodes is being managed through
normal VFS functions such as unlink(2), etc. The file system code is
kept to a very minimum and can be further extended later on.

The next step I'm working on is to add dump eBPF map/prog commands
to bpf(2), so that a specification from a given file descriptor can
be retrieved. This can be used by things like CRIU but also applications
can inspect the meta data after calling BPF_OBJ_GET.

Big thanks also to Alexei and Hannes who significantly contributed
in the design discussion that eventually let us end up with this
architecture here.

Reference: https://lkml.org/lkml/2015/10/15/925
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: consolidate bpf_prog_put{, _rcu} dismantle paths

We currently have duplicated cleanup code in bpf_prog_put() and
bpf_prog_put_rcu() cleanup paths. Back then we decided that it was
not worth it to make it a common helper called by both, but with
the recent addition of resource charging, we could have avoided
the fix in commit ac00737f4e81 ("bpf: Need to call bpf_prog_uncharge_memlock
from bpf_prog_put") if we would have had only a single, common path.
We can simplify it further by assigning aux->prog only once during
allocation time.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: align and clean bpf_{map,prog}_get helpers

Add a bpf_map_get() function that we're going to use later on and
align/clean the remaining helpers a bit so that we have them a bit
more consistent:

  - __bpf_map_get() and __bpf_prog_get() that both work on the fd
    struct, check whether the descriptor is eBPF and return the
    pointer to the map/prog stored in the private data.

    Also, we can return f.file->private_data directly, the function
    signature is enough of a documentation already.

  - bpf_map_get() and bpf_prog_get() that both work on u32 user fd,
    call their respective __bpf_map_get()/__bpf_prog_get() variants,
    and take a reference.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: abstract anon_inode_getfd invocations

Since we're going to use anon_inode_getfd() invocations in more than just
the current places, make a helper function for both, so that we only need
to pass a map/prog pointer to the helper itself in order to get a fd. The
new helpers are called bpf_map_new_fd() and bpf_prog_new_fd().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix percpu memory leaks

This patch fixes following problems :

1) percpu_counter_init() can return an error, therefore
  init_frag_mem_limit() must propagate this error so that
  inet_frags_init_net() can do the same up to its callers.

2) If ip[46]_frags_ns_ctl_register() fail, we must unwind
   properly and free the percpu_counter.

Without this fix, we leave freed object in percpu_counters
global list (if CONFIG_HOTPLUG_CPU) leading to crashes.

This bug was detected by KASAN and syzkaller tool
(http://github.com/google/syzkaller)

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: use pdev rather than ndev for error messages

This corrects what appear to be typos, making the code consistent with
itself, and allowing meaningful prefixes to be displayed with the errors in
question.

Before:
(null): failed to initialize MDIO
(null): Cannot allocate desc base address table (size 176 bytes)

After:
ravb e6800000.ethernet: failed to initialize MDIO
ravb e6800000.ethernet: Cannot allocate desc base address table (size 176 bytes)

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make skb_set_owner_w() more robust

skb_set_owner_w() is called from various places that assume
skb->sk always point to a full blown socket (as it changes
sk->sk_wmem_alloc)

We'd like to attach skb to request sockets, and in the future
to timewait sockets as well. For these kind of pseudo sockets,
we need to take a traditional refcount and use sock_edemux()
as the destructor.

It is now time to un-inline skb_set_owner_w(), being too big.

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: vlan: Use rcu_dereference instead of rtnl_dereference

br_should_learn() is protected by RCU and not by RTNL, so use correct
flavor of nbp_vlan_group().

Fixes: 907b1e6e83ed ("bridge: vlan: use proper rcu for the vlgrp
member")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: lookup switch name

All the mv88e6xxx drivers use the exact same code in their probe
function to lookup the switch name given its ID. Thus introduce a
mv88e6xxx_switch_id structure and a mv88e6xxx_lookup_name function in
the common mv88e6xxx code.

In the meantime make __mv88e6xxx_reg_{read,write} static since we do not
need to expose these low-level r/w routines anymore.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: assert SMI lock

It's easy to forget to lock the smi_mutex before calling the low-level
_mv88e6xxx_reg_{read,write}, so add a assert_smi_lock function in them.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: convert hashtab lock to raw lock

When running bpf samples on rt kernel, it reports the below warning:

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
Preemption disabled at:[<ffff80000017db58>] kprobe_perf_func+0x30/0x228

CPU: 3 PID: 477 Comm: ping Not tainted 4.1.10-rt8 #4
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
[<ffff80000008a5b0>] dump_backtrace+0x0/0x128
[<ffff80000008a6f8>] show_stack+0x20/0x30
[<ffff8000007da90c>] dump_stack+0x7c/0xa0
[<ffff8000000e4830>] ___might_sleep+0x188/0x1a0
[<ffff8000007e2200>] rt_spin_lock+0x28/0x40
[<ffff80000018bf9c>] htab_map_update_elem+0x124/0x320
[<ffff80000018c718>] bpf_map_update_elem+0x40/0x58
[<ffff800000187658>] __bpf_prog_run+0xd48/0x1640
[<ffff80000017ca6c>] trace_call_bpf+0x8c/0x100
[<ffff80000017db58>] kprobe_perf_func+0x30/0x228
[<ffff80000017dd84>] kprobe_dispatcher+0x34/0x58
[<ffff8000007e399c>] kprobe_handler+0x114/0x250
[<ffff8000007e3bf4>] kprobe_breakpoint_handler+0x1c/0x30
[<ffff800000085b80>] brk_handler+0x88/0x98
[<ffff8000000822f0>] do_debug_exception+0x50/0xb8
Exception stack(0xffff808349687460 to 0xffff808349687580)
7460: 4ca2b600 ffff8083 4a3a7000 ffff8083 49687620 ffff8083 0069c5f8 ffff8000
7480: 00000001 00000000 007e0628 ffff8000 496874b0 ffff8083 007e1de8 ffff8000
74a0: 496874d0 ffff8083 0008e04c ffff8000 00000001 00000000 4ca2b600 ffff8083
74c0: 00ba2e80 ffff8000 49687528 ffff8083 49687510 ffff8083 000e5c70 ffff8000
74e0: 00c22348 ffff8000 00000000 ffff8083 49687510 ffff8083 000e5c74 ffff8000
7500: 4ca2b600 ffff8083 49401800 ffff8083 00000001 00000000 00000000 00000000
7520: 496874d0 ffff8083 00000000 00000000 00000000 00000000 00000000 00000000
7540: 2f2e2d2c 33323130 00000000 00000000 4c944500 ffff8083 00000000 00000000
7560: 00000000 00000000 008751e0 ffff8000 00000001 00000000 124e2d1d 00107b77

Convert hashtab lock to raw lock to avoid such warning.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge_vlan_fixes'

Nikolay Aleksandrov says:

====================
bridge: vlan: failure path and comment fixes

This is a set from Ido which takes care of one failure path error in
nbp_vlan_init (patch 1) and a few comment errors (patch 2).
I must admit I didn't expect the port init continues after a vlan init
failure but should've checked to make sure. Thanks to Ido for catching
these!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: vlan: Use correct flag name in comment

The flag used to indicate if a VLAN should be used for filtering - as
opposed to context only - on the bridge itself (e.g. br0) is called
'brentry' and not 'brvlan'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: vlan: Prevent possible use-after-free

When adding a port to a bridge we initialize VLAN filtering on it. We do
not bail out in case an error occurred in nbp_vlan_init, as it can be
used as a non VLAN filtering bridge.

However, if VLAN filtering is required and an error occurred in
nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering
functions (e.g. br_vlan_find, br_get_pvid) will know the struct is
invalid and will not try to access it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: fix ireq->pktopts race

IPv6 request sockets store a pointer to skb containing the SYN packet
to be able to transfer it to full blown socket when 3WHS is done
(ireq->pktopts -> np->pktoptions)

As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for
passive sessions"), we must transfer the skb only if we won the
hashdance race, if multiple cpus receive the 'ack' packet completing
3WHS at the same time.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: convert bind hash table to re-sizable hashtable

To further improve the RDS connection scalabilty on massive systems
where number of sockets grows into tens of thousands of sockets, there
is a need of larger bind hashtable. Pre-allocated 8K or 16K table is
not very flexible in terms of memory utilisation. The rhashtable
infrastructure gives us the flexibility to grow the hashtbable based
on use and also comes up with inbuilt efficient bucket(chain) handling.

Reviewed-by: David Miller <davem@davemloft.net>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rds: changing the return type from int to void

as result of function rds_iw_flush_mr_pool is nowhere checked,
changing its return type from int to void.
also removing the unused variable rc as there is nothing to return

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'encx24j600-fixes'

Javier Martinez Canillas says:

====================
net: encx24j600: Fix SPI driver module autoload

Recently I've been trying to fix module autoloading for all SPI drivers and
found that the encx24j600 driver does not fill module alias information due
missing a MODULE_DEVICE_TABLE() so module autload won't work and the driver
Kconfig symbol is tristate which means the driver can be built as a module.

But also the SPI id table is not correctly defined so this series fixes both
issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: encx24j600: Export missing SPI module alias information

The driver Kconfig symbol is tristate which means that it can be built as
a module but the module alias information is not added to the module info
so module autoload won't work since user-space won't have the information.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: encx24j600: Fix SPI id table definition

A driver's SPI id table is expected to be an array of struct spi_device_id
that ends with a zero-initialized sentinel entry. But this driver defines
the table as a single struct spi_device_id and sets .id_table to a pointer
to this struct.

But spi_match_id() has a loop that iterates while the struct spi_device_id
.name[0] is not NULL, so not having a sentinel can cause a NULL pointer
deference error.

This patch defines the SPI id table correctly as all other SPI drivers do.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: assign affinity hint to interrupts

The affinity hint is used by the user space daemon, irqbalancer, to
indicate a preferred CPU mask for irqs. This patch sets the irq affinity
hint to local numa core first, when exausted we try non-local numa cores.

Also set tx xps cpus mask bassed on affinity hint.

v2: remove the global affinity policy.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use l4 hash for locally generated multipath flows

This patch changes how the multipath hash is computed for locally
generated flows: now the hash comprises l4 information.

This allows better utilization of the available paths when the existing
flows have the same source IP and the same destination IP: with l3 hash,
even when multiple connections are in place simultaneously, a single path
will be used, while with l4 hash we can use all the available paths.

v2 changes:
- use get_hash_from_flowi4() instead of implementing just another l4 hash
function

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Use 64-bit timekeeping

This patch changes the use of struct timespec in
dccp_probe to use struct timespec64 instead. timespec uses a 32-bit
seconds field which will overflow in the year 2038 and beyond. timespec64
uses a 64-bit seconds field. Note that the correctness of the code isn't
changed, since the original code only uses the timestamps to compute a
small elapsed interval. This patch is part of a larger attempt to remove
instances of 32-bit timekeeping structures (timespec, timeval, time_t)
from the kernel so it is easier to identify where the real 2038 issues
are.

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hisilicon: Remove .owner assignment from platform_driver

platform_driver doesn't need to set .owner, because
platform_driver_register() will set it.

Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: use switchdev obj for VLAN add/del ops

Simplify DSA by pushing the switchdev objects for VLAN add and delete
operations down to its drivers. Currently only mv88e6xxx is affected.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: define VSOCK_SS_LISTEN once only

The SS_LISTEN socket state is defined by both af_vsock.c and
vmci_transport.c.  This is risky since the value could be changed in one
file and the other would be out of sync.

Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part
of enum socket_state (SS_CONNECTED, ...).  This way it is clear that the
constant is vsock-specific.

The big text reflow in af_vsock.c was necessary to keep to the maximum
line length.  Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'csum_partial_frags'

Hannes Frederic Sowa says:

====================
net: clean up interactions of CHECKSUM_PARTIAL and fragmentation

This series fixes wrong checksums on the wire for IPv4 and IPv6. Large
send buffers and especially NFS lead to wrong checksums in both IPv4
and IPv6.

CHECKSUM_PARTIAL skbs should not receive the respective fragmentations
functions, so we add WARN_ON_ONCE to those functions to fix up those as
soon as they get reported.

Changelog:
v2: added v4 checks
v3: removed WARN_ON_ONCES (advice by Tom Herbert)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in
the existence of IPv6 extension headers, but the condition was wrong. Fix
it up, too. Also the condition to check whether the packet fits into
one fragment was wrong and has been corrected.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull Ceph fix from Sage Weil:
"This sets the stable pages flag on the RBD block device when we have
  CRCs enabled.  (This is necessary since the default assumption for
  block devices changed in 3.9)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: require stable pages if message data CRCs are enabled

Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs bug fixes from Miklos Szeredi:
"This contains fixes for bugs that appeared in earlier kernels (all are
  marked for -stable)"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: free lower_mnt array in ovl_put_super
  ovl: free stack of paths in ovl_fill_super
  ovl: fix open in stacked overlay
  ovl: fix dentry reference leak
  ovl: use O_LARGEFILE in ovl_copy_up()

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix two regressions in ipv6 route lookups, particularly wrt output
    interface specifications in the lookup key.  From David Ahern.

2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from
    Herbert Xu.

3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon
    Arlott.

4) Some smsc phys misbehave with energy detect mode enabled, so add a
    DT property and disable it on such switches.  From Heiko Schocher.

5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer.

6) Fix regression added by removal of openvswitch vport stats, from
    James Morse.

7) Vendor Kconfig options should be bool, not tristate, from Andreas
    Schwab.

8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we
    barf during TCP REPAIR operations.

9) Fix various bugs in openvswitch conntrack support, from Joe
    Stringer.

10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann.

11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen
    Hansen.

12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten
    Keil.

13) Add some device IDs to qmi_wwan, from Bjorn Mork.

14) Fix ovs egress tunnel information when using lwtunnel devices, from
    Pravin B Shelar.

15) Add missing NETIF_F_FRAGLIST to macvtab feature list, from Jason
    Wang.

16) Fix incorrect handling of throw routes when the result of the throw
    cannot find a match, from Xin Long.

17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic
    Sowa.

18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan
    Sullivan.

19) Add missing memory barries in descriptor accesses or xgbe driver,
    from Thomas Lendacky.

20) Fix release conditon test in pppoe_release(), from Guillaume Nault.

21) Fix gianfar bugs wrt filter configuration, from Claudiu Manoil.

22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei
    Shtylyov.

23) Fixing missing of_node_put() calls in various places around the
    networking, from Julia Lawall.

24) Fix incorrect leaf now walking in ipv4 routing tree, from Alexander
    Duyck.

25) RDS doesn't check pskb_pull()/pskb_trim() return values, from
    Sowmini Varadhan.

26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
  Revert "Merge branch 'ipv6-overflow-arith'"
  net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes
  net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present
  vhost: fix performance on LE hosts
  bpf: sample: define aarch64 specific registers
  amd-xgbe: Fix race between access of desc and desc index
  RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
  forcedeth: fix unilateral interrupt disabling in netpoll path
  openvswitch: Fix skb leak using IPv6 defrag
  ipv6: Export nf_ct_frag6_consume_orig()
  openvswitch: Fix double-free on ip_defrag() errors
  fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key
  net: mv643xx_eth: add missing of_node_put
  ath6kl: add missing of_node_put
  net: phy: mdio: add missing of_node_put
  netdev/phy: add missing of_node_put
  net: netcp: add missing of_node_put
  net: thunderx: add missing of_node_put
  ipv6: gre: support SIT encapsulation
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input layer fixes from Dmitry Torokhov:

- a change to the ALPS driver where we had limit the quirk for
   trackstick handling from being active on all Dells to just a few
   models

- a fix for a build dependency issue in the sur40 driver

- a small clock handling fixup in the LPC32xx touchscreen driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits
  Input: sur40 - add dependency on VIDEO_V4L2
  Input: lpc32xx_ts - fix warnings caused by enabling unprepared clock

Merge tag 'pci-v4.3-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
"Sorry for this last-minute update; it's been in -next for quite a
  while, but I forgot about it until I started getting ready for the
  merge window.

  It's small and fixes a way a user could cause a panic via sysfs, so I
  think it's worth getting it in v4.3.

  NUMA:
    - Prevent out of bounds access in sysfs numa_node override (Sasha Levin)"

* tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Prevent out of bounds access in numa_node override

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Apologies for this being so late, but we've uncovered a few nasty
  issues on arm64 which didn't settle down until yesterday and the fixes
  all look suitable for 4.3.  Of the four patches, three of them are
  Cc'd to stable, with the remaining patch fixing an issue that only
  took effect during the merge window.

  Summary:

   - Fix corruption in SWP emulation when STXR fails due to contention
   - Fix MMU re-initialisation when resuming from a low-power state
   - Fix stack unwinding code to match what ftrace expects
   - Fix relocation code in the EFI stub when DRAM base is not 2MB aligned"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/efi: do not assume DRAM base is aligned to 2 MB
  Revert "ARM64: unwind: Fix PC calculation"
  arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap
  arm64: compat: fix stxr failure case in SWP emulation

Merge tag 'please-pull-syscalls' of git://git./linux/kernel/git/aegl/linux

Pull ia64 kcmp syscall from Tony Luck:
"Missed adding the kcmp() syscall a long time ago.  Now it seems that
  it is essential to build systemd"

* tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Wire up kcmp syscall

rbd: require stable pages if message data CRCs are enabled

rbd requires stable pages, as it performs a crc of the page data before
they are send to the OSDs.

But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0
"mm: only enforce stable page writes if the backing device requires
it") it is not assumed anymore that block devices require stable pages.

This patch sets the necessary flag to get stable pages back for rbd.

In a ceph installation that provides multiple ext4 formatted rbd
devices "bad crc" messages appeared regularly (ca 1 message every 1-2
minutes on every OSD that provided the data for the rbd) in the
OSD-logs before this patch. After this patch this messages are pretty
much gone (only ca 1-2 / month / OSD).

Cc: stable@vger.kernel.org # 3.9+, needs backporting
Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de>
[idryomov@gmail.com: require stable pages only in crc case, changelog]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-10-30

1) The flow cache is limited by the flow cache limit which
   depends on the number of cpus and the xfrm garbage collector
   threshold which is independent of the number of cpus. This
   leads to the fact that on systems with more than 16 cpus
   we hit the xfrm garbage collector limit and refuse new
   allocations, so new flows are dropped. On systems with 16
   or less cpus, we hit the flowcache limit. In this case, we
   shrink the flow cache instead of refusing new flows.

   We increase the xfrm garbage collector threshold to INT_MAX
   to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
   From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
   pskb_may_pull with a negative value converted to unsigened
   int from pskb_may_pull. This can lead to incorrect policy
   lookups. We fix this by a check of the data pointer position
   before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
   in _decode_session4 as this may change the pointers into
   the packet.

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2015-10-29' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
iwlwifi

* bug fix for TDLS
* fixes and cleanups in scan
* support of several scan plans
* improvements in FTM
* fixes in FW API
* improvements in the failure paths when the bus is dead
* other various small things here and there

ath10k

* add QCA9377 support
* fw_stats support for 10.4 firmware

ath6kl

* report antenna configuration to user space
* implement ethtool stats

ssb

* add Kconfig SSB_HOST_SOC for compiling SoC related code
* move functions specific to SoC hosted bus to separated file
* pick PCMCIA host code support from b43 driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: fix: pass correct obj size when deferring obj add

Fixes: 4d429c5dd ("switchdev: introduce possibility to defer obj_add/del")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: fix: erasing too much of vlan obj when handling multiple vlan specs

When adding vlans with multiple IFLA_BRIDGE_VLAN_INFO attrs set in AFSPEC,
we would wipe the vlan obj struct after the first IFLA_BRIDGE_VLAN_INFO.
Fix this by only clearing what's necessary on each IFLA_BRIDGE_VLAN_INFO
iteration.

Fixes: 9e8f4a54 ("switchdev: push object ID back to object structure")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'nfc-next-4.4-2' of git://git./linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.4 pull request

This is the NFC pull request for 4.4.

It's a bit bigger than usual, the 3 main culprits being:

- A new driver for Intel's Fields Peak NCI chipset. In order to
  support this chipset we had to export a few NCI routines and
  extend the driver NCI ops to not only support proprietary
  commands but also core ones.

- Support for vendor commands for both STM drivers, st-nci
  and st21nfca. Those vendor commands allow to run factory tests
  through the NFC netlink interface.

- New i2c and SPI support for the Marvell driver, together with
  firmware download support for this driver's core.

Besides that we also have:

- A few file renames in the STM drivers, to keep the naming
  consistent between drivers.

- Some improvements and fixes on the NCI HCI layer, mostly to
  properly reach a secure element over a legacy HCI link.

- A few fixes for the s3fwrn5 and trf7970a drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-10-28

Here are a some more Bluetooth patches for 4.4 which collected up during
the past week. The most important ones are from Kuba Pawlak for fixing
locking issues with SCO sockets. There's also a fix from Alexander Aring
for 6lowpan, a memleak fix from Julia Lawall for the btmrvl driver and
some cleanup patches from Marcel.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU

This change makes it so that we reinitialize the interface if the MTU is
increased back above IPV6_MIN_MTU and the interface is up.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-flooding-and-cosmetics'

Jiri Pirko says:

====================
mlxsw: driver update

This driver update mainly brings support for user to be able to setup
flooding on specified port, via bridge flag. Also, there is a fix in ageing
time conversion. The rest is just cosmetics.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Make mlxsw_sp_port_switchdev_ops static

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Put braces on all arms of branch statement

Fix a place where checkpatch complains that braces should be used
on all arms of this statement.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Put constant on the right side of comparisons

Fixes those places where checkpatch complains that comparisons
should place the constant on the right side of the test.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Fix ageing time value

The value passed through switchdev attr set is not in jiffies, but in
clock_t, so fix the convert.

Reported-by: Sagi Rotem <sagir@mellanox.com>
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>