git.openwrt.org Git - openwrt/staging/blogic.git/log

NFC: st-nci: i2c: Drop two useless checks in ACPI probe path

When st_nci_i2c_acpi_request_resources() gets called we already
know that the entries in ->acpi_match_table have matched ACPI ID
of the device.
In addition I2C client pointer cannot be NULL in any case
(otherwise I2C core would not call ->probe() for the driver in
the first place).

Drop the two useless checks from the driver.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

NFC: st21nfca: Drop two useless checks in ACPI probe path

When st21nfca_hci_i2c_acpi_request_resources() gets called we
already know that the entries in ->acpi_match_table have matched
ACPI ID of the device.
In addition I2C client pointer cannot be NULL in any case
(otherwise I2C core would not call ->probe() for the driver in
the first place).

Drop the two useless checks from the driver.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st21nfca: A APDU_READER_GATE pipe is unexpected on a UICC

An APDU_READER_GATE pipe is not expected on a UICC. Be more
explicit so that an other secure element form factor (SD card)
does not prompt this message.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st-nci: A APDU_READER_GATE pipe is unexpected on a UICC

An APDU_READER_GATE pipe is not expected on a UICC. Be more
explicit so that an other secure element form factor (SD card)
does not prompt this message.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st-nci: Simplify white list building

Simplify white list Building

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st21nfca: Simplify white list building

Simplify white list Building

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st-nci: set is_ese_present and is_uicc_present properly

When they're present, set is_ese_present and set is_uicc_present
to the value describe in their package description.

So far is_ese_present and is_uicc_present was set to true if their
property was present.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st21nfca: set is_ese_present and is_uicc_present properly

When they're present, set is_ese_present and set is_uicc_present
to the value describe in their package description.

So far is_ese_present and is_uicc_present was set to true if their
property was present.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st21nfca: i2c: Change ST21NFCA_GPIO_NAME_RESET to match DT

Since
commit 10cf4899f8af ("gpiolib: tighten up ACPI legacy gpio lookups")

If _DSD properties are available in an ACPI node, we are not
allowed to fallback to _CRS data to retrieve gpio properties.
This was causing us to fail if uicc-present and/or ese-present
are defined.

To be consistent with devicetree change ST_NCI_GPIO_NAME_RESET
content to reset so that acpi_find_gpio in drivers/gpio/gpiolib.c
will look for reset-gpios. In the mean time the ACPI table needs
to be fixed as follow:

Device (NFC1)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "SMO2100")  // _HID: Hardware ID
    Name (_CID, "SMO2100")  // _CID: Compatible ID
    Name (_DDN, "SMO NFC")  // _DDN: DOS Device Name
    Name (_UID, One)  // _UID: Unique ID
    Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
    {
        Name (SBUF, ResourceTemplate ()
        {
             I2cSerialBus (0x0008, ControllerInitiated, 400000,
                           AddressingMode7Bit, "\\_SB.I2C7",
                           0x00, ResourceConsumer, ,)
             GpioInt (Edge, ActiveHigh, ExclusiveAndWake, PullNone, 0x0000,
                      "\\_SB.GPO2", 0x00, ResourceConsumer, ,)
             {   // Pin list
                 0x0001
             }
             GpioIo (Exclusive, PullDefault, 0x0000, 0x0000, IoRestrictionOutputOnly,
                     "\\_SB.GPO2", 0x00, ResourceConsumer, ,)
             {   // Pin list
                 0x0002,
             }
        })
        Name (_DSD, Package (0x02)
        {
             ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device Properties for _DSD */,
             Package (0x03)
             {
                 Package (0x02) { "uicc-present", 1 },
                 Package (0x02) { "ese-present", 1 },
                 Package (0x02) { "enable-gpios", Package(0x04) { ^NFC1, 1, 0, 0} },
             }
        })
        Return (SBUF) /* \_SB_.I2C7.NFC1._CRS.SBUF */
    }
    Method (_STA, 0, NotSerialized)  // _STA: Status
    {
        Return (0x0F)
    }
}

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st-nci: spi: Change ST_NCI_GPIO_NAME_RESET to match DT

Since
commit 10cf4899f8af ("gpiolib: tighten up ACPI legacy gpio lookups")

If _DSD properties are available in an ACPI node, we are not
allowed to fallback to _CRS data to retrieve gpio properties.
This was causing us to fail if uicc-present and/or ese-present
are defined.

To be consistent with devicetree change ST_NCI_GPIO_NAME_RESET
content to reset so that acpi_find_gpio in drivers/gpio/gpiolib.c
will look for reset-gpios. In the mean time the ACPI table needs
to be fixed as follow (Tested on Minnowboard Max):

Device (NFC1)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "SMO2101")  // _HID: Hardware ID
    Name (_CID, "SMO2101")  // _CID: Compatible ID
    Name (_DDN, "SMO NFC")  // _DDN: DOS Device Name
    Name (_UID, One)  // _UID: Unique ID
    Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
    {
        Name (SBUF, ResourceTemplate ()
        {
             SpiSerialBus (0, PolarityLow, FourWireMode, 8,
                           ControllerInitiated, 4000000, ClockPolarityLow,
                           ClockPhaseFirst, "\\_SB.SPI1",
                           0x00, ResourceConsumer, ,)
             GpioInt (Edge, ActiveHigh, ExclusiveAndWake, PullNone, 0x0000,
                      "\\_SB.GPO2", 0x00, ResourceConsumer, ,)
             {   // Pin list
                 0x0001
             }
             GpioIo (Exclusive, PullDefault, 0x0000, 0x0000, IoRestrictionOutputOnly,
                     "\\_SB.GPO2", 0x00, ResourceConsumer, ,)
             {   // Pin list
                 0x0002,
             }
        })
        Name (_DSD, Package (0x02)
        {
             ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device Properties for _DSD */,
             Package (0x03)
             {
                 Package (0x02) { "uicc-present", 1 },
                 Package (0x02) { "ese-present", 1 },
                 Package (0x02) { "reset-gpios", Package(0x04) { ^NFC1, 1, 0, 0} },
             }
        })
        Return (SBUF) /* \_SB_.SPI1.NFC1._CRS.SBUF */
    }
    Method (_STA, 0, NotSerialized)  // _STA: Status
    {
        Return (0x0F)
    }
}

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st-nci: i2c: Change ST_NCI_GPIO_NAME_RESET to match DT

Since
commit 10cf4899f8af ("gpiolib: tighten up ACPI legacy gpio lookups")

If _DSD properties are available in an ACPI node, we are not
allowed to fallback to _CRS data to retrieve gpio properties.
This was causing us to fail if uicc-present and/or ese-present
are defined.

To be consistent with devicetree change ST_NCI_GPIO_NAME_RESET
content to reset so that acpi_find_gpio in drivers/gpio/gpiolib.c
will look for reset-gpios. In the mean time the ACPI table needs
to be fixed as follow:

Device (NFC1)
{
    Name (_ADR, Zero)  // _ADR: Address
    Name (_HID, "SMO2101")  // _HID: Hardware ID
    Name (_CID, "SMO2101")  // _CID: Compatible ID
    Name (_DDN, "SMO NFC")  // _DDN: DOS Device Name
    Name (_UID, One)  // _UID: Unique ID
    Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
    {
        Name (SBUF, ResourceTemplate ()
        {
             I2cSerialBus (0x0008, ControllerInitiated, 400000,
                           AddressingMode7Bit, "\\_SB.I2C7",
                           0x00, ResourceConsumer, ,)
             GpioInt (Edge, ActiveHigh, ExclusiveAndWake, PullNone, 0x0000,
                      "\\_SB.GPO2", 0x00, ResourceConsumer, ,)
             {   // Pin list
                 0x0001
             }
             GpioIo (Exclusive, PullDefault, 0x0000, 0x0000, IoRestrictionOutputOnly,
                     "\\_SB.GPO2", 0x00, ResourceConsumer, ,)
             {   // Pin list
                 0x0002,
             }
        })
        Name (_DSD, Package (0x02)
        {
             ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device Properties for _DSD */,
             Package (0x03)
             {
                 Package (0x02) { "uicc-present", 1 },
                 Package (0x02) { "ese-present", 1 },
                 Package (0x02) { "reset-gpios", Package(0x04) { ^NFC1, 1, 0, 0} },
             }
        })
        Return (SBUF) /* \_SB_.I2C7.NFC1._CRS.SBUF */
    }
    Method (_STA, 0, NotSerialized)  // _STA: Status
    {
        Return (0x0F)
    }
}

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: st21nfca: Fix static checker warning

Fix static checker warning:
drivers/nfc/st21nfca/i2c.c:530 st21nfca_hci_i2c_acpi_request_resources()
error: 'gpiod_ena' dereferencing possible ERR_PTR()

Fix so that if no enable gpio can be retrieved an -ENODEV is returned.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: dfa8070d7f64 ("nfc: st21nfca: Add support for acpi probing for i2c device.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: Drop owner assignment from i2c_driver

i2c_driver does not need to set an owner because i2c_register_driver()
will set it.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

nfc: pn533: Add device tree documentation for i2c phy

Add pn533-i2c phy devicetree documentation

Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

NFC: pn533: add I2C phy driver

This adds the I2C phy interface for the pn533 driver.
This way the driver can be used to interact with I2C
connected pn532 devices.

Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

NFC: pn533: Separate physical layer from the core implementation

The driver now has all core stuff isolated in one file, and all
the hardware link specifics in another. Writing a pn533 driver
on top of another hardware link is now just a matter of adding a
new file for that new hardware specifics.

The first user of this separation will be the i2c based pn532
driver that reuses pn533 core implementation on top of an i2c
layer.

Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

NFC: pn533: Fix socket deadlock

A deadlock can occur when the NFC raw socket is closed while
the driver is processing a command.

Following is the call graph of the affected situation:

send data via raw_sock:
-------------
rawsock_tx_work
  sock_hold => socket refcnt++
  nfc_data_exchange => cb = rawsock_data_exchange_complete

    ops->im_transceive = pn533_transceive => arg->cb = db
                               = rawsock_data_exchange_complete

      pn533_send_data_async => cb = pn533_data_exchange_complete

        __pn533_send_async => cmd->complete_cb = cb
                              = pn533_data_exchange_complete

          if_ops->send_frame_async

response:
--------
pn533_recv_response
  queue_work(priv->wq, &priv->cmd_complete_work)

pn533_wq_cmd_complete

  pn533_send_async_complete

    cmd->complete_cb() = pn533_data_exchange_complete()

      arg->cb() = rawsock_data_exchange_complete()

        sock_put => socket refcnt-- => If the corresponding
                    socket gets closed in the meantime socket
                    will be destructed

          sk_free

            __sk_free

              sk->sk_destruct = rawsock_destruct

                nfc_deactivate_target

                  ops->deactivate_target = pn533_deactivate_target

                    pn533_send_cmd_sync

                      pn533_send_cmd_async

                        __pn533_send_async

                          list_add_tail(&cmd->queue,&dev->cmd_queue)
                                  => add to command list because
                                     a command is currently
                                     processed

                        wait_for_completion
                                   => the workqueue thread waits
                                      here because it is the one
                                      processing the commands
                                         => deadlock

To fix the deadlock pn533_deactivate_target is changed to
issue the PN533_CMD_IN_RELEASE command in async mode. This
way nothing blocks and the release command is executed after
the current command.

Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

NFC: pn533: Send ATR_REQ only if NFC_PROTO_NFC_DEP bit is set

Currently it is not possible to only poll for passive targets
with the pn533 driver. To change this ATR_REQ is only sent when
NFC_PROTO_NFC_DEP is explicitly requested in poll_protocols.
As most implementations (e.g. neard) poll for all protocols
that are reported to be supported by the adapter, this should
not have much of an effect on current implementations.

Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

ipv6: fix inet6_lookup_listener()

A stupid refactoring bug in inet6_lookup_listener() needs to be fixed
in order to get proper SO_REUSEPORT behavior.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Enable use of multiple tx/rx scrqs

Enables the use of multiple transmit and receive scrqs allowing the ibmvnic
driver to take advantage of multiqueue functionality. To achieve this, the
driver must implement the process of negotiating the maximum number of
queues allowed by the server. Initially, the driver will attempt to login
with the maximum number of tx and rx queues supported by the server. If
the server fails to allocate the requested number of scrqs, it will return
partial success in the login response. In this case, we must reinitiate
the login process from the request capabilities stage and attempt to login
requesting fewer scrqs.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-voidify-ops'

Vivien Didelot says:

====================
net: dsa: voidify STP setter and FDB/VLAN add ops

Neither the DSA layer nor the bridge code (see br_set_state) really care
about eventual errors from STP state setters, so make it void.

The DSA layer separates the prepare and commit phases of switchdev in
two different functions. Logical errors must not happen in commit
routines, so make them void.

Changes v1 -> v2:
  - rename port_stp_update to port_stp_state_set
  - don't change code flow of bcm_sf2_sw_br_set_stp_state
  - prefer netdev_err over netdev_warn
====================

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make the VLAN add function return void

The switchdev design implies that a software error should not happen in
the commit phase since it must have been previously reported in the
prepare phase. If an hardware error occurs during the commit phase,
there is nothing switchdev can do about it.

The DSA layer separates port_vlan_prepare and port_vlan_add for
simplicity and convenience. If an hardware error occurs during the
commit phase, there is no need to report it outside the driver itself.

Make the DSA port_vlan_add routine return void for explicitness.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make the FDB add function return void

The switchdev design implies that a software error should not happen in
the commit phase since it must have been previously reported in the
prepare phase. If an hardware error occurs during the commit phase,
there is nothing switchdev can do about it.

The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity
and convenience. If an hardware error occurs during the commit phase,
there is no need to report it outside the DSA driver itself.

Make the DSA port_fdb_add routine return void for explicitness.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make the STP state function return void

The DSA layer doesn't care about the return code of the port_stp_update
routine, so make it void in the layer and the DSA drivers.

Replace the useless dsa_slave_stp_update function with a
dsa_slave_stp_state function used to reply to the switchdev
SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute.

In the meantime, rename port_stp_update to port_stp_state_set to
explicit the state change.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: document missing functions

Add description for the missing port_vlan_prepare, port_fdb_prepare,
port_fdb_dump functions in the DSA documentation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-davem-2016-04-06' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For the 4.7 cycle, we have a number of changes:
* Bob's mesh mode rhashtable conversion, this includes
the rhashtable API change for allocation flags
* BSSID scan, connect() command reassoc support (Jouni)
* fast (optimised data only) and support for RSS in mac80211 (myself)
* various smaller changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, verifier: further improve search pruning

The verifier needs to go through every path of the program in
order to check that it terminates safely, which can be quite a
lot of instructions that need to be processed f.e. in cases with
more branchy programs. With search pruning from f1bca824dabb ("bpf:
add search pruning optimization to verifier") the search space can
already be reduced significantly when the verifier detects that
a previously walked path with same register and stack contents
terminated already (see verifier's states_equal()), so the search
can skip walking those states.

When working with larger programs of > ~2000 (out of max 4096)
insns, we found that the current limit of 32k instructions is easily
hit. For example, a case we ran into is that the search space cannot
be pruned due to branches at the beginning of the program that make
use of certain stack space slots (STACK_MISC), which are never used
in the remaining program (STACK_INVALID). Therefore, the verifier
needs to walk paths for the slots in STACK_INVALID state, but also
all remaining paths with a stack structure, where the slots are in
STACK_MISC, which can nearly double the search space needed. After
various experiments, we find that a limit of 64k processed insns is
a more reasonable choice when dealing with larger programs in practice.
This still allows to reject extreme crafted cases that can have a
much higher complexity (f.e. > ~300k) within the 4096 insns limit
due to search pruning not being able to take effect.

Furthermore, we found that a lot of states can be pruned after a
call instruction, f.e. we were able to reduce the search state by
~35% in some cases with this heuristic, trade-off is to keep a bit
more states in env->explored_states. Usually, call instructions
have a number of preceding register assignments and/or stack stores,
where search pruning has a better chance to suceed in states_equal()
test. The current code marks the branch targets with STATE_LIST_MARK
in case of conditional jumps, and the next (t + 1) instruction in
case of unconditional jump so that f.e. a backjump will walk it. We
also did experiments with using t + insns[t].off + 1 as a marker in
the unconditionally jump case instead of t + 1 with the rationale
that these two branches of execution that converge after the label
might have more potential of pruning. We found that it was a bit
better, but not necessarily significantly better than the current
state, perhaps also due to clang not generating back jumps often.
Hence, we left that as is for now.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: share user_ptr pointer for both devlink and devlink_port

Ptr to devlink structure can be easily obtained from
devlink_port->devlink. So share user_ptr[0] pointer for both and leave
user_ptr[1] free for other users.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-next'

Jiri Pirko says:

====================
mlxsw: small driver update + one tiny devlink dependency

Cosmetics, in preparation to sharedbuffer patchset.
First patch is here to allow patch number two.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Fix SBPM register name

Fix copy&paste error and state the name of SBPM register correctly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM

Same field, same values, so share the same enum.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Do not pass around driver_priv directly

Instead of that, pass mlxsw_core and use a helper to get driver priv
from driver code. Looks much cleaner that way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*

Instead of passing around driver priv, pass struct mlxsw_core *
directly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Move devlink port registration into common core code

Remove devlink port reg/unreg from spectrum and switchx2 code and rather
do the common work in core. That also ensures code separation where
devlink is only used in core.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: remove implicit type set in port register

As we rely on caller zeroing or correctly set the struct before the call,
this implicit type set is either no-op (DEVLINK_PORT_TYPE_NOTSET is 0)
or it rewrites wanted value. So remove this.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-mtu-buffer-reconfig'

Jakub Kicinski says:

====================
MTU/buffer reconfig changes

I re-discussed MPLS/MTU internally, dropped it from the patch 1,
re-tested everything, found out I forgot about debugfs pointers,
fixed that as well.

v5:
- don't reserve space in RX buffers for MPLS label stack
   (patch 1);
- fix debugfs pointers to ring structures (patch 5).
v4:
- cut down on unrelated patches;
- don't "close" the device on error path.

--- v4 cover letter

Previous series included some not entirely related patches,
this one is cut down.  Main issue I'm trying to solve here
is that .ndo_change_mtu() in nfpvf driver is doing full
close/open to reallocate buffers - which if open fails
can result in device being basically closed even though
the interface is started.  As suggested by you I try to move
towards a paradigm where the resources are allocated first
and the MTU change is only done once I'm certain (almost)
nothing can fail.  Almost because I need to communicate
with FW and that can always time out.

Patch 1 fixes small issue.  Next 10 patches reorganize things
so that I can easily allocate new rings and sets of buffers
while the device is running.  Patches 13 and 15 reshape the
.ndo_change_mtu() and ethtool's ring-resize operation into
desired form.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: allow ring size reconfiguration at runtime

Since much of the required changes have already been made for
changing MTU at runtime let's use it for ring size changes as
well.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: pass ring count as function parameter

Soon ring resize will call this functions with values
different than the current configuration we need to
explicitly pass the ring count as parameter.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: convert .ndo_change_mtu() to prepare/commit paradigm

When changing MTU on running device first allocate new rings
and buffers and once it succeeds proceed with changing MTU.

Allocation of new rings is not really necessary for this
operation - it's done to keep the code simple and because
size of the extra ring memory is quite small compared to
the size of buffers.

Operation can still fail midway through if FW communication
times out. In that case we retry with old MTU (rings).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: propagate list buffer size in struct rx_ring

Free list buffer size needs to be propagated to few functions
as a parameter and added to struct nfp_net_rx_ring since soon
some of the functions will be reused to manage rings with
buffers of size different than nn->fl_bufsz.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: sync ring state during FW reconfiguration

FW reconfiguration in .ndo_open()/.ndo_stop() should reset/
restore queue state. Since we need IRQs to be disabled when
filling rings on RX path we have to move disable_irq() from
.ndo_open() all the way up to IRQ allocation.

nfp_net_start_vec() becomes trivial now so it's inlined.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: slice .ndo_open() and .ndo_stop() up

Divide .ndo_open() and .ndo_stop() into logical, callable
chunks. No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move filling ring information to FW config

nfp_net_[rt]x_ring_{alloc,free} should only allocate or free
ring resources without touching the device. Move setting
parameters in the BAR to separate functions. This will make
it possible to reuse alloc/free functions to allocate new
rings while the device is running.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: preallocate RX buffers early in .ndo_open

We want the .ndo_open() to have following structure:
- allocate resources;
- configure HW/FW;
- enable the device from stack perspective.
Therefore filling RX rings needs to be moved to the beginning
of .ndo_open().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: reorganize initial filling of RX rings

Separate allocation of buffers from giving them to FW,
thanks to this it will be possible to move allocation
earlier on .ndo_open() path and reuse buffers during
runtime reconfiguration.

Similar to TX side clean up the spill of functionality
from flush to freeing the ring. Unlike on TX side,
RX ring reset does not free buffers from the ring.
Ring reset means only that FW pointers are zeroed and
buffers on the ring must be placed in [0, cnt - 1)
positions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: cleanup tx ring flush and rename to reset

Since we never used flush without freeing the ring later
the functionality of the two operations is mixed.
Rename flush to ring reset and move there all the things
which have to be done after FW ring state is cleared.
While at it do some clean-ups.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: allocate ring SW structs dynamically

To be able to switch rings more easily on config changes
allocate them dynamically, separately from nfp_net structure.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: make *x_ring_init do all the init

nfp_net_[rt]x_ring_init functions used to be called from probe
path only and some of their functionality was spilled to the
call site. In order to reuse them for ring reconfiguration
we need them to do all the init.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: break up nfp_net_{alloc|free}_rings

nfp_net_{alloc|free}_rings contained strange mix of allocations
and vector initialization. Remove it, declare vector init as
a separate function and handle allocations explicitly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move link state interrupt request/free calls

We need to be able to disable the link state interrupt when
the device is brought down. We used to just free the IRQ
at the beginning of .ndo_stop(). As we now move towards
more ordered .ndo_open()/.ndo_stop() paths LSC allocation
should be placed in the "allocate resource" section.

Since the IRQ can't be freed early in .ndo_stop(), it is
disabled instead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: correct RX buffer length calculation

When calculating the RX buffer length we need to account
for up to 2 VLAN tags. Rounding up to 1k is an relic of
a distant past and can be removed. While at it also remove
trivial print statement.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2016-04-07

This series contains updates to ixgbe and ixgbevf.

This entire series (except for one patch from Alex) comes from Mark and
is mainly to add support for our new MAC (x550em_a).

So let's get Alex's patch out of the way first before we cover Mark's
many changes.  Alex does his enable bulk free in transmit cleanup for
ixgbe and ixgbevf, like his has done for all of our other drivers.

First Mark cleans up registers that were not being used, so do some
house cleaning.  Then to avoid casting lan_id and func fields, just
make them u8 since they only hold small values anyways.  Found and
fixed an issue where on read operations it could be possible to
modify locations beyond the length passed in, so change the check
to round up in the same way.  Cleaned up the interface for issuing
firmware commands to use a void * instead of a u32 * which eliminates
a number of casts.  Added support for the new MAC and provided method
pointers and use them to access IOSF-attached devices, since the
new MAC will also need a new access method.  Added support for SFPs
with an external retimer and for an SGMII backplane interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-tracepoints'

Alexei Starovoitov says:

====================
allow bpf attach to tracepoints

Hi Steven, Peter,

v1->v2: addressed Peter's comments:
- fixed wording in patch 1, added ack
- refactored 2nd patch into 3:
2/10 remove unused __perf_addr macro which frees up
an argument in perf_trace_buf_submit
3/10 split perf_trace_buf_prepare into alloc and update parts, so that bpf
programs don't have to pay performance penalty for update of struct trace_entry
which is not going to be accessed by bpf
4/10 actual addition of bpf filter to perf tracepoint handler is now trivial
and bpf prog can be used as proper filter of tracepoints

v1 cover:
last time we discussed bpf+tracepoints it was a year ago [1] and the reason
we didn't proceed with that approach was that bpf would make arguments
arg1, arg2 to trace_xx(arg1, arg2) call to be exposed to bpf program
and that was considered unnecessary extension of abi. Back then I wanted
to avoid the cost of buffer alloc and field assign part in all
of the tracepoints, but looks like when optimized the cost is acceptable.
So this new apporach doesn't expose any new abi to bpf program.
The program is looking at tracepoint fields after they were copied
by perf_trace_xx() and described in /sys/kernel/debug/tracing/events/xxx/format
We made a tool [2] that takes arguments from /sys/.../format and works as:
$ tplist.py -v random:urandom_read
    int got_bits;
    int pool_left;
    int input_left;
Then these fields can be copy-pasted into bpf program like:
struct urandom_read {
    __u64 hidden_pad;
    int got_bits;
    int pool_left;
    int input_left;
};
and the program can use it:
SEC("tracepoint/random/urandom_read")
int bpf_prog(struct urandom_read *ctx)
{
    return ctx->pool_left > 0 ? 1 : 0;
}
This way the program can access tracepoint fields faster than
equivalent bpf+kprobe program, which is the main goal of these patches.

Patch 1-4 are simple changes in perf core side, please review.
I'd like to take the whole set via net-next tree, since the rest of
the patches might conflict with other bpf work going on in net-next
and we want to avoid cross-tree merge conflicts.
Alternatively we can put patches 1-4 into both tip and net-next.

Patch 9 is an example of access to tracepoint fields from bpf prog.
Patch 10 is a micro benchmark for bpf+kprobe vs bpf+tracepoint.

Note that for actual tracing tools the user doesn't need to
run tplist.py and copy-paste fields manually. The tools do it
automatically. Like argdist tool [3] can be used as:
$ argdist -H 't:block:block_rq_complete():u32:nr_sector'
where 'nr_sector' is name of tracepoint field taken from
/sys/kernel/debug/tracing/events/block/block_rq_complete/format
and appropriate bpf program is generated on the fly.

[1] http://thread.gmane.org/gmane.linux.kernel.api/8127/focus=8165
[2] https://github.com/iovisor/bcc/blob/master/tools/tplist.py
[3] https://github.com/iovisor/bcc/blob/master/tools/argdist.py
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: add tracepoint vs kprobe performance tests

the first microbenchmark does
fd=open("/proc/self/comm");
for() {
  write(fd, "test");
}
and on 4 cpus in parallel:
                                      writes per sec
base (no tracepoints, no kprobes)         930k
with kprobe at __set_task_comm()          420k
with tracepoint at task:task_rename       730k

For kprobe + full bpf program manully fetches oldcomm, newcomm via bpf_probe_read.
For tracepint bpf program does nothing, since arguments are copied by tracepoint.

2nd microbenchmark does:
fd=open("/dev/urandom");
for() {
  read(fd, buf);
}
and on 4 cpus in parallel:
                                       reads per sec
base (no tracepoints, no kprobes)         300k
with kprobe at urandom_read()             279k
with tracepoint at random:urandom_read    290k

bpf progs attached to kprobe and tracepoint are noop.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: tracepoint example

modify offwaketime to work with sched/sched_switch tracepoint
instead of kprobe into finish_task_switch

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: add tracepoint support to bpf loader

Recognize "tracepoint/" section name prefix and attach the program
to that tracepoint.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: sanitize bpf tracepoint access

during bpf program loading remember the last byte of ctx access
and at the time of attaching the program to tracepoint check that
the program doesn't access bytes beyond defined in tracepoint fields

This also disallows access to __dynamic_array fields, but can be
relaxed in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint programs

needs two wrapper functions to fetch 'struct pt_regs *' to convert
tracepoint bpf context into kprobe bpf context to reuse existing
helper functions

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: register BPF_PROG_TYPE_TRACEPOINT program type

register tracepoint bpf program type and let it call the same set
of helper functions as BPF_PROG_TYPE_KPROBE

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf, bpf: allow bpf programs attach to tracepoints

introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached
to the perf tracepoint handler, which will copy the arguments into
the per-cpu buffer and pass it to the bpf program as its first argument.
The layout of the fields can be discovered by doing
'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format'
prior to the compilation of the program with exception that first 8 bytes
are reserved and not accessible to the program. This area is used to store
the pointer to 'struct pt_regs' which some of the bpf helpers will use:
+---------+
| 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program)
+---------+
| N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly)
+---------+
| dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet)
+---------+

Not that all of the fields are already dumped to user space via perf ring buffer
and broken application access it directly without consulting tracepoint/format.
Same rule applies here: static tracepoint fields should only be accessed
in a format defined in tracepoint/format. The order of fields and
field sizes are not an ABI.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf: split perf_trace_buf_prepare into alloc and update parts

split allows to move expensive update of 'struct trace_entry' to later phase.
Repurpose unused 1st argument of perf_tp_event() to indicate event type.

While splitting use temp variable 'rctx' instead of '*rctx' to avoid
unnecessary loads done by the compiler due to -fno-strict-aliasing

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf: remove unused __addr variable

now all calls to perf_trace_buf_submit() pass 0 as 4th
argument which will be repurposed in the next patch which will
change the meaning of 1st arg of perf_tp_event() to event_type

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf: optimize perf_fetch_caller_regs

avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by perpcu init logic and
subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
so we can safely drop memset from all of the above cases and move it into
perf_ftrace_function_call that calls it with stack allocated pt_regs.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix build failure due to lockdep_sock_is_held().

Needs to be protected with CONFIG_LOCKDEP.

Based upon a patch by Hannes Frederic Sowa.

Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Bump version number

Update ixgbe version number.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add KR backplane support for x550em_a

Add support for x550em_a-based KR backplane devices.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add support for SGMII backplane interface

Add support for an SGMII backplane interface.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add support for SFPs with retimer

Add support for SFPs with an external retimer.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Introduce function to control MDIO speed

Move code that controls MDIO speed into a new function because
there will be more MACs that need the control.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Read and parse NW_MNG_IF_SEL register

Read the IXGBE_NW_MNG_IF_SEL register and use it to set interface
attributes.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Read and set instance id

Read the instance number from EEPROM and save it for later use.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use new methods for PHY access

Now x550em_a devices will use a new method for PHY access that will
get the firmware token for each access.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add support for x550em_a 10G MAC type

Add support for x550em_a 10G MAC type to the ixgbe driver. The new
MAC includes new firmware commands that need to be used to control
PHY and IOSF access, so that support is also added. The interface
supported is a native SFP+ interface.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use method pointer to access IOSF devices

Provide method pointers and use them to access IOSF-attached
devices. A new MAC will introduce a new access method.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add definitions for x550em_a 10G MAC

Add definitions for a x550em_a 10G MAC device with a native SFP
interface.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add support for single-port X550 device

Add support for a single-port X550 device.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe/ixgbevf: Add support for bulk free in Tx cleanup & cleanup boolean logic

This patch enables bulk free in Tx cleanup for ixgbevf and cleans up the
boolean logic in the polling routines for ixgbe and ixgbevf in the hopes of
avoiding any mix-ups similar to what occurred with i40e and i40evf.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Take manageability semaphore for firmware commands

We need to take the manageability semaphore when issuing firmware
commands to avoid problems. With this in place, the semaphore is
no longer taken in the ixgbe_set_fw_drv_ver_generic function, since
it will now always be taken by the ixgbe_host_interface_command
function.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Clean up interface for firmware commands

Clean up the interface for issuing firmware commands to use a
void * instead of a u32 *. This eliminates a number of casts.
Also clean up ixgbe_host_interface_command in a few other ways,
eliminating comparisons with 0, redundant parens and minor
formatting issues.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Correct length check for round up

The function ixgbe_host_interface_command actually uses a multiple
of word sized buffer to do its business, but only checks against
the actual length passed in. This means that on read operations it
could be possible to modify locations beyond the length passed in.
Change the check to round up in the same way, just to avoid any
possible hazard.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Change the lan_id and func fields to a u8 to avoid casts

Since the lan_id and func fields only ever hold small values, make
them u8 to avoid casts used to silence warnings.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Delete some unused register definitions

I noticed the SRAMREL registers are not referenced for any device,
so delete the definitions.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

sock: make lockdep_sock_is_held static inline

I forgot to add inline to lockdep_sock_is_held, so it generated all
kinds of build warnings if not build with lockdep support.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-next'

Jon Maloy says:

====================
tipc: some small fixes

When fix a minor buffer leak, and ensure that bearers filter packets
correctly while they are being shut down.

v2: Corrected typos in commit #3, as per feedback from S. Shtylyov
v3: Removed commit #3 from the series. Improved version will be
re-submitted later.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: stricter filtering of packets in bearer layer

Resetting a bearer/interface, with the consequence of resetting all its
pertaining links, is not an atomic action. This becomes particularly
evident in very large clusters, where a lot of traffic may happen on the
remaining links while we are busy shutting them down. In extreme cases,
we may even see links being re-created and re-established before we are
finished with the job.

To solve this, we now introduce a solution where we temporarily detach
the bearer from the interface when the bearer is reset. This inhibits
all packet reception, while sending still is possible. For the latter,
we use the fact that the device's user pointer now is zero to filter out
which packets can be sent during this situation; i.e., outgoing RESET
messages only. This filtering serves to speed up the neighbors'
detection of the loss event, and saves us from unnecessary probing.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate buffer leak in bearer layer

When enabling a bearer we create a 'neigbor discoverer' instance by
calling the function tipc_disc_create() before the bearer is actually
registered in the list of enabled bearers. Because of this, the very
first discovery broadcast message, created by the mentioned function,
is lost, since it cannot find any valid bearer to use. Furthermore,
the used send function, tipc_bearer_xmit_skb() does not free the given
buffer when it cannot find a  bearer, resulting in the leak of exactly
one send buffer each time a bearer is enabled.

This commit fixes this problem by introducing two changes:

1) Instead of attemting to send the discovery message directly, we let
   tipc_disc_create() return the discovery buffer to the calling
   function, tipc_enable_bearer(), so that the latter can send it
   when the enabling sequence is finished.

2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
   functions at the bearer layer, we now free the indicated buffer or
   buffer chain when a valid bearer cannot be found.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gro-in-udp'

Tom Herbert says:

====================
udp: GRO in UDP sockets

This patch set adds GRO functions (gro_receive and gro_complete) to UDP
sockets and removes udp_offload infrastructure.

Add GRO functions (gro_receive and gro_complete) to UDP sockets. In
udp_gro_receive and udp_gro_complete a socket lookup is done instead of
looking up the port number in udp_offloads. If a socket is found and
there are GRO functions for it then those are called. This feature
allows binding GRO functions to more than just a port number.
Eventually, we will be able to use this technique to allow application
defined GRO for an application protocol by attaching BPF porgrams to UDP
sockets for doing GRO.

In order to implement these functions, we added exported
udp6_lib_lookup_skb and udp4_lib_lookup_skb functions in ipv4/udp.c and
ipv6/udp.c. Also, inet_iif and references to skb_dst() were changed to
check that dst is set in skbuf before derefencing. In the GRO path there
is now a UDP socket lookup performed before dst is set, to the get the
device in that case we simply use skb->dev.

Tested:

Ran various combinations of VXLAN and GUE TCP_STREAM and TCP_RR tests.
Did not see any material regression.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Remove udp_offloads

Now that the UDP encapsulation GRO functions have been moved to the UDP
socket we not longer need the udp_offload insfrastructure so removing it.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: change to use UDP socket GRO

Adapt geneve_gro_receive, geneve_gro_complete to take a socket argument.
Set these functions in tunnel_config. Don't set udp_offloads any more.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fou: change to use UDP socket GRO

Adapt gue_gro_receive, gue_gro_complete to take a socket argument.
Don't set udp_offloads any more.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: change vxlan to use UDP socket GRO

Adapt vxlan_gro_receive, vxlan_gro_complete to take a socket argument.
Set these functions in tunnel_config. Don't set udp_offloads any more.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add socket based GRO and config

Add gro_receive and gro_complete to struct udp_tunnel_sock_cfg.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add GRO functions to UDP socket

This patch adds GRO functions (gro_receive and gro_complete) to UDP
sockets. udp_gro_receive is changed to perform socket lookup on a
packet. If a socket is found the related GRO functions are called.

This features obsoletes using UDP offload infrastructure for GRO
(udp_offload). This has the advantage of not being limited to provide
offload on a per port basis, GRO is now applied to whatever individual
UDP sockets are bound to. This also allows the possbility of
"application defined GRO"-- that is we can attach something like
a BPF program to a UDP socket to perfrom GRO on an application
layer protocol.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb

Add externally visible functions to lookup a UDP socket by skb. This
will be used for GRO in UDP sockets. These functions also check
if skb->dst is set, and if it is not skb->dev is used to get dev_net.
This allows calling lookup functions before dst has been set on the
skbuff.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Checks skb_dst to be NULL in inet_iif

In inet_iif check if skb_rtable is NULL for the skb and return
skb->skb_iif if it is.

This change allows inet_iif to be called before the dst
information has been set in the skb (e.g. when doing socket based
UDP GRO).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sock-lockdep-tightening'

Hannes Frederic Sowa says:

====================
sock: lockdep tightening

First patch is from Eric Dumazet and improves lockdep accuracy for
socket locks. After that, second patch introduces lockdep_sock_is_held
and uses it. Final patch reverts and reworks the lockdep fix from Daniel
in the filter code, as we now have tighter lockdep support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tun: use socket locks for sk_{attach,detatch}_filter

This reverts commit 5a5abb1fa3b05dd ("tun, bpf: fix suspicious RCU usage
in tun_{attach, detach}_filter") and replaces it to use lock_sock around
sk_{attach,detach}_filter. The checks inside filter.c are updated with
lockdep_sock_is_held to check for proper socket locks.

It keeps the code cleaner by ensuring that only one lock governs the
socket filter instead of two independent locks.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce lockdep_is_held and update various places to use it

The socket is either locked if we hold the slock spin_lock for
lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned
!= 0). Check for this and at the same time improve that the current
thread/cpu is really holding the lock.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: fix lockdep annotation in release_sock

During release_sock we use callbacks to finish the processing
of outstanding skbs on the socket. We actually are still locked,
sk_locked.owned == 1, but we already told lockdep that the mutex
is released. This could lead to false positives in lockdep for
lockdep_sock_is_held (we don't hold the slock spinlock during processing
the outstanding skbs).

I took over this patch from Eric Dumazet and tested it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: fix inet_reuseport_add_sock()

David Ahern reported panics in __inet_hash() caused by my recent commit.

The reason is inet_reuseport_add_sock() was still using
sk_nulls_for_each_rcu() instead of sk_for_each_rcu().
SO_REUSEPORT enabled listeners were causing an instant crash.

While chasing this bug, I found that I forgot to clear SOCK_RCU_FREE
flag, as it is inherited from the parent at clone time.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>