git.openwrt.org Git - openwrt/staging/blogic.git/log

i40e: Reassign incorrect PHY type to fix a FW bug

Some FW versions are incorrectly reporting a breakout cable as PHY type
0x3 when it should be 0x16 (I40E_PHY_TYPE_10GBASE_SFPP_CU).
If we get this value back from FW and the version is < 4.40, reassign it
to I40E_PHY_TYPE_10GBASE_SFPP_CU.

Change-ID: Ibb41a0e3cd2c0753744e8553959240df6ed13ae8
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix XPS mask when resetting

During resets (possibly caused by a Tx hang) the driver would
accidentally clear the XPS mask for all queues back to 0.

This caused higher CPU utilization and had some other performance impacts
for transmit tests.

Change-ID: I95f112432c9e643a153eaa31cd28cdcbfdd01831
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: use more portable sign extension

Use automatic sign extension by replacing 0xffff... constants
with ~(u64)0 or ~(u32)0.

Change-ID: I73cab4cd2611795bb12e00f0f24fafaaee07457c
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: grab NVM devstarter version not image version

0x2A is the NVM version so it has useful data but it is per image
version every image can have a different one. 0x18 is the dev starter
version which all the images for release will have the same version.
Of the two 0x18 is more useful and is what should be displayed.

Change-ID: Idf493da13a42ab211e2de0bef287f5de51033cca
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Don't check operational or sync bit for App TLV

In CEE mode the firmware does not set the operational status bit of
the application TLV status as returned from the "Get CEE DCBX Oper Cfg"
AQ command. This occurs whenever a DCBX configuration is changed.

This is a workaround to remove the check for the operational and sync bits
of the application TLV status till a firmware fix is provided.

Change-ID: I1a31ff2fcadcb06feb5b55776a33593afc6ea176
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: during LED interaction ignore activity LED src modes

Modify our get and set LED functions so they ignore activity LEDs,
as we are required to blink the link LEDs only.

Change-ID: I647ea67a6fc95cbbab6e3cd01d81ec9ae096a9ad
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix NPAR Tx Scheduler init

Recent changes to the driver initialization have caused the BW
configurations to not take effect. We use a BW configuration read and
write back to "kick" the Tx scheduler into action.

Change-ID: I94ab377c58d3a3986e3de62b6c199be3fd2ee5e6
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: bcmgenet: simplify __bcmgenet_tx_reclaim()

1. Use c_index and ring->c_index to determine how many TxCBs/TxBDs are
   ready for cleanup
   - c_index = the current value of TDMA_CONS_INDEX
   - TDMA_CONS_INDEX is HW-incremented and auto-wraparound (0x0-0xFFFF)
   - ring->c_index = __bcmgenet_tx_reclaim() cleaned up to this point on
     the previous invocation

2. Add bcmgenet_tx_ring->clean_ptr
   - index of the next TxCB to be cleaned
   - incremented as TxCBs/TxBDs are processed
   - value always in range [ring->cb_ptr, ring->end_ptr]

3. Fix incrementing of dev->stats.tx_packets
   - should be incremented only when tx_cb_ptr->skb != NULL

These changes simplify __bcmgenet_tx_reclaim(). Furthermore, Tx ring size
can now be any value.

With the old code, Tx ring size had to be a power-of-2:
   num_tx_bds = ring->size;
   c_index &= (num_tx_bds - 1);
   last_c_index &= (num_tx_bds - 1);

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fib_trie-next'

Alexander Duyck says:

====================
ipv4/fib_trie: Cleanups to prepare for introduction of key vector

This patch series is meant to mostly just clean up the fib_trie to prepare
it for the introduction of the key_vector. As such there are a number of
minor clean-ups such as reformatting the tnode to match the format once the
key vector is introduced, some optimizations to drop the need for a leaf
parent pointer, and some changes to remove duplication of effort such as
the 2 look-ups that were essentially being done per node insertion.

v2: Added code to cleanup idx >> n->bits and explain unsigned long logic
Added code to prevent allocation when tnode size is larger than size_t
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Prevent allocating tnode if bits is too big for size_t

This patch adds code to prevent us from attempting to allocate a tnode with
a size larger than what can be represented by size_t.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Update last spot w/ idx >> n->bits code and explanation

This change updates the fib_table_lookup function so that it is in sync
with the fib_find_node function in terms of the explanation for the index
check based on the bits value.

I have also updated it from doing a mask to just doing a compare as I have
found that seems to provide more options to the compiler as I have seen it
turn this into a shift of the value and test under some circumstances.

In addition I addressed one minor issue in which we kept computing the key
^ n->key when checking the fib aliases. I pulled the xor out of the loop
in order to reduce the number of memory reads in the lookup. As a result
we should save a couple cycles since the xor is only done once much earlier
in the lookup.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Make fib_table rcu safe

The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections. This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: move leaf and tnode to occupy the same spot in the key vector

If we are going to compact the leaf and tnode we first need to make sure
the fields are all in the same place. In that regard I am moving the leaf
pointer which represents the fib_alias hash list to occupy what is
currently the first key_vector pointer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Update insert and delete to make use of tp from find_node

This change makes it so that the insert and delete functions make use of
the tnode pointer returned in the fib_find_node call. By doing this we
will not have to rely on the parent pointer in the leaf which will be going
away soon.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Fib find node should return parent

This change makes it so that the parent pointer is returned by reference in
fib_find_node. By doing this I can use it to find the parent node when I
am performing an insertion and I don't have to look for it again in
fib_insert_node.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf

This change makes it so that leaf_walk_rcu takes a tnode and a key instead
of the trie and a leaf.

The main idea behind this is to avoid using the leaf parent pointer as that
can have additional overhead in the future as I am trying to reduce the
size of a leaf down to 16 bytes on 64b systems and 12b on 32b systems.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Only resize tnodes once instead of on each leaf removal in fib_table_flush

This change makes it so that we only call resize on the tnodes, instead of
from each of the leaves. By doing this we can significantly reduce the
amount of time spent resizing as we can update all of the leaves in the
tnode first before we make any determinations about resizing. As a result
we can simply free the tnode in the case that all of the leaves from a
given tnode are flushed instead of resizing with each leaf removed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-4.1-20150304' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-03-04

this is a pull request of 3 patches for net-next/master.

Aaron Wu contributes three patches for the blackfin can driver, which
cleans up the driver and makes use of more platform independent code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: rtm_mpls_policy[] can be static

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

Hi Dave, the following patch set includes three feature additions relating
to SR-IOV to be2net.

Patch 1 avoid creating a non-RSS default RXQ when FW allows it.
This prevents wasting one RXQ for each VF.

Patch 2 adds support for evenly distributing all queue & filter resources
across VFs. The FW informs the driver as to which resources are distributable.

Patch 3 implements the sriov_configure PCI method to allow runtime
enablement of VFs via sysfs.

Pls consider applying this patch-set to the net-next tree. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: implement .sriov_configure() PCI callback

This patch implements the .sriov_configure() PCI method to allow for
runtime enabling/disabling of VFs. The module param "num_vfs" is now
deprecated.
At the time of driver load the PF-pool resources are allocated to the PF.
When the user enables VFs, the resources are then re-distributed across
PFs and VFs based on the number of VFs enabled.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: re-distribute SRIOV resources allowed by FW

When SR-IOV is enabled in the adapter, the FW distributes resources
evenly across the PF and it's VFs. This is currently done only for some
resources.

This patch adds support for a new cmd that queries the FW for the list
of resources for which the distribution is allowed and distributes them
accordingly.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: avoid creating the non-RSS default RXQ if FW allows to

On BE2, BE3 and Skhawk-R chips one non-RSS (called "default") RXQ was
needed to receive non-IP traffic. Some FW versions now export a
capability called IFACE_FLAGS_DEFQ_RSS where this requirement doesn't hold.
On such FWs the driver now does not create the non-RSS default queue.
This prevents wasting one RXQ per VF.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cadence: Remove Kconfig dependency on ARCH

Remove Kconfig dependency and enable driver for
all ARCHs.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sh_eth-next'

Ben Hutchings says:

====================
sh_eth changes for net-next

Some minor new features and fixes.

These depend in part on the series I sent earlier for net, specifically
"sh_eth: WARN on access to a register not implemented in a particular
chip" depends on "sh_eth: Fix RX recovery on R-Car in case of RX ring
underrun".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Mitigate lost statistics updates

The statistics registers have write-clear behaviour, which means we
will lose any increment between the read and write. Mitigate this by
only clearing when we read a non-zero value, so we will never falsely
report a total of zero. This also saves time as we only handle
error statistics here and they won't often be incremented.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Optionally log RX and TX status for each completed descriptor

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Implement ethtool register dump operations

There are many different sets of registers implemented by the
different versions of this controller, and we can only expect this to
get more complicated in future. Limit how much ethtool needs to know
by including an explicit bitmap of which registers are included in the
dump, allowing room for future growth in the number of possible
registers.

As I don't have datasheets for all of these, I've only included
registers that are:

- defined in all 5 register type arrays, or
- used by the driver, or
- documented in the datasheet I have

Add one new capability flag so we can tell whether the RTRATE
register is implemented.

Delete the TSU_ADRL0 and TSU_ADR{H,L}31 definitions, as they weren't
used and the address table is already assumed to be contiguous.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: WARN on access to a register not implemented in a particular chip

Currently we may silently read/write a register at offset 0. Change
this to WARN and then ignore the write or read-back all-ones.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Implement multicast statistic based on the RFS8 status bit

At least on the R8A7790, RFS8 reflects the RINT8 (multicast) MAC
status flag.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfin_can: Merge header file from arch dependent location

Header file was in arch dependent location arch/blackfin/include/asm/bfin_can.h,
Now move and merge the useful contents of header file into driver code, note
the original header file is reserved for full registers set access test by other
code so it survives.

Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

bfin_can: introduce ioremap to comply to archs with MMU

Blackfin was built without MMU, old driver code access the IO space by
physical address, introduce the ioremap approach to be compitable with
the common style supporting MMU enabled arch.

Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

bfin_can: rewrite the blackfin style of read/write to common ones

Replace the blackfin arch dependent style of bfin_read/bfin_write with
common readw/writew

Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'basic-mpls-support'

Eric W. Biederman says:

====================
Basic MPLS support take 2

On top of my two pending neighbour table prep patches here is the mpls
support refactored to use them, and edited to not drop routes when
an interface goes down.  Additionally the addition of RTA_LLGATEWAY
has been replaced with the addtion of RTA_VIA.  RTA_VIA being an
attribute that includes the address family as well as the address
of the next hop.

MPLS is at it's heart simple and I have endeavoured to maintain that
simplicity in my implemenation.

This is an implementation of a RFC3032 forwarding engine, and basic MPLS
egress logic.  Which should make linux sufficient to be a mpls
forwarding node or to be a LSA (Label Switched Router) as it says in all
of the MPLS documents.  The ingress support will follow but it deserves
it's own discussion so I am pushing it separately.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Multicast route table change notifications

Unlike IPv4 this code notifies on all cases where mpls routes
are added or removed and it never automatically removes routes.
Avoiding both the userspace confusion that is caused by omitting
route updates and the possibility of a flood of netlink traffic
when an interface goes doew.

For now reserved labels are handled automatically and userspace
is not notified.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Netlink commands to add, remove, and dump routes

This change adds two new netlink routing attributes:
RTA_VIA and RTA_NEWDST.

RTA_VIA specifies the specifies the next machine to send a packet to
like RTA_GATEWAY.  RTA_VIA differs from RTA_GATEWAY in that it
includes the address family of the address of the next machine to send
a packet to.  Currently the MPLS code supports addresses in AF_INET,
AF_INET6 and AF_PACKET.  For AF_INET and AF_INET6 the destination mac
address is acquired from the neighbour table.  For AF_PACKET the
destination mac_address is specified in the netlink configuration.

I think raw destination mac address support with the family AF_PACKET
will prove useful.  There is MPLS-TP which is defined to operate
on machines that do not support internet packets of any flavor.  Further
seem to be corner cases where it can be useful.  At this point
I don't care much either way.

RTA_NEWDST specifies the destination address to forward the packet
with.  MPLS typically changes it's destination address at every hop.
For a swap operation RTA_NEWDST is specified with a length of one label.
For a push operation RTA_NEWDST is specified with two or more labels.
For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
RTAN_NEWDST is specified.

Those new netlink attributes are used to implement handling of rt-netlink
RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
MPLS label table.

rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
verify no unhandled attributes or unhandled values are present and sets
up the data structures for mpls_route_add and mpls_route_del.

I did my best to match up with the existing conventions with the caveats
that MPLS addresses are all destination-specific-addresses, and so
don't properly have a scope.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Functions for reading and wrinting mpls labels over netlink

Reading and writing addresses in network byte order in netlink is
traditional and I see no reason to change that. MPLS is interesting
as effectively it has variabely length addresses (the MPLS label
stack). To represent these variable length addresses in netlink
I use a valid MPLS label stack (complete with stop bit).

This achieves two things: a well defined existing format is used,
and the data can be interpreted without looking at it's length.

Not needed to look at the length to decode the variable length
network representation allows existing userspace functions
such as inet_ntop to be used without needed to change their
prototype.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Basic support for adding and removing routes

mpls_route_add and mpls_route_del implement the basic logic for adding
and removing Next Hop Label Forwarding Entries from the MPLS input
label map.  The addition and subtraction is done in a way that is
consistent with how the existing routing table in Linux are
maintained.  Thus all of the work to deal with NLM_F_APPEND,
NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE.

Cases that are not clearly defined such as changing the interpretation
of the mpls reserved labels is not allowed.

Because it seems like the right thing to do adding an MPLS route without
specifying an input label and allowing the kernel to pick a free label
table entry is supported.   The implementation is currently less than optimal
but that can be changed.

As I don't have anything else to test with only ethernet and the loopback
device are the only two device types currently supported for forwarding
MPLS over.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Add a sysctl to control the size of the mpls label table

This sysctl gives two benefits. By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets. This prevents unpleasant surprises for users.

The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.

This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Basic routing support

This change adds a new Kconfig option MPLS_ROUTING.

The core of this change is the code to look at an mpls packet received
from another machine.  Look that packet up in a routing table and
forward the packet on.

Support of MPLS over ATM is not considered or attempted here.  This
implemntation follows RFC3032 and implements the MPLS shim header that
can pass over essentially any network.

What RFC3021 refers to as the as the Incoming Label Map (ILM) I call
net->mpls.platform_label[].  What RFC3031 refers to as the Next Label
Hop Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
label fordwarding information base (lfib) might also be valid.

Further the implemntation forwards packets as described in RFC3032.
There is no need and given the original motivation for MPLS a strong
discincentive to have a flexible label forwarding path.  In essence
the logic is the topmost label is read, looked up, removed, and
replaced by 0 or more new lables and the sent out the specified
interface to it's next hop.

Quite a few optional features are not implemented here.  Among them
are generation of ICMP errors when the TTL is exceeded or the packet
is larger than the next hop MTU (those conditions are detected and the
packets are dropped instead of generating an icmp error).  The traffic
class field is always set to 0.  The implementation focuses on IP over
MPLS and does not handle egress of other kinds of protocols.

Instead of implementing coordination with the neighbour table and
sorting out how to input next hops in a different address family (for
which there is value).  I was lazy and implemented a next hop mac
address instead.  The code is simpler and there are flavor of MPLS
such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
appropriate so a next hop by mac address would need to be implemented
at some point.

Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.

Decoding the mpls header must be done by first byeswapping a 32bit bit
endian word into the local cpu endian and then bit shifting to extract
the pieces.  There is no C bit-field that can represent a wire format
mpls header on a little endian machine as the low bits of the 20bit
label wind up in the wrong half of third byte.  Therefore internally
everything is deal with in cpu native byte order except when writing
to and reading from a packet.

For management simplicity if a label is configured to forward out
an interface that is down the packet is dropped early.  Similarly
if an network interface is removed rt_dev is updated to NULL
(so no reference is preserved) and any packets for that label
are dropped.  Keeping the label entries in the kernel allows
the kernel label table to function as the definitive source
of which labels are allocated and which are not.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Refactor how the mpls module is built

This refactoring is needed to allow more than just mpls gso
support to be built into the mpls moddule.

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'neigh-mpls-prep'

Eric W. Biederman says:

====================
Neighbour table prep for MPLS

In preparation for using the IPv4 and IPv6 neighbour tables in my mpls
code this patchset factors out ___neigh_lookup_noref from
__ipv4_neigh_lookup_noref, __ipv6_lookup_noref and neigh_lookup.
Allowing the lookup logic to be shared between the different
implementations.  At what appears to be no cost. (Aka the same assembly
is generated for ip6_finish_output2 and ip_finish_output2).

After that I add a simple function that takes an address family and an
address consults the neighbour table and sends the packet to the
appropriate location.  The address family argument decoupls callers
of neigh_xmit from the addresses families the packets are sent over.
(Aka The ipv6 module can be loaded after mpls and a previously
configured ipv6 next hop will start working).

The refactoring in ___neigh_lookup_noref may be a bit overkill but it
feels like the right thing to do.  Especially since the same code is
generated.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: Add helper function neigh_xmit

For MPLS I am building the code so that either the neighbour mac
address can be specified or we can have a next hop in ipv4 or ipv6.

The kind of next hop we have is indicated by the neighbour table
pointer. A neighbour table pointer of NULL is a link layer address.
A non-NULL neighbour table pointer indicates which neighbour table and
thus which address family the next hop address is in that we need to
look up.

The code either sends a packet directly or looks up the appropriate
neighbour table entry and sends the packet.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: Factor out ___neigh_lookup_noref

While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.

So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function.  I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.

To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq.  So that the static size of the key would be
available.

I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix bridge netlink RCU usage

When the STP timer fires, it can call br_ifinfo_notify(),
which in turn ends up in the new br_get_link_af_size().
This function is annotated to be using RTNL locking, which
clearly isn't the case here, and thus lockdep warns:

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.19.0+ #569 Not tainted
  -------------------------------
  net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage!

Fix this by doing RCU locking here.

Fixes: b7853d73e39b ("bridge: add vlan info to bridge setlink and dellink notification messages")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-03-03

This series contains updates to fm10k, i40e and i40evf.

Matthew updates the fm10k driver by cleaning up code comments and whitespace
issues.  Also modifies the tunnel length header check, to make it more robust
by calculating the inner L4 header length based on whether it is TCP or UDP.
Implemented ndo_features_check() that allows drivers to report their offload
capabilities per-skb.

Neerav updates the i40e driver to skip over priority tagging if DCB is not
enabled.  Fixes an issue where the driver is not flushing out the
DCBNL app table for applications that are not present in the local DCBX
application configuration TLVs.  Fixed i40e where, in the case of MFP
mode, the driver was returning the incorrect number of traffic classes
for partitions that are not enabled for iSCSI.  Even though the driver
was not configuring these traffic classes in the transmit scheduler for
the NIC partitions, it does use this map to setup the queue mappings.

Shannon updates i40e/i40evf to include the firmware build number in the
formatted firmware version string.

Akeem adds a safety net (by adding a 'default' case) for the possible
unmatched switch calls.

Mitch updates i40e to not automatically disable PF loopback at runtime,
now that we have the functionality to enable and disable PF loopback.  This
fix cleans up a bogus error message when removing the PF module with VFs
enabled.  Adds a extra check to make sure that the indirection table
pointer is valid before dereferencing it.

Anjali enables i40e to enable more than the max RSS qps when running in a
single TC mode for the main VSI.  It is possible to enable as many as
num_online_cpus().  Adds a firmware check to ensure that DCB is disabled for
firmware versions older than 4.33.  Updates i40e/i40evf to add missing
packet types for VXLAN offload.  Updated i40e to be able to handle varying
RSS table size for each VSI, since all VSI's do not have the same RSS table
size.

v2: Dropped previous patch #9 "i40e/i40evf: Add capability to gather VEB
    per TC stats" since the stats should be in ethtool and not debugfs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
"Three miscellaneous bugfixes, most importantly the clp->cl_revoked
  bug, which we've seen several reports of people hitting"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  sunrpc: integer underflow in rsc_parse()
  nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd
  svcrpc: fix memory leak in gssp_accept_sec_context_upcall

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) If an IPVS tunnel is created with a mixed-family destination
    address, it cannot be removed.  Fix from Alexey Andriyanov.

2) Fix module refcount underflow in netfilter's nft_compat, from Pablo
    Neira Ayuso.

3) Generic statistics infrastructure can reference variables sitting on
    a released function stack, therefore use dynamic allocation always.
    Fix from Ignacy Gawędzki.

4) skb_copy_bits() return value test is inverted in ip_check_defrag().

5) Fix network namespace exit in openvswitch, we have to release all of
    the per-net vports.  From Pravin B Shelar.

6) Fix signedness bug in CAIF's cfpkt_iterate(), from Dan Carpenter.

7) Fix rhashtable grow/shrink behavior, only expand during inserts and
    shrink during deletes.  From Daniel Borkmann.

8) Netdevice names with semicolons should never be allowed, because
    they serve as a separator.  From Matthew Thode.

9) Use {,__}set_current_state() where appropriate, from Fabian
    Frederick.

10) Revert byte queue limits support in r8169 driver, it's causing
    regressions we can't figure out.

11) tcp_should_expand_sndbuf() erroneously uses tp->packets_out to
    measure packets in flight, properly use tcp_packets_in_flight()
    instead.  From Neal Cardwell.

12) Fix accidental removal of support for bluetooth in CSR based Intel
    wireless cards.  From Marcel Holtmann.

13) We accidently added a behavioral change between native and compat
    tasks, wrt testing the MSG_CMSG_COMPAT bit.  Just ignore it if the
    user happened to set it in a native binary as that was always the
    behavior we had.  From Catalin Marinas.

14) Check genlmsg_unicast() return valud in hwsim netlink tx frame
    handling, from Bob Copeland.

15) Fix stale ->radar_required setting in mac80211 that can prevent
    starting new scans, from Eliad Peller.

16) Fix memory leak in nl80211 monitor, from Johannes Berg.

17) Fix race in TX index handling in xen-netback, from David Vrabel.

18) Don't enable interrupts in amx-xgbe driver until all software et al.
    state is ready for the interrupt handler to run.  From Thomas
    Lendacky.

19) Add missing netlink_ns_capable() checks to rtnl_newlink(), from Eric
    W Biederman.

20) The amount of header space needed in macvtap was not calculated
    properly, fix it otherwise we splat past the beginning of the
    packet.  From Eric Dumazet.

21) Fix bcmgenet TCP TX perf regression, from Jaedon Shin.

22) Don't raw initialize or mod timers, use setup_timer() and
    mod_timer() instead.  From Vaishali Thakkar.

23) Fix software maintained statistics in bcmgenet and systemport
    drivers, from Florian Fainelli.

24) DMA descriptor updates in sh_eth need proper memory barriers, from
    Ben Hutchings.

25) Don't do UDP Fragmentation Offload on RAW sockets, from Michal
    Kubecek.

26) Openvswitch's non-masked set actions aren't constructed properly
    into netlink messages, fix from Joe Stringer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  openvswitch: Fix serialization of non-masked set actions.
  gianfar: Reduce logging noise seen due to phy polling if link is down
  ibmveth: Add function to enable live MAC address changes
  net: bridge: add compile-time assert for cb struct size
  udp: only allow UFO for packets from SOCK_DGRAM sockets
  sh_eth: Really fix padding of short frames on TX
  Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"
  sh_eth: Fix RX recovery on R-Car in case of RX ring underrun
  sh_eth: Ensure proper ordering of descriptor active bit write/read
  net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
  net/mlx4_core: Fix wrong mask and error flow for the update-qp command
  net: systemport: fix software maintained statistics
  net: bcmgenet: fix software maintained statistics
  rxrpc: don't multiply with HZ twice
  rxrpc: terminate retrans loop when sending of skb fails
  net/hsr: Fix NULL pointer dereference and refcnt bugs when deleting a HSR interface.
  net: pasemi: Use setup_timer and mod_timer
  net: stmmac: Use setup_timer and mod_timer
  net: 8390: axnet_cs: Use setup_timer and mod_timer
  net: 8390: pcnet_cs: Use setup_timer and mod_timer
  ...

l2tp: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireless: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bluetooth: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

appletalk: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8021q: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netconsole: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Miscellanea:

Add #include <linux/etherdevice.h>

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireless: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Miscellanea:

Add #include <linux/etherdevice.h> where appropriate
Use ETH_ALEN instead of 6

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: Use eth_<foo>_addr instead of memset

Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Fix dependencies in the i40e driver on configfs

Module dependencies are broken in the case where CONFIG_I40E=y and
CONFIG_CONFIGFS_FS=m. This fixes the broken dependency.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ax25: Stop using magic neighbour cache operations.

Before the ax25 stack calls dev_queue_xmit it always calls
ax25_type_trans which sets skb->protocol to ETH_P_AX25.

Which means that by looking at the protocol type it is possible to
detect IP packets that have not been munged by the ax25 stack in
ndo_start_xmit and call a function to munge them.

Rename ax25_neigh_xmit to ax25_ip_xmit and tweak the return type and
value to be appropriate for an ndo_start_xmit function.

Update all of the ax25 devices to test the protocol type for ETH_P_IP
and return ax25_ip_xmit as the first thing they do. This preserves
the existing semantics of IP packet processing, but the timing will be
a little different as the IP packets now pass through the qdisc layer
before reaching the ax25 ip packet processing.

Remove the now unnecessary ax25 neighbour table operations.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix serialization of non-masked set actions.

Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside
of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions
back to regular set actions, the inner attribute length was not changed,
ie, double the length being serialized. This patch fixes the bug.

Fixes: 83d2b9b ("net: openvswitch: Support masked set actions.")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Reduce logging noise seen due to phy polling if link is down

Commit 6ce29b0e2a04 ("gianfar: Avoid unnecessary reg accesses in adjust_link()")
eliminates unnecessary calls to adjust_link for phy devices which don't support
interrupts and need polling. As part of that work, the 'new_state' local flag,
which was used to reduce logging noise on the console, was eliminated.

Unfortunately, that means that a 'Link is Down' log message will now be
issued continuously if a link is configured as UP, the link state is down,
and the associated phy requires polling. This occurs because priv->oldduplex
is -1 in this case, which always differs from phydev->duplex. In addition,
phydev->speed may also differ from priv->oldspeed. gfar_update_link_state()
is therefore called each time a phy is polled, even if the link state did not
change.

Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmveth: Add function to enable live MAC address changes

Add a function that will enable changing the MAC address
of an ibmveth interface while it is still running.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/atm/signaling.c: remove WAIT_FOR_DEMON code

WAIT_FOR_DEMON code is directly undefined at the beginning
of signaling.c since initial git version and thus never compiled.
This also removes buggy current->state direct access.

Suggested-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: add compile-time assert for cb struct size

make build fail if structure no longer fits into ->cb storage.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 4.0-rc2

drm/i915: Fix modeset state confusion in the load detect code

This is a tricky story of the new atomic state handling and the legacy
code fighting over each another. The bug at hand is an underrun of the
framebuffer reference with subsequent hilarity caused by the load
detect code. Which is peculiar since the the exact same code works
fine as the implementation of the legacy setcrtc ioctl.

Let's look at the ingredients:

- Currently our code is a crazy mix of legacy modeset interfaces to
  set the parameters and half-baked atomic state tracking underneath.
  While this transition is going we're using the transitional plane
  helpers to update the atomic side (drm_plane_helper_disable/update
  and friends), i.e. plane->state->fb. Since the state structure owns
  the fb those functions take care of that themselves.

  The legacy state (specifically crtc->primary->fb) is still managed
  by the old code (and mostly by the drm core), with the fb reference
  counting done by callers (core drm for the ioctl or the i915 load
  detect code). The relevant commit is

  commit ea2c67bb4affa84080c616920f3899f123786e56
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Dec 23 10:41:52 2014 -0800

      drm/i915: Move to atomic plane helpers (v9)

- drm_plane_helper_disable has special code to handle multiple calls
  in a row - it checks plane->crtc == NULL and bails out. This is to
  match the proper atomic implementation which needs the crtc to get
  at the implied locking context atomic updates always need. See

  commit acf24a395c5a9290189b080383564437101d411c
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Tue Jul 29 15:33:05 2014 +0200

      drm/plane-helper: transitional atomic plane helpers

- The universal plane code split out the implicit primary plane from
  the CRTC into it's own full-blown drm_plane object. As part of that
  the setcrtc ioctl (which updated both the crtc mode and primary
  plane) learned to set crtc->primary->crtc on modeset to make sure
  the plane->crtc assignments statate up to date in

  commit e13161af80c185ecd8dc4641d0f5df58f9e3e0af
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Apr 1 15:22:38 2014 -0700

      drm: Add drm_crtc_init_with_planes() (v2)

  Unfortunately we've forgotten to update the load detect code. Which
  wasn't a problem since the load detect modeset is temporary and
  always undone before we drop the locks.

- Finally there is a organically grown history (i.e. don't ask) around
  who sets the legacy plane->fb for the various driver entry points.
  Originally updating that was the drivers duty, but for almost all
  places we've moved that (plus updating the refcounts) into the core.
  Again the exception is the load detect code.

Taking all together the following happens:
- The load detect code doesn't set crtc->primary->crtc. This is only
  really an issue on crtcs never before used or when userspace
  explicitly disabled the primary plane.

- The plane helper glue code short-circuits because of that and leaves
  a non-NULL fb behind in plane->state->fb and plane->fb. The state
  fb isn't a real problem (it's properly refcounted on its own), it's
  just the canary.

- Load detect code drops the reference for that fb, but doesn't set
  plane->fb = NULL. This is ok since it's still living in that old
  world where drivers had to clear the pointer but the core/callers
  handled the refcounting.

- On the next modeset the drm core notices plane->fb and takes care of
  refcounting it properly by doing another unref. This drops the
  refcount to zero, leaving state->plane now pointing at freed memory.

- intel_plane_duplicate_state still assume it owns a reference to that
  very state->fb and bad things start to happen.

Fix this all by applying the same duct-tape as for the legacy setcrtc
ioctl code and set crtc->primary->crtc properly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i40e/i40evf: Bump versions

Bump i40e to 1.2.10 and i40evf to 1.2.4

Change-ID: I48aa64df05fcc8356e7026f3a9e69ecf78d0c785
Signed-off-by: Sravanthi Tangeda <sravanthi.tangeda@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Only enable TC0 for NIC partition type

In case of MFP mode the driver was returning incorrect number of TCs
for partitions that are not enabled for iSCSI. Though the driver does
not configure these TCs in the Tx scheduler for the NIC partitions;
it does use this map to setup the queue mappings.

This patch fixes this and keeps all the NIC partitions to the default
PF TC i.e. TC0.

Change-ID: Iede214c907e7bac1356e999049b9f642759512b3
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Register DCBNL ops in MFP mode

Allow DCBNL operations in MFP mode to allow query of port DCB settings
via all the PFs and register iSCSI APP on iSCSI enabled PF.

Change-ID: I34cc39b4665d0a631847d4079350d3814f02381e
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: ethtool RSS fixes

Add an extra check to make sure that the indirection table pointer is
valid before dereferencing it.

Change-ID: I698adbf3daff03081d01f489dc95a9f1ad8b12f1
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix RSS size at init since default num queue calculation has changed

With changes to default number of queue pairs that the interface comes up with
from 1 per online CPU to 1 per lan_msix, we need to make sure we recalculate
rss_size. We will now recalculate rss_size based on number of queues enabled in
the VSI.

Without this fix if the max_lan_msix < num_online_cpu we will be coming up
with fewer queues but will be populating rss_size based on num_online_cpus.
This will result in packets getting silently dropped because RSS LUT has queues
that are not enabled.

Change-ID: Ifac8796ce1be1758bb0c34f38dbf4a3a76621e76
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Move RSS table size for VSIs to the VSI struct

Since all VSIs don't have the same RSS table size,
have one for each VSI instead of having a single define
for RSS table size

Change-ID: Ic2c7c66e4a389d4b6c8841a707510a9735041f02
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Add missing packet types for VXLAN encapsulated packet types

We were missing a few packet types for VXLAN offload. This patch fixes
that.

Change-ID: I4b23aa0b08e40ed49d0df6c49a5ed9f2009b44ce
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix issue with removal of apps from DBCNL app table

This patch fixes an issue where the driver is not flushing out the
DCBNL app table for applications that are not present in the local
DCBX application configuration TLVs.

Change-ID: I1f1ee04c81c145071b2ab15657546eb10b81fadb
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add FW check to disable DCB and wrap autoneg workaround with FW check

For FW < 4.33 DCB should be disabled.
Also Autoneg workaround to avoid Rx stall is still needed for FW < 4.33.

Change-ID: Iff36ad86be2f597e7701096014d6d094332a9a21
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Enable more than 64 qps for the Main VSI

When running in a single TC mode the HW can be configured to enable more
than max RSS qps for the Main VSI. This patch makes it possible to
enable as many as num_online_cpus().

ethtool -L can still be used to reconfigure number of qps
to a smaller value.

Change-ID: I3e2df085276982603d86dfd79477c0ada8d30b8f
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: don't disable PF LB when disabling VFs

Since we now have functionality to enable and disable PF loopback at
runtime, don't try to do it automatically. Removing this call also gets
rid of a bogus error message when removing the PF module with VFs
enabled.

Change-ID: Ic38652d8a3b9498d96113bfaa5ea7bad050862e9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add safety net for switch calling

This patch adds default case to handle unmatched switch calls.

Change-ID: Icd203570a1dc5322c1038f68b98a83195e8ad28c
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: print FW build number in version string

Include the FW build number in the formatted FW version string. In order
to fit within ethtool's 32 character limit, the etrack's unused high order
bits are trimmed as is the leading 0 for the NVM version. This leaves
us with 2 character left for if/when the etrack id goes to 5 hex chars
and the NVM major number goes to 2 chars.

Change-ID: Icb004c4b9b14a2f54dd200b467fcc1d7b9297308
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Skip the priority tagging if DCB is not enabled

If DCB is not enabled priority tagging is not needed
so skip over that section.

Change-ID: Ia3f3fa07945b421259a9ca38329d6d1cbd6c6bcc
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: Resolve various spelling errors and checkpatch warnings

Fix a few silly typos in the code and checkpatch warnings in support of
general code cleanliness.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: Implement ndo_features_check

The introduction of ndo_features_check allows drivers to report their
offload capabilities per-skb. Implement this in fm10k to take advantage
of this new functionality.

Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: Modify tunnel length header check when offloading

The FM10000 host interface can only support up to 184 bytes when
performing tunnel offloads. Because of this, a check was added to
prevent the driver from attempting to feed a header to the hardware too
big for it to parse. Make this check a little more robust by calculating
the inner L4 header length based on whether it is TCP or UDP.

Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

udp: only allow UFO for packets from SOCK_DGRAM sockets

If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).

Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sh_eth'

Ben Hutchings says:

====================
Fixes for sh_eth #4 v2

I'm continuing review and testing of Ethernet support on the R-Car H2
chip, with help from a colleague. This series fixes a few more issues.

These are not tested on any of the other supported chips.

v2: Add note that the revert is not a pure revert.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Really fix padding of short frames on TX

My previous fix to clear padding of short frames used skb->len as the
DMA length, assuming that skb_padto() extended skb->len to include the
padding. That isn't the case; we need to use skb_put_padto() instead.

(This wasn't immediately obvious because software padding isn't
actually needed on the R-Car H2. We could make it conditional on
which chip is being driven, but it's probably not worth the effort.)

Reported-by: "Violeta Menéndez González" <violeta.menendez@codethink.co.uk>
Fixes: 612a17a54b50 ("sh_eth: Fix padding of short frames on TX")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "sh_eth: Enable Rx descriptor word 0 shift for r8a7790"

This reverts commit fd9af07c3404ac9ecbd0d859563360f51ce1ffde.

The hardware manual states that the frame error and multicast bits are
copied to bits 9:0 of RD0, not bits 25:16. I've tested that this is
true for RFS1 (CRC error), RFS3 (frame too short), RFS4 (frame too
long) and RFS8 (multicast).

Also adjust a comment to agree with this.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Fix RX recovery on R-Car in case of RX ring underrun

In case of RX ring underrun (RDE), we attempt to reset the software
descriptor pointers (dirty_rx and cur_rx) to match where the hardware
will read the next descriptor from, as that might not be the first
dirty descriptor.  This relies on reading RDFAR, but that register
doesn't exist on all supported chips - specifically, not on the R-Car
chips.  This will result in unpredictable behaviour on those chips
after an RDE.

Make this pointer reset conditional and assume that it isn't needed on
the R-Car chips.  This fix also assumes that RDFAR is never exposed at
offset 0 in the memory map - this is currently true, and a subsequent
commit will fix the ambiguity between offset 0 and no-offset in the
register offset maps.

Fixes: 79fba9f51755 ("net: sh_eth: fix the rxdesc pointer when rx ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Ensure proper ordering of descriptor active bit write/read

When submitting a DMA descriptor, the active bit must be written last.
When reading a completed DMA descriptor, the active bit must be read
first.

Add memory barriers to ensure that this ordering is maintained.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'gpio-v4.0-2' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Two GPIO fixes:

   - Fix a translation problem in of_get_named_gpiod_flags()

   - Fix a long standing container_of() mistake in the TPS65912 driver"

* tag 'gpio-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: tps65912: fix wrong container_of arguments
  gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one chip per node

Merge branch 'fixes-for-4.0-rc2' of git://git./linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal management fixes from Eduardo Valentin:
"Specifics:

   - Several fixes in tmon tool.

   - Fixes in intel int340x for _ART and _TRT tables.

   - Add id for Avoton SoC into powerclamp driver.

   - Fixes in RCAR thermal driver to remove race conditions and fix fail
     path

   - Fixes in TI thermal driver: removal of unnecessary code and build
     fix if !CONFIG_PM_SLEEP

   - Cleanups in exynos thermal driver

   - Add stubs for include/linux/thermal.h.  Now drivers using thermal
     calls but that also work without CONFIG_THERMAL will be able to
     compile for systems that don't care about thermal.

  Note: I am sending this pull on Rui's behalf while he fixes issues in
  his Linux box"

* 'fixes-for-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: int340x_thermal: Ignore missing _ART, _TRT tables
  thermal/intel_powerclamp: add id for Avoton SoC
  tools/thermal: tmon: silence 'set but not used' warnings
  tools/thermal: tmon: use pkg-config to determine library dependencies
  tools/thermal: tmon: support cross-compiling
  tools/thermal: tmon: add .gitignore
  tools/thermal: tmon: fixup tui windowing calculations
  tools/thermal: tmon: tui: don't hard-code dialog window size assumptions
  tools/thermal: tmon: add min/max macros
  tools/thermal: tmon: add --target-temp parameter
  thermal: exynos: Clean-up code to use oneline entry for exynos compatible table
  thermal: rcar: Make error and remove paths symmetrical with init
  thermal: rcar: Fix race condition between init and interrupt
  thermal: Introduce dummy functions when thermal is not defined
  ti-soc-thermal: Delete an unnecessary check before the function call "cpufreq_cooling_unregister"
  thermal: ti-soc-thermal: bandgap: Fix build warning if !CONFIG_PM_SLEEP

Merge tag 'md/4.0-fixes' of git://neil.brown.name/md

Pull md fixes from Neil Brown:
"Three md fixes:

   - fix a read-balance problem that was reported 2 years ago, but that
     I never noticed the report :-(

   - fix for rare RAID6 problem causing incorrect bitmap updates when
     two devices fail.

   - add __ATTR_PREALLOC annotation now that it is possible"

* tag 'md/4.0-fixes' of git://neil.brown.name/md:
  md: mark some attributes as pre-alloc
  raid5: check faulty flag for array status during recovery.
  md/raid1: fix read balance when a drive is write-mostly.

Merge tag 'metag-fixes-v4.0-1' of git://git./linux/kernel/git/jhogan/metag

Pull arch/metag fix from James Hogan:
"This is just a single patch to fix the KSTK_EIP() and KSTK_ESP()
  macros for metag which have always been erronously returning the PC
  and stack pointer of the task's kernel context rather than from its
  user context saved at entry from userland into the kernel, which
  affects the contents of /proc/<pid>/maps and /proc/<pid>/stat"

* tag 'metag-fixes-v4.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag: Fix KSTK_EIP() and KSTK_ESP() macros

Merge branch 'neigh_cleanups'

Eric W. Biederman says:

====================
Neighbour table and ax25 cleanups

While looking at the neighbour table to what it would take to allow
using next hops in a different address family than the current packets
I found a partial resolution for my issues and I stumbled upon some
work that makes the neighbour table code easier to understand and
maintain.

Long ago in a much younger kernel ax25 found a hack to use
dev_rebuild_header to transmit it's packets instead of going through
what today is ndo_start_xmit.

When the neighbour table was rewritten into it's current form the ax25
code was such a challenge that arp_broken_ops appeard in arp.c and
neigh_compat_output appeared in neighbour.c to keep the ax25 hack alive.

With a little bit of work I was able to remove some of the hack that
is the ax25 transmit path for ip packets and to isolate what remains
into a slightly more readable piece of code in ax25_ip.c.  Removing the
need for the generic code to worry about ax25 special cases.

After cleaning up the old ax25 hacks I also performed a little bit of
work on neigh_resolve_output to remove the need for a dst entry and to
ensure cached headers get a deterministic protocol value in their cached
header.   This guarantees that a cached header will not be different
depending on which protocol of packet is transmitted, and it allows
packets to be transmitted that don't have a dst entry.  There remains
a small amount of code that takes advantage of when packets have a dst
entry but that is something different.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: Don't require a dst in neigh_resolve_output

Having a dst helps a little bit for teql but is fundamentally
unnecessary and there are code paths where a dst is not available that
it would be nice to use the neighbour cache.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>