git.openwrt.org Git - openwrt/staging/blogic.git/log

bgmac: allow enabling on ARCH_BCM_5301X

Home routers based on ARM SoCs like BCM4708 also have bcma bus with core
supported by bgmac.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets

On ARM SoCs with bgmac Ethernet hardware we don't have any normal PHY.
There is always a switch attached but it's not even controlled over MDIO
like in case of MIPS devices.
We need a fixed PHY to be able to send/receive packets from the switch.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-03-20

This series contains updates to ixgb, e1000e, igb and igbvf.

Eliezer and Todd provide patches to fix a potential issue found during code
inspection. When bringing down an interface netif_carrier_off() should
be one of the first things we do, since this will prevent the stack from
queueing more packets to this interface.

Yanir provides a fix for e1000e that was found in validating i219,
where the call to e1000e_write_protect_nvm_ich8lan() is no longer
supported in newer hardware. Access to these registers causes a
system freeze in early steppings and is ignored in later steppings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make NET_DSA manually selectable from the config

Change bd76a116707bd2381da36cf7c3183df11293f1d6 made all DSA drivers
depend on NET_DSA rather than selecting them. However, as the only way
to select this option was to actually select a driver, it made DSA
impossible to enable at all.

This patch adds an explicit entry which the user will have to enable
prior selecting a driver.

Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: kernel-doc cleanup on swithdev ops

Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "selinux: add a skb_owned_by() hook"

This reverts commit ca10b9e9a8ca7342ee07065289cbe74ac128c169.

No longer needed after commit eb8895debe1baba41fcb62c78a16f0c63c21662a
("tcp: tcp_make_synack() should use sock_wmalloc")

When under SYNFLOOD, we build lot of SYNACK and hit false sharing
because of multiple modifications done on sk_listener->sk_wmem_alloc

Since tcp_make_synack() uses sock_wmalloc(), there is no need
to call skb_set_owner_w() again, as this adds two atomic operations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igbvf: use netif_carrier_off earlier when bringing if down

Use netif_carrier_off() first, since that will prevent the stack from
queuing more packets to this IF. This operation is fast, and should
behave much nicer when trying to bring down an interface under load.

Reported-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: use netif_carrier_off earlier when bringing if down

Use netif_carrier_off() first, since that will prevent the stack from
queuing more packets to this IF. This operation is fast, and should
behave much nicer when trying to bring down an interface under load.

Reported-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: NVM write protect access removed from SPT HW

The call to e1000e_write_protect_nvm_ich8lan() is no longer supported by HW.
Access to these registers causes a system freeze in A step hardware and is
ignored in B step hardware. This function must not be called in hardware
newer than LPT.

Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: call netif_carrier_off early on down

When bringing down an interface netif_carrier_off() should be
one the first things we do, since this will prevent the stack
from queuing more packets to this interface.
This operation is very fast, and should make the device behave
much nicer when trying to bring down an interface under load.

Also, this would Do The Right Thing (TM) if this device has some
sort of fail-over teaming and redirect traffic to the other IF.

Move netif_carrier_off as early as possible.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgb: call netif_carrier_off early on down

When bringing down an interface netif_carrier_off() should be
one the first things we do, since this will prevent the stack
from queuing more packets to this interface.
This operation is very fast, and should make the device behave
much nicer when trying to bring down an interface under load.

Also, this would Do The Right Thing (TM) if this device has some
sort of fail-over teaming and redirect traffic to the other IF.

Move netif_carrier_off as early as possible.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'ebpf-next'

Daniel Borkmann says:

====================
This set adds native eBPF support also to act_bpf and thus covers tc
with eBPF in the classifier *and* action part.

A link to iproute2 preview has been provided in patch 2 and the code
will be pushed out after Stephen has processed the classifier part
and helper bits for tc.

This set depends on ced585c83b27 ("act_bpf: allow non-default TC_ACT
opcodes as BPF exec outcome"), so a net into net-next merge would be
required first. Hope that's fine by you, Dave. ;)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

act_bpf: add initial eBPF support for actions

This work extends the "classic" BPF programmable tc action by extending
its scope also to native eBPF code!

Together with commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support
for programmable classifiers") this adds the facility to implement fully
flexible classifier and actions for tc that can be implemented in a C
subset in user space, "safely" loaded into the kernel, and being run in
native speed when JITed.

Also, since eBPF maps can be shared between eBPF programs, it offers the
possibility that cls_bpf and act_bpf can share data 1) between themselves
and 2) between user space applications. That means that, f.e. customized
runtime statistics can be collected in user space, but also more importantly
classifier and action behaviour could be altered based on map input from
the user space application.

For the remaining details on the workflow and integration, see the cls_bpf
commit e2e9b6541dd4. Preliminary iproute2 part can be found under [1].

[1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf: add sched_act_type and map it to sk_filter's verifier ops

In order to prepare eBPF support for tc action, we need to add
sched_act_type, so that the eBPF verifier is aware of what helper
function act_bpf may use, that it can load skb data and read out
currently available skb fields.

This is bascially analogous to 96be4325f443 ("ebpf: add sched_cls_type
and map it to sk_filter's verifier ops").

BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be
separate since both will have a different set of functionality in
future (classifier vs action), thus we won't run into ABI troubles
when the point in time comes to diverge functionality from the
classifier.

The future plan for act_bpf would be that it will be able to write
into skb->data and alter selected fields mirrored in struct __sk_buff.

For an initial support, it's sufficient to map it to sk_filter_ops.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
net/core/sysctl_net_core.c
net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Fix undeclared EEXIST build error on ia64

We need to include linux/errno.h in rhashtable.h since it doesn't
always get included otherwise.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom

Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2015-03-19

The following series of patches includes functional updates and changes
to the driver.

- Use the phydev->advertising field instead of the phydev->supported
  field when configuring for auto-negotiation, etc.
- Use the phy_driver flags field for setting the transceiver type
  instead of hardcoding it in the ethtool support.
- Provide an auto-negotiation timeout check
- Clarify the Tx/Rx queue information messages
- Use the new DMA memory barrier operations
- Set the device DMA mask based on what the hardware reports
- Remove the software implementation of Tx coalescing
- Fix the reporting of the Rx coalescing value
- Use napi_alloc_skb when allocating an SKB in softirq

This patch series is based on net-next.

Changes from v2:
- Use jiffies instead of timespec for the auto-negotiation timeout check
- Remove the Rx path SKB allocation re-work patch since we should only
  inline the headers and the current code guards better against any
  hardware bugs

Changes from v1:
- Default to 32-bit DMA width (minimum supported) if hardware returns
  an unexpected DMA width value
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Use napi_alloc_skb when allocating skb in softirq

Use the napi_alloc_skb function to allocate an skb when running within
the softirq context to avoid calls to local_irq_save/restore.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix Rx coalescing reporting

The Rx coalescing value is internally converted from usecs to a value
that the hardware can use. When reporting the Rx coalescing value, this
internal value is converted back to usecs. During the conversion from
and back to usecs some rounding occurs. So, for example, when setting an
Rx usec of 30, it will be reported as 29. Fix this reporting issue by
keeping the original usec value and using that during reporting.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Remove Tx coalescing

The Tx coalescing support in the driver was a software implementation
for something lacking in the hardware. Using hrtimers, the idea was to
trigger a timer interrupt after having queued a packet for transmit.
Unfortunately, as the timer value was lowered, the timer expired before
the hardware actually did the transmit and so it was racey and resulted
in unnecessary interrupts.

Remove the Tx coalescing support and hrtimer and replace with a Tx timer
that is used as a reclaim timer in case of inactivity.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Set DMA mask based on hardware register value

The hardware supplies a value that indicates the DMA range that it
is capable of using. Use this value rather than hard-coding it in
the driver.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Use the new DMA memory barriers where appropriate

Use the new lighter weight memory barriers when working with the device
descriptors.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Clarify output message about queues

Clarify that the queues referred to in a message when the device is
brought up are hardware queues and not necessarily related to the
Linux network queues.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Provide support for auto-negotiation timeout

Currently, there is no interrupt code that indicates auto-negotiation
has timed out. If the auto-negotiation has timed out then the start of
a new auto-negotiation will begin again with a new base page being
received. The state machine could be in a state that is not expecting
this interrupt code which results in an error during auto-negotiation.

Update the code to timestamp when the auto-negotiation starts. Should
another page received interrupt code occur before auto-negotiation has
completed but after the auto-negotiation timeout, then reset the state
machine to allow the auto-negotiation to continue.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Use the phy_driver flags field

Remove the setting of the transceiver type when retrieving the device
settings using ethtool and instead set the transceiver type in the
phy_driver structure flags field. Change the transceiver type to be
internal, also.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: Use phydev advertising field vs supported

With ethtool being able to control what is advertised, the advertising
field is what should be used for priming the auto-negotiation registers
and for various other checks, instead of the supported field.

Also, move the initial setting of the supported and advertising fields
into the probe function so that they are not reset each time the device
is brought up, thus allowing the user to set as desired before bringing
the device up.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour

Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b96584 (net: socket: error on a negative msg_namelen).

In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7dabd48 (fold verify_iovec() into
copy_msghdr_from_user()).

This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().

Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rhashtable-inlined-interface'

Herbert Xu says:

====================
rhashtable: Introduce inlined interface

This series of patches introduces the inlined rhashtable interface.

The idea is to make all the function pointers visible to the compiler
by providing the rhashtable_params structure explicitly to each
inline rhashtable function.  For example, instead of doing

obj = rhashtable_lookup(ht, key);

you would now do

obj = rhashtable_lookup_fast(ht, key, params);

Where params is the same data that you would give to rhashtable_init.
In particular, within rhashtable.c itself we would simply supply
ht->p.

So to convert users over, you simply have to make params globally
accessible, e.g., by placing it in a static const variable, which
can then be used at each inlined call site, as well as by the
rhashtable_init call.

The only ticky bit is that some users (i.e., netfilter) has a
dynamic key length.  This is dealt with by using params.key_len
in the inline functions when it is non-zero, and otherwise falling
back on ht->p.key_len.

Note that I've only tested this on one compiler, gcc 4.7.2.  So
please test this with your compilers as well and make sure that
the code is actually inlined without indirect function calls.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Rip out obsolete out-of-line interface

Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Use inlined rhashtable interface

This patch converts tipc to the inlined rhashtable interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

test_rhashtable: Use inlined rhashtable interface

This patch converts test_rhashtable to the inlined rhashtable
interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: Convert nft_hash to inlined rhashtable

This patch converts nft_hash to the inlined rhashtable interface.

This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).

Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects. The current side-effect code can
simply be moved into the nft_hash_get.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Move namespace into hash key

Currently the name space is a de facto key because it has to match
before we find an object in the hash table. However, it isn't in
the hash value so all objects from different name spaces with the
same port ID hash to the same bucket.

This is bad as the number of name spaces is unbounded.

This patch fixes this by using the namespace when doing the hash.

Because the namespace field doesn't lie next to the portid field
in the netlink socket, this patch switches over to the rhashtable
interface without a fixed key.

This patch also uses the new inlined rhashtable interface where
possible.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Allow hash/comparison functions to be inlined

This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable.  We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.

The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.

This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.

This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not.  It
also adds a new function obj_cmpfn to compare a key against an
object.  This means that the caller no longer needs to supply
explicit compare functions.

All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Make rhashtable_init params argument const

This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.

This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf, filter: do not convert skb->protocol to host endianess during runtime

Commit c24973957975 ("bpf: allow BPF programs access 'protocol' and 'vlan_tci'
fields") has added support for accessing protocol, vlan_present and vlan_tci
into the skb offset map.

As referenced in the below discussion, accessing skb->protocol from an eBPF
program should be converted without handling endianess.

The reason for this is that an eBPF program could simply do a check more
naturally, by f.e. testing skb->protocol == htons(ETH_P_IP), where the LLVM
compiler resolves htons() against a constant automatically during compilation
time, as opposed to an otherwise needed run time conversion.

After all, the way of programming both from a user perspective differs quite
a lot, i.e. bpf_asm ["ld proto"] versus a C subset/LLVM.

Reference: https://patchwork.ozlabs.org/patch/450819/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: invert join/leave anycast rtnl/socket locking order

Commit baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.

As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.

Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: fix possible use of uninitialized in vxlan_igmp_{join, leave}

Test robot noticed that we check the return of vxlan_igmp_join and leave
but inside them there was a path that it could be used initialized.

It's not really possible because those if() inside these igmp functions
would always match as we can't have sockets of other type in there, but
this way we keep the compiler happy.

Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net'

Sathya Perla says:

====================
be2net: patch set

Hi David, this patch set includes 3 bug fixes to the be2net driver.

Patch 1 fixes a vlan isolation issue with VFs. When a VF is placed in
promiscous mode, it could receive packets belonging to any vlan, as
the PF driver grants vlan promisc capability to VFs. The PF
driver now disables the vlan promisc capability for VFs to fix this
problem.

Patch 2 fixes the call to MODIFY_EQ_DELAY FW cmd to not include more
than 8 EQs per cmd. The FW is not capable of handling more than 8 EQs
per cmd.

Patch 3 fixes an EEH error detection issue. On Power platforms,
when an EEH error occurs, the slot disconnect state is more reliably
detected via an MMIO read compared to a config read. So, the error
register reads that occur every second are now done via MMIO.

Pls apply this patch set to the "net" tree. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: use PCI MMIO read instead of config read for errors

When an EEH error occurs, the device/slot is disconnected. This condition
is more reliably detected (i.e., returns all ones) with an MMIO read rather
than a config read -- especially on power platforms.

Hence, this patch fixes EEH error detection by replacing config reads with
MMIO reads for reading the error registers. The error registers in
Skyhawk-R/BE2/BE3 are accessible both via the config space and the
PCICFG (BAR0) memory space.

Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs

Issuing this cmd for more than 8 EQs does not have the intended effect
even on BEx and Skyhawk-R.

This patch fixes this by issuing this cmd for upto 8 EQs at a time.
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Prevent VFs from enabling VLAN promiscuous mode

Currently, a PF does not restrict its VF interface from enabling vlan
promiscuous mode. This breaks vlan isolation when a vlan
(transparent tagging) is configured on a VF.

This patch fixes this problem by disabling the vlan promisc capability
for VFs.

Reported-by: Yoann Juet <veilletechno-irts@univ-nantes.fr>
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix tcp fin memory accounting

tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:

ss example output (ss -amn | grep -B1 f4294):
tcp FIN-WAIT-1 0 1 192.168.0.1:45520 192.0.2.1:8080
skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0)
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix backtracking for throw routes

for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4

A simple testcase for verification is:

ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1

Signed-off-by: Steven Barth <cyrus@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}

On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.

Cc: <netdev@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Don Fry <pcnet32@frontier.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment

Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.

ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb. This approach was suggested by Vladislav
Yasevich.

Fixes: 0508c07f5e0c ("ipv6: Select fragment id during UFO segmentation if not set.")
Cc: Vladislav Yasevich <vyasevic@redhat.com>
Reported-by: Matt Grant <matt@mattgrant.net.nz>
Tested-by: Matt Grant <matt@mattgrant.net.nz>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: making the bandwidth limiter runtime settable

With the current netback, the bandwidth limiter's parameters are only
settable during vif setup time. This patch register a watch on them, and
thus makes them runtime changeable.

When the watch fires, the timer is reset. The timer's mutex is used for
fencing the change.

Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'listener_refactor_part_14'

Eric Dumazet says:

====================
inet: tcp listener refactoring part 14

OK, we have serious patches here.

We get rid of the central timer handling SYNACK rtx,
which is killing us under even medium SYN flood.

We still use the listener specific hash table.

This will be done in next round ;)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: increase sk_[max_]ack_backlog

sk_ack_backlog & sk_max_ack_backlog were 16bit fields, meaning
listen() backlog was limited to 65535.

It is time to increase the width to allow much bigger backlog,
if admins change /proc/sys/net/core/somaxconn &
/proc/sys/net/ipv4/tcp_max_syn_backlog default values.

Tested:

echo 5000000 >/proc/sys/net/core/somaxconn
echo 5000000 >/proc/sys/net/ipv4/tcp_max_syn_backlog

Ran a SYNFLOOD test against a listener using listen(fd, 5000000)

myhost~# grep request_sock_TCP /proc/slabinfo
request_sock_TCP 4185642 4411940 304 13 1 : tunables 54 27 8 : slabdata 339380 339380 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: get rid of central tcp/dccp listener timer

One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.

This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.

SYNACK are sent in huge bursts, likely to cause severe drops anyway.

This model was OK 15 years ago when memory was very tight.

We now can afford to have a timer per request sock.

Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.

With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.

This is ~100 times more what we could achieve before this patch.

Later, we will get rid of the listener hash and use ehash instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: drop prev pointer handling in request sock

When request sock are put in ehash table, the whole notion
of having a previous request to update dl_next is pointless.

Also, following patch will get rid of big purge timer,
so we want to delete a request sock without holding listener lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Round up/down min/max_size to ensure we respect limit

Round up min_size respectively round down max_size to the next power
of two to make sure we always respect the limit specified by the
user. This is required because we compare the table size against the
limit before we expand or shrink.

Also fixes a minor bug where we modified min_size in the params
provided instead of the copy stored in struct rhashtable.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:
"An update to Synaptics driver that makes it usable with the 2015
  lineup from Lenovo"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Revert "Input: synaptics - use dmax in input_mt_assign_slots"
  Input: synaptics - remove X250 from the topbuttonpad list
  Input: synaptics - remove X1 Carbon 3rd gen from the topbuttonpad list
  Input: synaptics - re-route tracksticks buttons on the Lenovo 2015 series
  Input: synaptics - remove TOPBUTTONPAD property for Lenovos 2015
  Input: synaptics - retrieve the extended capabilities in query $10
  Input: synaptics - do not retrieve the board id on old firmwares
  Input: synaptics - handle spurious release of trackstick buttons
  Input: synaptics - fix middle button on Lenovo 2015 products
  Input: synaptics - skip quirks when post-2013 dimensions
  Input: synaptics - support min/max board id in min_max_pnpid_table
  Input: synaptics - remove obsolete min/max quirk for X240
  Input: synaptics - query min dimensions for fw v8.1
  Input: synaptics - log queried and quirked dimension values
  Input: synaptics - split synaptics_resolution(), query first

Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
"This fixes bugs in zero-copy splice to the fuse device"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: explicitly set /dev/fuse file's private_data
  fuse: set stolen page uptodate
  fuse: notify: don't move pages

Merge branch 'overlayfs-next' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
"This fixes minor issues with the multi-layer update in v4.0"

* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: upper fs should not be R/O
  ovl: check lowerdir amount for non-upper mount
  ovl: print error message for invalid mount options

Merge tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fix from Ulf Hansson:
"MMC core: fix error path in mmc_pwrseq_simple_alloc()"

* tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: pwrseq_simple: fix error path in mmc_pwrseq_simple_alloc

Merge tag 'pinctrl-v4.0-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Here is a slew of pin control fixes I've accumulated for the v4.0
  kernel.  Nothing special, just driver fixes (mainly embedded Intel it
  seems) and a misunderstanding regarding the stub functions was
  reverted:

   - Fix up consumer return values on pin control stubs.
   - Four patches fixing up the interrupt handling and sleep context
     save in the Baytrail driver.
   - Make default output directions work properly in the Cherryview
     driver.
   - Fix interrupt locking in the AT91 driver.
   - Fix setting interrupt generating lines as input in the sunxi
     driver"

* tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
  pinctrl: at91: move lock/unlock_as_irq calls into request/release
  pinctrl: update direction_output function of cherryview driver
  pinctrl: baytrail: Save pin context over system sleep
  pinctrl: baytrail: Rework interrupt handling
  pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode
  pinctrl: baytrail: Relax GPIO request rules
  Revert "pinctrl: consumer: use correct retval for placeholder functions"

Merge tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next

Pull two arch/nios2 fixes from Ley Foon Tan:
- Remove ucontext.h from exported arch headers
- nios2: mm: do not invoke OOM killer on kernel fault OOM

* tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next:
nios2: mm: do not invoke OOM killer on kernel fault OOM
nios2: Remove ucontext.h from exported arch headers

i40e: add NVM update events to AQ clean

Quit complaining about a couple of events that we actually expect to see
during an NVM update.

Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/ide

Pull IDE fix from David Miller:
"Just one fix to convert a by-hand conversion of jiffies to msecs, from
Nicholas McGuire"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
ide_tape: convert jiffies with jiffies_to_msecs

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

1) Some command cases of semtimedop() not even handled due to miscoded
    comparison on sparc64.  From Rob Gardner.

2) Due to two bugs, /proc/kcore wan't working properly on sparc.

3) Make sure fatal traps stop all running cpus, from Dave Kleikamp.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix /proc/kcore
  sparc: semtimedop() unreachable due to comparison error
  sparc: io_64.h: Replace io function-link macros
  sparc64: fatal trap should stop all cpus
  arch: sparc: kernel: starfire.c: Remove unused function
  arch: sparc: kernel: traps_64.c: Remove some unused functions

tipc: fix build issue when building without IPv6

We can't directly call ipv6_sock_mc_join() but should use the stub
instead and protect it around IS_ENABLED.

Fixes: d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
Add Device ID and make device ID table const

This patch series adds new device ID and makes device ID table const

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 and csiostor driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/cxgb4vf/csiostor: Make PCI Device ID Tables be "const"

Make PCI Device ID Tables be "const" to move them out of the data segment and
remove a redundant check on CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN in
t4_pci_id_tbl.h to guard the contents of the include file.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add device ID for new adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-19

This wont the last 4.1 bluetooth-next pull request, but we've piled up
enough patches in less than a week that I wanted to save you from a
single huge "last-minute" pull somewhere closer to the merge window.

The main changes are:

- Simultaneous LE & BR/EDR discovery support for HW that can do it
- Complete LE OOB pairing support
- More fine-grained mgmt-command access control (normal user can now do
harmless read-only operations).
- Added RF power amplifier support in cc2520 ieee802154 driver
- Some cleanups/fixes in ieee802154 code

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix packet header offset calculation in _decode_session6(), from
    Hajime Tazaki.

2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang.

3) Be sure to clear state properly when scans fail in iwlwifi mvm code,
    from Luciano Coelho.

4) iwlwifi tries to stop scans that aren't actually running, also from
    Luciano Coelho.

5) mac80211 should drop mesh frames that are not encrypted, fix from
    Bob Copeland.

6) Add new device ID to b43 wireless driver for BCM432228 chips, from
    Rafał Miłecki.

7) Fix accidental addition of members after variable sized array in
    struct tc_u_hnode, from WANG Cong.

8) Don't re-enable interrupts until after we call napi_complete() in
    ibmveth and WIZnet drivers, frm Yongbae Park.

9) Fix regression in vlan tag handling of fec driver, from Fugang Duan.

10) If a network namespace change fails during rtnl_newlink(), we don't
    unwind the device registry properly.

11) Fix two TCP regressions, from Neal Cardwell:
  - Don't allow snd_cwnd_cnt to accumulate huge values due to missing
    test in tcp_cong_avoid_ai().
  - Restore CUBIC back to advancing cwnd by 1.5x packets per RTT.

12) Fix performance regression in xne-netback involving push TX
    notifications, from David Vrabel.

13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not
    dereference blindly.  From Willem de Bruijn.

14) Fix potential stack overflow in RDS protocol stack, from Arnd
    Bergmann.

15) VXLAN_VID_MASK used incorrectly in new remote checksum offload
    support of VXLAN driver.  Fix from Alexey Kodanev.

16) Fix too small netlink SKB allocation in inet_diag layer, from Eric
    Dumazet.

17) ieee80211_check_combinations() does not count interfaces correctly,
    from Andrei Otcheretianski.

18) Hardware feature determination in bxn2x driver references a piece of
    software state that actually isn't initialized yet, fix from Michal
    Schmidt.

19) inet_csk_wait_for_connect() needs a sched_annotate_sleep()
    annoation, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  Revert "net: cx82310_eth: use common match macro"
  net/mlx4_en: Set statistics bitmap at port init
  IB/mlx4: Saturate RoCE port PMA counters in case of overflow
  net/mlx4_en: Fix off-by-one in ethtool statistics display
  IB/mlx4: Verify net device validity on port change event
  act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome
  Revert "smc91x: retrieve IRQ and trigger flags in a modern way"
  inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()
  ip6_tunnel: fix error code when tunnel exists
  netdevice.h: fix ndo_bridge_* comments
  bnx2x: fix encapsulation features on 57710/57711
  mac80211: ignore CSA to same channel
  nl80211: ignore HT/VHT capabilities without QoS/WMM
  mac80211: ask for ECSA IE to be considered for beacon parse CRC
  mac80211: count interfaces correctly for combination checks
  isdn: icn: use strlcpy() when parsing setup options
  rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
  caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
  bridge: reset bridge mtu after deleting an interface
  can: kvaser_usb: Fix tx queue start/stop race conditions
  ...

Merge branch 'tipc-next'

Erik Hugne says:

====================
tipc: small bugfix an support for datagram connect()

Most notable in this series is patch#3 that allows programs to associate
a tipc address with a connectionless (RDM/DGRAM) socket.

v2: Fix indent issue in patch#3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add support for connect() on dgram/rdm sockets

Following the example of ip4_datagram_connect, we store the
address in the socket structure for dgram/rdm sockets and use
that as the default destination for subsequent send() calls.
It is allowed to connect to any address types, and the behaviour
of send() will be the same as a normal sendto() with this address
provided. Binding to an AF_UNSPEC address clears the association.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: do not report -EHOSTUNREACH for failed local delivery

Since commit 1186adf7df04 ("tipc: simplify message forwarding and
rejection in socket layer") -EHOSTUNREACH is propagated back to
the sending process if we fail to deliver the message to another
socket local to the node.
This is wrong, host unreachable should only be reported when the
destination port/name does not exist in the cluster, and that
check is always done before sending the message. Also, this
introduces inconsistent sendmsg() behavior for local/remote
destinations. Errors occurring on the receiving side should not
trickle up to the sender. If message delivery fails TIPC should
either discard the packet or reject it back to the sender based
on the destination droppable option.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: remove redundant call to tipc_node_remove_conn

tipc_node_remove_conn may be called twice if shutdown() is
called on a socket that have messages in the receive queue.
Calling this function twice does no harm, but is unnecessary
and we remove the redundant call.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: fix typo in header guard

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Fixes: b6eea9ca354a ("mac802154: introduce driver-ops header")
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

fuse: explicitly set /dev/fuse file's private_data

The misc subsystem (which is used for /dev/fuse) initializes private_data to
point to the misc device when a driver has registered a custom open file
operation, and initializes it to NULL when a custom open file operation has
*not* been provided.

This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc device
structure. And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.

So to simplify things in the misc subsystem, a patch [1] has been proposed to
*always* set the private_data to point to the misc device, instead of only
doing this when a custom open file operation has been registered.

But before this patch can be applied we need to modify drivers that make the
assumption that a misc device file's private_data is initialized to NULL
because they didn't register a custom open file operation, so they don't rely
on this assumption anymore. FUSE uses private_data to store the fuse_conn and
errors out if this is not initialized to NULL at mount time.

Hence, we now set a file's private_data to NULL explicitly, to be independent
of whatever value the misc subsystem initializes it to by default.

[1] https://lkml.org/lkml/2014/12/4/939

Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

mmc: pwrseq_simple: fix error path in mmc_pwrseq_simple_alloc

The current error-path code (when gpiod_get_index() reports
an error) can never free pwrseq->reset_gpios[0], but might
try to tree pwrseq->reset_gpios[-1], which has unfortunate
consequences.

Signed-off-by: NeilBrown <neil@brown.name>
Fixes: 934f1f48330ed695927a51fa068dc5d673f2da19
Acked-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>

ide_tape: convert jiffies with jiffies_to_msecs

Use jiffies_to_msecs for converting jiffies as it handles all of the corner
cases reliably and also helps readability. The printk format is fixed up
as jiffies_to_msecs returns unsigned int not unsigned long.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: mlx4_en_set_tx_maxrate() can be static

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: add ageing_time, stp_state, priority over netlink

Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix high overhead of vlan sub-device teardown.

When a networking device is taken down that has a non-trivial number
of VLAN devices configured under it, we eat a full synchronize_net()
for every such VLAN device.

This is because of the call chain:

NETDEV_DOWN notifier
--> vlan_device_event()
--> dev_change_flags()
--> __dev_change_flags()
--> __dev_close()
--> __dev_close_many()
--> dev_deactivate_many()
--> synchronize_net()

This is kind of rediculous because we already have infrastructure for
batching doing operation X to a list of net devices so that we only
incur one sync.

So make use of that by exporting dev_close_many() and adjusting it's
interfaace so that the caller can fully manage the batch list. Use
this in vlan_device_event() and all the overhead goes away.

Reported-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: add a schedule point in inet_twsk_purge()

On a large hash table, we can easily spend seconds to
walk over all entries. Add a cond_resched() to yield
cpu if necessary.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: cx82310_eth: use common match macro"

This reverts commit 11ad714b98f6d9ca0067568442afe3e70eb94845 because
it breaks cx82310_eth.

The custom USB_DEVICE_CLASS macro matches
bDeviceClass, bDeviceSubClass and bDeviceProtocol
but the common USB_DEVICE_AND_INTERFACE_INFO matches
bInterfaceClass, bInterfaceSubClass and bInterfaceProtocol instead, which are
not specified.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: add support for phys_port_name

Implement the phys_port_name operation. Port names are pulled from the
rocker hardware model in qemu and default to the qemu name + port id.
e.g.,

sw1p1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        ether 52:54:00:12:35:01  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

where 'sw1' comes from the qemu command line -device rocker,name=sw1, and
'p1' is port 1.

Patch is adapted from Scott's phys_port_id patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add support for phys_port_name

Similar to port id allow netdevices to specify port names and export
the name via sysfs. Drivers can implement the netdevice operation to
assist udev in having sane default names for the devices using the
rule:

$ cat /etc/udev/rules.d/80-net-setup-link.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_port_name}!="",
NAME="$attr{phys_port_name}"

Use of phys_name versus phys_id was suggested-by Jiri Pirko.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc: Fix /proc/kcore

/proc/kcore investigates the "System RAM" elements in /proc/iomem to
initialize it's memory tables. Therefore we have to register them
before it tries to do so. kcore uses device_initcall() so let's
use arch_initcall() for the registry.

Also we need ARCH_PROC_KCORE_TEXT to get the virtual addresses of
the kernel image correct.

Reported-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Move socket initialization to within rtnl scope

Currently, if a multicast join operation fail, the vxlan interface will
be UP but not functional, without even a log message informing the user.

Now that we can grab socket lock after already having rntl, we don't
need to defer socket creation and multicast operations. By not deferring
we can do proper error reporting to the user through ip exit code.

This patch thus removes all deferred work that vxlan had and put it back
inline. Now the socket will only be created, bound and join multicast
group when one bring the interface up, and will undo all that as soon as
one put the interface down.

As vxlan_sock_hold() is not used after this patch, it was removed too.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}

in favor of their inner __ ones, which doesn't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
  reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
  __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
  __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4,ipv6: grab rtnl before locking the socket

There are some setsockopt operations in ipv4 and ipv6 that are grabbing
rtnl after having grabbed the socket lock. Yet this makes it impossible
to do operations that have to lock the socket when already within a rtnl
protected scope, like ndo dev_open and dev_stop.

We normally take coarse grained locks first but setsockopt inverted that.

So this patch invert the lock logic for these operations and makes
setsockopt grab rtnl if it will be needed prior to grabbing socket lock.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'listen_refactor_part_13'

Eric Dumazet says:

====================
inet: tcp listener refactoring, part 13

inet_hash functions are in a bad state : Too much IPv6/IPv4 copy/pasting.

Lets refactor a bit.

Idea is that we do not want to have an equivalent of inet_csk(sk)->icsk_af_ops
for request socks in order to be able to use the right variant.

In this patch series, I started to let IPv6/IPv4 converge to common helpers.

Idea is to use ipv6_addr_set_v4mapped() even for AF_INET sockets, so that
we can test
if (sk->sk_family == AF_INET6 &&
!ipv6_addr_v4mapped(&sk->sk_v6_daddr))
to tell if we deal with an IPv6 socket, or IPv4 one, at least in slow paths.

Ideally, we could save 8 bytes per struct sock_common, if we
alias skc_daddr & skc_rcv_saddr to skc_v6_daddr[3]/skc_v6_rcv_saddr[3].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

inet: request sock should init IPv6/IPv4 addresses

In order to be able to use sk_ehashfn() for request socks,
we need to initialize their IPv6/IPv4 addresses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: get rid of last __inet_hash_connect() argument

We now always call __inet_hash_nolisten(), no need to pass it
as an argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: get rid of __inet6_hash()

We can now use inet_hash() and __inet_hash() instead of private
functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: add IPv6 support to sk_ehashfn()

Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.

IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.

__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce sk_ehashfn() helper

Goal is to unify IPv4/IPv6 inet_hash handling, and use common helpers
for all kind of sockets (full sockets, timewait and request sockets)

inet_sk_ehashfn() becomes sk_ehashfn() but still only copes with IPv4

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: constify net_hash_mix() and various callers

const qualifiers ease code review by making clear
which objects are not written in a function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-net'

Or Gerlitz says:

====================
mlx4 driver fixes for 4.0-rc

Just few small fixes for the 4.0 rc cycle.

The fix from Moni addresses an issue from 4.0-rc1 so we
just need it for net.

Eran's fix for off-by-one should go to 3.19.y too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Set statistics bitmap at port init

Port statistics bitmap will now be initialized at port init. Even before
starting the port, statistics are visible to the user and must be properly masked.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Saturate RoCE port PMA counters in case of overflow

For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.

Fixes: c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix off-by-one in ethtool statistics display

NUM_PORT_STATS was 9 instead of 10, which caused off-by-one bug when
displaying the statistics starting from tx_chksum_offload in ethtool.

Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Verify net device validity on port change event

Processing an event is done in a different context from the one when
the event was dispatched. This requires a check that the slave
net device is still valid when the event is being processed. The check is done
under the iboe lock which ensure correctness.

Fixes: a57500903093 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'txq_max_rate'

Or Gerlitz says:

====================
Add max rate TXQ attribute

Add the ability to set a max-rate limitation for TX queues.
The attribute name is maxrate and the units are Mbs, to make
it similar to the existing max-rate limitation knobs (ETS and
SRIOV ndo calls).

changes from V2:
  - added Documentation (thanks Florian and Tom)
  - rebased to latest net-next to comply with the swdev ndo removal
  - addressed more feedback from Dave on the comments style

changes from V1:
  - addressed feedback from Dave

changes from V0:
  - addressed feedback from Sergei

John Fastabend (1):
  net: Add max rate tx queue attribute
====================

Signed-off-by: David S. Miller <davem@davemloft.net>