openwrt/staging/blogic.git
Daniel Borkmann [Sat, 20 Sep 2014 12:03:55 +0000 (14:03 +0200)]
ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback

RFC2710 (MLDv1), section 3.7. says:

  The length of a received MLD message is computed by taking the
  IPv6 Payload Length value and subtracting the length of any IPv6
  extension headers present between the IPv6 header and the MLD
  message. If that length is greater than 24 octets, that indicates
  that there are other fields present *beyond* the fields described
  above, perhaps belonging to a *future backwards-compatible* version
  of MLD. An implementation of the version of MLD specified in this
  document *MUST NOT* send an MLD message longer than 24 octets and
  MUST ignore anything past the first 24 octets of a received MLD
  message.

RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
presence of MLDv1 routers:

  In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
  operate in version 1 compatibility mode. [...] When Host
  Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
  on that interface. When Host Compatibility Mode is MLDv1, a host
  acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
  on that interface. [...]

While section 8.3.1. specifies *router* behaviour regarding presence
of MLDv1 routers:

  MLDv2 routers may be placed on a network where there is at least
  one MLDv1 router. The following requirements apply:

  If an MLDv1 router is present on the link, the Querier MUST use
  the *lowest* version of MLD present on the network. This must be
  administratively assured. Routers that desire to be compatible
  with MLDv1 MUST have a configuration option to act in MLDv1 mode;
  if an MLDv1 router is present on the link, the system administrator
  must explicitly configure all MLDv2 routers to act in MLDv1 mode.
  When in MLDv1 mode, the Querier MUST send periodic General Queries
  truncated at the Multicast Address field (i.e., 24 bytes long),
  and SHOULD also warn about receiving an MLDv2 Query (such warnings
  must be rate-limited). The Querier MUST also fill in the Maximum
  Response Delay in the Maximum Response Code field, i.e., the
  exponential algorithm described in section 5.1.3. is not used. [...]

That means that we should not get queries from different versions of
MLD. When there's a MLDv1 router present, MLDv2 enforces truncation
and MRC == MRD (both fields are overlapping within the 24 octet range).

Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
address *listeners*:

  MLDv2 routers may be placed on a network where there are hosts
  that have not yet been upgraded to MLDv2. In order to be compatible
  with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
  mode. MLDv2 routers keep a compatibility mode per multicast address
  record. The compatibility mode of a multicast address is determined
  from the Multicast Address Compatibility Mode variable, which can be
  in one of the two following states: MLDv1 or MLDv2.

  The Multicast Address Compatibility Mode of a multicast address
  record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
  *received* for that multicast address. At the same time, the Older
  Version Host Present timer for the multicast address is set to Older
  Version Host Present Timeout seconds. The timer is re-set whenever a
  new MLDv1 Report is received for that multicast address. If the Older
  Version Host Present timer expires, the router switches back to
  Multicast Address Compatibility Mode of MLDv2 for that multicast
  address. [...]

That means the following scenario can happen: hosts act in MLDv1
compatibility mode because they previously received an MLDv1 query
(or simply operate in MLDv1-only mode), and at the same time an MLDv2
router starts up and transmits MLDv2 startup query messages while
being unaware of the current operational mode.

Given RFC2710, section 3.7 we would need to answer to that with an MLDv1
listener report, so that the router according to RFC3810, section 8.3.2.
would receive that and internally switch to MLDv1 compatibility as well.

Right now, I believe since the initial implementation of MLDv2, Linux
hosts would just silently drop such MLDv2 queries instead of replying
with an MLDv1 listener report, which would prevent a MLDv2 router going
into fallback mode (until it receives other MLDv1 queries).

Since the mapping of MRC to MRD in exactly such cases can make use of
the exponential algorithm from 5.1.3, a host in MLDv1 mode cannot
[strictly speaking] know the encoding used in MRC, and the RFC does not
seem to mention this case either. Since the encodings are the same up to
32767, we treat this value as a hard upper limit in such a situation and
clamp to it. We have asked one of the RFC authors about this, and he
mentioned that there seem not to be any implementations that make use of
that exponential algorithm on startup messages. In any case, this patch
fixes this MLD interoperability issue.
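
As an illustration of the MRC handling discussed above, here is a minimal
Python sketch (not the kernel code). The function and constant names are
hypothetical; the decoding follows RFC 3810, section 5.1.3, and the clamp
reflects the upper limit of 32767 described in this message.

```python
# Hypothetical names; a sketch of the MRC/MRD relationship, not kernel code.
MRC_LINEAR_MAX = 32767  # up to this value MRC and MRD are encoded identically


def mldv2_mrc_to_mrd(mrc: int) -> int:
    """Decode a 16-bit MLDv2 Maximum Response Code into a delay in ms."""
    if not 0 <= mrc <= 0xFFFF:
        raise ValueError("MRC is a 16-bit field")
    if mrc < 0x8000:
        return mrc  # linear range: the code is the delay
    exp = (mrc >> 12) & 0x7   # 3-bit exponent
    mant = mrc & 0x0FFF       # 12-bit mantissa
    return (mant | 0x1000) << (exp + 3)


def mldv1_fallback_delay(mrc: int) -> int:
    """Delay assumed by a host in MLDv1 mode that answers an MLDv2 query.

    The host cannot know whether the exponential encoding was used, so the
    raw field value is clamped to the largest value both encodings share.
    """
    return min(mrc, MRC_LINEAR_MAX)
```

For values below 32768 both functions agree, which is why the clamp is
harmless as long as routers do not use the exponential encoding in startup
queries.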

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Fri, 19 Sep 2014 20:53:51 +0000 (13:53 -0700)]
net: bpf: fix compiler warnings in test_bpf

old gcc 4.2 used by avr32 architecture produces warnings:

lib/test_bpf.c:1741: warning: integer constant is too large for 'long' type
lib/test_bpf.c:1741: warning: integer constant is too large for 'long' type
lib/test_bpf.c: In function '__run_one':
lib/test_bpf.c:1897: warning: 'ret' may be used uninitialized in this function

silence these warnings.

Fixes: 02ab695bb37e ("net: filter: add "load 64-bit immediate" eBPF instruction")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Sat, 20 Sep 2014 04:50:34 +0000 (21:50 -0700)]
net: sched: cls_u32 changes to knode must appear atomic to readers

Changes to the cls_u32 classifier must appear atomic to the
readers. Before this patch if a change is requested for both
the exts and ifindex, first the ifindex is updated then the
exts with tcf_exts_change(). This opens a small window where
a reader can have an exts chain with an incorrect ifindex. This
violates the RCU semantics.

Here we resolve this by always passing u32_set_parms() a copy
of the tc_u_knode to work on and then inserting it into the hash
table after the updates have been successfully applied.
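
The copy-then-insert approach described above can be modelled with a short
Python sketch. This is only an analogy: `Knode`, `change_knode` and the dict
used as a hash table are hypothetical stand-ins, and a plain dict assignment
stands in for the RCU pointer publication done in the kernel.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Knode:
    """Hypothetical stand-in for a tc_u_knode."""
    handle: int
    ifindex: int
    exts: list = field(default_factory=list)


def change_knode(table: dict, handle: int, ifindex: int, exts: list) -> None:
    """Update ifindex and exts so that readers never see a mix of old and new."""
    if ifindex < 0:
        # Validation fails before anything is published: table is untouched.
        raise ValueError("invalid ifindex")
    new = copy.deepcopy(table[handle])  # work on a private copy
    new.ifindex = ifindex
    new.exts = list(exts)
    table[handle] = new  # single store publishes the fully updated node
```

A reader that fetched the old node before the final store keeps a consistent
old view; a reader that fetches afterwards sees both fields updated.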

Tested with the following short script:

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
       u32 divisor 256

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
       u32 link 1: hashkey mask ffffff00 at 12    \
       match ip src 192.168.8.0/2

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 102    \
       handle 1::10 u32 classid 1:2 ht 1:        \
       match ip src 192.168.8.0/8 match ip tos 0x0a 1e

#tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
 handle 1::10 u32 classid 1:2 ht 1:        \
 match ip src 1.1.0.0/8 match ip tos 0x0b 1e

CC: Eric Dumazet <edumazet@google.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Sat, 20 Sep 2014 04:50:04 +0000 (21:50 -0700)]
net: cls_u32: fix missed pcpu_success free_percpu

This fixes a missed free_percpu in the unwind code path and when
keys are destroyed.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Fri, 19 Sep 2014 13:05:01 +0000 (21:05 +0800)]
bonding: remove the unnecessary notes for bond_xmit_broadcast()

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Fri, 19 Sep 2014 13:04:57 +0000 (21:04 +0800)]
bonding: slight optimization for bond_xmit_roundrobin()

When the slave is the curr_active_slave, there is no need to check
whether the slave is active or not; it is always active.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 22 Sep 2014 18:39:44 +0000 (11:39 -0700)]
udp: Need to make ip6_udp_tunnel.c have GPL license

Unable to load various tunneling modules without this:

[   80.679049] fou: Unknown symbol udp_sock_create6 (err 0)
[   91.439939] ip6_udp_tunnel: Unknown symbol ip6_local_out (err 0)
[   91.439954] ip6_udp_tunnel: Unknown symbol __put_net (err 0)
[   91.457792] vxlan: Unknown symbol udp_sock_create6 (err 0)
[   91.457831] vxlan: Unknown symbol udp_tunnel6_xmit_skb (err 0)

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 Sep 2014 19:01:13 +0000 (15:01 -0400)]
Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

Patches 1 and 2 fix sparse warnings (static declaration needed and endian
declaration needed) introduced by the earlier patch set.

Patches 3 and 4 add 20G/40G speed reporting via ethtool for the Skyhawk-R
chip.

Patches 5 to 12 fix various style issues and checkpatch warnings in the
driver such as:
- removing unnecessary return statements in void routines
- adding needed blank lines after a declaration block
- deleting multiple blank lines
- inserting a blank line after a function/struct definition
- removing space after typecast
- fixing multiple assignments on a single line
- fixing alignment on a line wrap
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:17:02 +0000 (15:47 +0530)]
be2net: fix alignment on line wrap

This patch fixes alignment wherever it doesn't match the open parenthesis
alignment.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:17:01 +0000 (15:47 +0530)]
be2net: remove multiple assignments on a single line

This patch removes multiple assignments on a single line as warned
by checkpatch.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:17:00 +0000 (15:47 +0530)]
be2net: remove space after typecasts

This patch removes unnecessary spaces after typecasts as per checkpatch warnings.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:59 +0000 (15:46 +0530)]
be2net: remove unnecessary blank lines after an open brace

This patch fixes checkpatch warnings about blank lines after an open brace '{'.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:58 +0000 (15:46 +0530)]
be2net: insert a blank line after function/struct/union/enum definitions

This patch inserts a blank line after function/struct/union/enum definitions
as per checkpatch warnings.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:57 +0000 (15:46 +0530)]
be2net: remove multiple blank lines

This patch removes multiple blank lines in the driver as per checkpatch
warnings.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:56 +0000 (15:46 +0530)]
be2net: add blank line after declarations

This patch fixes checkpatch warnings in be2net by adding a blank line
between declaration and code blocks.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:55 +0000 (15:46 +0530)]
be2net: remove return statements for void functions

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Fri, 19 Sep 2014 10:16:54 +0000 (15:46 +0530)]
be2net: add speed reporting for 20G-KR interface

This patch adds speed reporting via ethtool for 20G KR2 interface on the
Skyhawk-R chip.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:53 +0000 (15:46 +0530)]
be2net: add speed reporting for 40G/KR interface

This patch adds speed reporting via ethtool for 40Gbps KR4 interface
on the Skyhawk-R chip.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Fri, 19 Sep 2014 10:16:52 +0000 (15:46 +0530)]
be2net: fix sparse warnings in be_cmd_req_port_type{}

This patch fixes sparse warnings regarding endian declarations introduced
by the following commit:

fixes: e36edd9 ("be2net: add ethtool "-m" option support")

Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Fri, 19 Sep 2014 10:16:51 +0000 (15:46 +0530)]
be2net: fix a sparse warning in be_cmd_modify_eqd()

This patch fixes a sparse warning about missing static declaration that was
introduced by the following commit:

fixes: 936767039cdf ("be2net: send a max of 8 EQs to be_cmd_modify_eqd() on Lancer")

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 19 Sep 2014 08:04:38 +0000 (16:04 +0800)]
net: keep original skb which only needs header checking during software GSO

Commit ce93718fb7cdbc064c3000ff59e4d3200bdfa744 ("net: Don't keep
around original SKB when we software segment GSO frames") frees the
original skb after software GSO even for dodgy gso skbs. This breaks
the stream throughput from untrusted sources, since only header
checking was done during software GSO instead of a true
segmentation. This patch fixes this by freeing the original gso skb
only when it was really segmented by software.

Fixes ce93718fb7cdbc064c3000ff59e4d3200bdfa744 ("net: Don't keep
around original SKB when we software segment GSO frames.")

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nimrod Andy [Fri, 19 Sep 2014 06:26:03 +0000 (14:26 +0800)]
net: fec: fix code indentation

There is extra indentation before skb_copy_to_linear_data_offset();
this patch just removes it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 Sep 2014 18:41:28 +0000 (14:41 -0400)]
Merge branch 'dsa-suspend'

Florian Fainelli says:

====================
dsa: Broadcom SF2 suspend/resume and WoL

This patch adds support for suspend/resume and configuring Wake-on-LAN
for Broadcom Starfighter 2 switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 19 Sep 2014 00:31:25 +0000 (17:31 -0700)]
net: dsa: bcm_sf2: add support for Wake-on-LAN

In order for Wake-on-LAN to work properly, we query the parent network
device Wake-on-LAN features and advertise those. Similarly, when
configuring Wake-on-LAN on a per-port network interface, we make sure
that we do not accept something the master network device does not
support.

Finally, we need to maintain a bitmask of the ports enabled for
Wake-on-LAN to prevent the suspend() callback from disabling a port that
is used for waking up the system.
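
The two checks described above (rejecting options the master device lacks,
and keeping Wake-on-LAN ports alive across suspend) can be sketched in
Python as follows. The `Switch` class and its attribute names are
hypothetical; the `WAKE_*` values mirror the ethtool flag bits and are used
here only for illustration.

```python
WAKE_PHY = 1 << 0    # wake on PHY activity
WAKE_MAGIC = 1 << 5  # wake on magic packet


class Switch:
    """Hypothetical model of a switch with per-port Wake-on-LAN state."""

    def __init__(self, num_ports: int, master_supported: int):
        self.num_ports = num_ports
        self.master_supported = master_supported  # options the master offers
        self.wol_ports_mask = 0                   # bit N set: port N wakes us
        self.enabled = [True] * num_ports

    def set_wol(self, port: int, wolopts: int) -> None:
        # Refuse anything the master network device cannot do.
        if wolopts & ~self.master_supported:
            raise ValueError("option not supported by master device")
        if wolopts:
            self.wol_ports_mask |= 1 << port
        else:
            self.wol_ports_mask &= ~(1 << port)

    def suspend(self) -> None:
        # Shut down every port except those armed for Wake-on-LAN.
        for port in range(self.num_ports):
            if self.wol_ports_mask & (1 << port):
                continue
            self.enabled[port] = False
```

With `Switch(4, WAKE_MAGIC)`, arming port 2 and suspending leaves only
port 2 enabled.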

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 19 Sep 2014 00:31:24 +0000 (17:31 -0700)]
net: dsa: add {get, set}_wol callbacks to slave devices

Allow switch drivers to implement per-port Wake-on-LAN getter and
setters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 19 Sep 2014 00:31:23 +0000 (17:31 -0700)]
net: dsa: bcm_sf2: add suspend/resume callbacks

Implement the suspend/resume callbacks for the Broadcom Starfighter 2
switch driver. Suspending the switch requires masking interrupts and
shutting down ports. Resuming the switch requires a software reset since
we do not know which power state we might be coming from, and re-enabling
the physical ports that are used.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 19 Sep 2014 00:31:22 +0000 (17:31 -0700)]
net: dsa: allow switch drivers to implement suspend/resume hooks

Add an abstraction layer to suspend/resume switch devices, doing the
following split:

- suspend/resume the slave network devices and their corresponding PHY
  devices
- suspend/resume the switch hardware using switch driver callbacks

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 Sep 2014 18:35:36 +0000 (14:35 -0400)]
Merge branch 'qlge'

Harish Patil says:

====================
qlge: Fix compilation warning and update maintainers

This patch series includes the following set of patches:

- Fix the below warning message:
  qlge_main.c:1754: warning: 'lbq_desc' may be used uninitialized in this function

I have made changes according to your earlier feedback:

"Please fix this differently.  The problem is that the compiler can't see that
you've done the !length check at the top of the function, so when it later
sees the while (length > 0) loop, it doesn't know that this loop will always
execute at least once. Just change that loop to a do { } while() loop and
the compiler will be able to see everything."

- Update qlge driver maintainers list
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Harish Patil [Thu, 18 Sep 2014 21:27:25 +0000 (17:27 -0400)]
Update qlge driver maintainers list

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harish Patil [Thu, 18 Sep 2014 21:27:24 +0000 (17:27 -0400)]
qlge: Fix compilation warning

Fix the below warning message:
qlge_main.c:1754: warning: 'lbq_desc' may be used uninitialized in this function

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Thu, 18 Sep 2014 09:33:41 +0000 (02:33 -0700)]
am2150: Update nmclan_cs.c to use updated PCMCIA API

Resolves compile warning about use of a deprecated function call:
drivers/net/ethernet/amd/nmclan_cs.c: In function ‘nmclan_config’:
drivers/net/ethernet/amd/nmclan_cs.c:624:3: warning: ‘pcmcia_request_exclusive_irq’ is deprecated (declared at include/pcmcia/ds.h:213) [-Wdeprecated-declarations]
   ret = pcmcia_request_exclusive_irq(link, mace_interrupt);

Updates pcmcia_request_exclusive_irq() to pcmcia_request_irq().

CC: Roger Pao <rpao@paonet.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou [Sat, 20 Sep 2014 01:02:53 +0000 (18:02 -0700)]
udp_tunnel: Only build ip6_udp_tunnel.c when IPV6 is selected

Functions supplied in ip6_udp_tunnel.c are only needed when IPV6 is
selected. When IPV6 is not selected, those functions are stubbed out
in udp_tunnel.h.

==================================================================
 net/ipv6/ip6_udp_tunnel.c:15:5: error: redefinition of 'udp_sock_create6'
     int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 In file included from net/ipv6/ip6_udp_tunnel.c:9:0:
      include/net/udp_tunnel.h:36:19: note: previous definition of 'udp_sock_create6' was here
       static inline int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
==================================================================

Fixes: fd384412e udp_tunnel: Seperate ipv6 functions into its own file
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Sep 2014 21:35:30 +0000 (17:35 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-09-18

This series contains updates to ixgbe and ixgbevf.

Ethan Zhao cleans up ixgbe and ixgbevf by removing bd_number from the
adapter struct because it is no longer useful.

Mark fixes ixgbe where if a hardware transmit timestamp is requested,
an uninitialized workqueue entry may be scheduled.  Added a check for
a PTP clock to avoid that.

Jacob provides a number of cleanups for ixgbe.  Since we may call
ixgbe_acquire_msix_vectors() prior to registering our netdevice, we
should not use the netdevice specific printk and use e_dev_warn()
instead.  Similar to how ixgbevf handles acquiring MSI-X vectors, we
can return an error code instead of relying on the flag being set.
This makes it more clear that we have failed to setup MSI-X mode and
will make it easier to consolidate MSI-X related code into a single
function.  In the case of disabling DCB, it is not an error since we
still can function, we just have to let the user know.  So use
e_dev_warn() instead of e_err().  Added warnings for other features
that are disabled when we are without MSI-X support.  Cleanup flags
that are no longer used or needed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Sep 2014 21:30:16 +0000 (17:30 -0400)]
Merge branch 'mlx4-next'

Or Gerlitz says:

====================
mlx4: CQE/EQE stride support

This series from Ido Shamay is intended for archs having
cache line larger than 64 bytes.

Since our CQE/EQEs are generally 64B in those systems, HW will write
twice to the same cache line consecutively, causing pipe locks due to
the hazard prevention mechanism. For elements in a cyclic buffer, writes
are consecutive, so entries smaller than a cache line should be
avoided, especially if they are written at a high rate.

Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
driver to increase the distance between entries so that each will reside
in a different cache line.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Shamay [Thu, 18 Sep 2014 08:51:01 +0000 (11:51 +0300)]
net/mlx4_en: Add mlx4_en_get_cqe helper

This function derives the base address of the CQE from the CQE size,
and calculates the real CQE context segment in it from the factor
(this is like before). Before this change the code used the factor to
calculate the base address of the CQE as well.

The factor indicates in which segment of the cqe stride the cqe information
is located. For 32-byte strides, the segment is 0, and for 64 byte strides,
the segment is 1 (bytes 32..63). Using the factor was ok as long as we had
only 32 and 64 byte strides. However, with larger strides, the factor is zero,
and so cannot be used to calculate the base of the CQE.

The helper uses the same method of CQE buffer pulling used by all other
components that read the CQE buffer (mlx4_ib driver and libmlx4).
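
To make the difference concrete, here is a small Python sketch of the two
address calculations, expressed as byte offsets into the CQ buffer. The
function names are hypothetical, and the "old" formula is an assumption
about how the factor was previously used for both base and segment.

```python
CQE_SEGMENT_SIZE = 32  # size in bytes of the CQE context segment


def cqe_offset_old(index: int, factor: int) -> int:
    """Assumed old scheme: the factor determines both stride and segment."""
    return ((index << factor) + factor) * CQE_SEGMENT_SIZE


def cqe_offset_new(index: int, cqe_size: int, factor: int) -> int:
    """New scheme: base from the CQE size, segment from the factor."""
    base = index * cqe_size                   # start of the CQE stride
    return base + factor * CQE_SEGMENT_SIZE   # context segment inside it
```

Both give the same result for 32-byte strides (factor 0) and 64-byte
strides (factor 1). For a 128-byte stride the factor is 0, so only the new
calculation lands on the right entry.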

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Shamay [Thu, 18 Sep 2014 08:51:00 +0000 (11:51 +0300)]
net/mlx4_core: Cache line EQE size support

Enable mlx4 interrupt handler to work with EQE stride feature.
The feature may be enabled when cache line is bigger than 64B.
The EQE size will then be the cache line size, and the context
segment resides in [0-31] offset.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Shamay [Thu, 18 Sep 2014 08:50:59 +0000 (11:50 +0300)]
net/mlx4_core: Enable CQE/EQE stride support

This feature is intended for archs having cache line larger than 64B.

Since our CQE/EQEs are generally 64B in those systems, HW will write
twice to the same cache line consecutively, causing pipe locks due to
the hazard prevention mechanism. For elements in a cyclic buffer, writes
are consecutive, so entries smaller than a cache line should be
avoided, especially if they are written at a high rate.

Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
driver to increase the distance between entries so that each will reside
in a different cache line. Until the introduction of this feature, there
were two types of CQE/EQE:

1. 32B stride and context in the [0-31] segment
2. 64B stride and context in the [32-63] segment

This feature introduces two additional types:

3. 128B stride and context in the [0-31] segment (128B cache line)
4. 256B stride and context in the [0-31] segment (256B cache line)

Modify the mlx4_core driver to query the device for the CQE/EQE cache
line stride capability and to enable that capability when the host
cache line size is larger than 64 bytes (supported cache lines are
128B and 256B).
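
The four entry types listed above can be summarised in a short Python
sketch that also shows why a larger stride avoids back-to-back writes to
one cache line. The table and helper names are hypothetical and only
restate the layout given in this message.

```python
# stride in bytes -> offset of the 32-byte context segment within the entry
CONTEXT_OFFSET = {32: 0, 64: 32, 128: 0, 256: 0}


def context_range(index: int, stride: int) -> tuple:
    """First and last byte offset of entry `index`'s context segment."""
    start = index * stride + CONTEXT_OFFSET[stride]
    return start, start + 31


def cache_line_of(index: int, stride: int, cache_line_size: int) -> int:
    """Cache line number that the start of entry `index` falls into."""
    return (index * stride) // cache_line_size
```

With 64B entries on a 128B cache line, entries 0 and 1 share cache line 0;
with a 128B stride each entry gets its own line.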

The mlx4 IB driver and libmlx4 need not be aware of this change. The PF
context behaviour is changed to require this change in VF drivers
running on such archs.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Wed, 17 Sep 2014 21:23:12 +0000 (23:23 +0200)]
net: fix sparse warnings in SNMP_UPD_PO_STATS(_BH)

ptr used to be a non __percpu pointer (result of a this_cpu_ptr
assignment, 7d720c3e4f0c4 ("percpu: add __percpu sparse annotations to
net")). Since d25398df59b56 ("net: avoid reloads in SNMP_UPD_PO_STATS"),
that's no longer the case, SNMP_UPD_PO_STATS uses this_cpu_add and ptr
is now __percpu.

Silence sparse warnings by preserving the original type and
annotation, and remove the out-of-date comment.

warning: incorrect type in initializer (different address spaces)
   expected unsigned long long *ptr
   got unsigned long long [noderef] <asn:3>*<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 Sep 2014 21:15:40 +0000 (17:15 -0400)]
Merge branch 'fou-next'

Tom Herbert says:

====================
net: foo-over-udp (fou)

This patch series implements foo-over-udp. The idea is that we can
encapsulate different IP protocols in UDP packets. The rationale for
this is that networking devices such as NICs and switches are usually
implemented with UDP (and TCP) specific mechanisms for processing. For
instance, many switches and routers will implement a 5-tuple hash
for UDP packets to perform Equal Cost Multipath Routing (ECMP) or
RSS (on NICs). Many NICs also only provide rudimentary checksum
offload (basic TCP and UDP packets); with foo-over-udp we may be
able to leverage these NICs to offload checksums of tunneled packets
(using checksum unnecessary conversion and eventually remote checksum
offload).

An example encapsulation of IPIP over FOU is diagrammed below. As
illustrated, the packet overhead for FOU is the 8 byte UDP header.

+------------------+
|    IPv4 hdr      |
+------------------+
|     UDP hdr      |
+------------------+
|    IPv4 hdr      |
+------------------+
|     TCP hdr      |
+------------------+
|   TCP payload    |
+------------------+

Conceptually, FOU should be able to encapsulate any IP protocol.
The FOU header (UDP hdr.) is essentially an inserted header between the
IP header and transport, so in the case of TCP or UDP encapsulation
the pseudo header would be based on the outer IP header and its length
field must not include the UDP header.
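
The layout diagrammed above can be sketched in Python as follows. This is
an illustration only, not the kernel transmit path: `fou_encap` is a
hypothetical name, the outer IPv4 header is not built, and the UDP checksum
is left at zero (which UDP over IPv4 permits).

```python
import struct

UDP_HEADER_LEN = 8  # the only per-packet overhead FOU adds


def fou_encap(inner_packet: bytes, sport: int, dport: int) -> bytes:
    """Prepend a UDP header to an already-built inner IP packet."""
    length = UDP_HEADER_LEN + len(inner_packet)
    if length > 0xFFFF:
        raise ValueError("packet too large for the UDP length field")
    # Fields: source port, destination port, length, checksum (0 = none).
    header = struct.pack("!HHHH", sport, dport, length, 0)
    return header + inner_packet
```

For a 40-byte inner packet the result is 48 bytes long, matching the
8-byte overhead stated above.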

* Receive

In this patch set the RX path for FOU is implemented in a new fou
module. To enable FOU for a particular protocol, a UDP-FOU socket is
opened to the port to receive FOU packets. The socket is mapped to the
IP protocol for the packets. The XFRM mechanism (udp_encap_rcv) is used
to receive encapsulated packets for the port. Upon reception, the UDP
header is removed and the packet is reinjected into the stack for the
corresponding protocol associated with the socket (return -protocol
from udp_encap_rcv function).
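
The "return -protocol" convention can be modelled with a small Python
sketch. All names are hypothetical, and the sketch slices the UDP header
off a byte string purely to model the logical effect; it does not reflect
how the skb is actually handled.

```python
UDP_HEADER_LEN = 8


def fou_encap_rcv(ports: dict, dport: int, datagram: bytes) -> tuple:
    """Return (-protocol, payload) for a datagram arriving on a FOU port.

    `ports` maps a UDP port to the IP protocol number configured for it.
    The negative value tells the caller to resubmit the payload to the
    handler for that protocol.
    """
    proto = ports[dport]
    return -proto, datagram[UDP_HEADER_LEN:]


def deliver(ports: dict, handlers: dict, dport: int, datagram: bytes):
    """Reinject the decapsulated payload into the matching handler."""
    ret, payload = fou_encap_rcv(ports, dport, datagram)
    if ret < 0:
        return handlers[-ret](payload)
    return None
```

With port 5555 mapped to IP protocol 4 (IPIP), a datagram received on that
port is handed to the IPIP handler with the UDP header stripped.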

GRO is provided with the appropriate fou_gro_receive and
fou_gro_complete. These routines need to know the encapsulation
protocol so we save that in udp_offloads structure with the port
and pass it in the napi_gro_cb structure.

* TX

This patch series implements FOU transmit encapsulation for IPIP, GRE, and
SIT. This is done by some common infrastructure in ip_tunnel including an
ip_tunnel_encap to perform FOU encapsulation and common configuration
to enable FOU on IP tunnels. FOU is configured on existing tunnels and
does not create any new interfaces. The transmit and receive paths are
independent, so use of FOU may be asymmetric between tunnel endpoints.

* Configuration

The fou module uses netlink to configure FOU receive ports. The ip
command can be augmented with a fou subcommand to support this, e.g. to
configure FOU for IPIP on port 5555:

  ip fou add port 5555 ipproto 4

GRE, IPIP, and SIT have been modified with netlink commands to
configure use of FOU on transmit. The "ip link" command will be
augmented with an encap subcommand (for supporting various forms of
secondary encapsulation). For instance, to configure an ipip tunnel
with FOU on port 5555:

  ip link add name tun1 type ipip \
    remote 192.168.1.1 local 192.168.1.2 ttl 225 \
    encap fou encap-sport auto encap-dport 5555

* Notes
  - This patch set does not implement GSO for FOU. The UDP encapsulation
    code assumes TEB, so that will need to be reimplemented.
  - When a packet is received through FOU, the UDP header is not
    actually removed from the skbuf; pointers to the transport header
    and the length in the IP header are updated (like in ESP/UDP RX). A
    side effect is that the IP header will now appear to have an incorrect
    checksum to an external observer (e.g. tcpdump); it will be off
    by sizeof UDP header. If necessary we could adjust the checksum
    to compensate.
  - Performance results are below. My expectation is that FOU should
    entail little overhead (clearly there is some work to do :-) ).
    Optimizing UDP socket lookup for encapsulation ports should help
    significantly.
  - I really don't expect/want devices to have special support for any
    of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
    and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
    steering is provided by commonly implemented UDP hashing. GRO/GSO
    seem fairly comparable with LRO/TSO already.
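
To make the checksum note above concrete, here is a small user-space
sketch (not kernel code). It builds an IPv4 header, shrinks the total
length by the 8 byte UDP header without touching the checksum, and then
repairs the checksum with the incremental update from RFC 1624. The
addresses, TTL and lengths are arbitrary example values and all helper
names are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the header's 16-bit words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(total_len: int) -> bytes:
    """20 byte IPv4 header (protocol 17 = UDP) with a valid checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, 17, 0,
                      bytes([192, 168, 1, 2]), bytes([192, 168, 1, 1]))
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]


def verify(hdr: bytes) -> bool:
    """A header with a correct checksum sums to zero."""
    return ipv4_checksum(hdr) == 0


def csum_update(csum: int, old: int, new: int) -> int:
    """RFC 1624 incremental update: HC' = ~(~HC + ~m + m')."""
    total = (~csum & 0xFFFF) + (~old & 0xFFFF) + new
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


old_len, new_len = 100, 100 - 8          # length shrinks by sizeof UDP header
hdr = build_header(old_len)
stale = hdr[:2] + struct.pack("!H", new_len) + hdr[4:]   # length changed, checksum not
old_csum = struct.unpack("!H", hdr[10:12])[0]
fixed = stale[:10] + struct.pack("!H", csum_update(old_csum, old_len, new_len)) + stale[12:]
```

Here stale is what an external observer would see after the length
update alone, and fixed is the header after compensating.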

* Performance

Ran netperf TCP_RR and TCP_STREAM tests across various configurations.
This was performed on bnx2x and I disabled TSO/GSO on sender to get
fair comparison for FOU versus non-FOU. CPU utilization is reported
for receive in TCP_STREAM.

  GRE
    IPv4, FOU, UDP checksum enabled
      TCP_STREAM
        24.85% CPU utilization
        9310.6 Mbps
      TCP_RR
        94.2% CPU utilization
        155/249/460 90/95/99% latencies
        1.17018e+06 tps
    IPv4, FOU, UDP checksum disabled
      TCP_STREAM
        31.04% CPU utilization
        9302.22 Mbps
      TCP_RR
        94.13% CPU utilization
        154/239/419 90/95/99% latencies
        1.17555e+06 tps
    IPv4, no FOU
      TCP_STREAM
        23.13% CPU utilization
        9354.58 Mbps
      TCP_RR
        90.24% CPU utilization
        156/228/360 90/95/99% latencies
        1.18169e+06 tps

  IPIP
    FOU, UDP checksum enabled
      TCP_STREAM
        24.13% CPU utilization
        9328 Mbps
      TCP_RR
        94.23% CPU utilization
        149/237/429 90/95/99% latencies
        1.19553e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        29.13% CPU utilization
        9370.25 Mbps
      TCP_RR
        94.13% CPU utilization
        149/232/398 90/95/99% latencies
        1.19225e+06 tps
    No FOU
      TCP_STREAM
        10.43% CPU utilization
        5302.03 Mbps
      TCP_RR
        51.53% CPU utilization
        215/324/475 90/95/99% latencies
        864998 tps

  SIT
    FOU, UDP checksum enabled
      TCP_STREAM
        30.38% CPU utilization
        9176.76 Mbps
      TCP_RR
        96.9% CPU utilization
        170/281/581 90/95/99% latencies
        1.03372e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        39.6% CPU utilization
        9176.57 Mbps
      TCP_RR
        97.14% CPU utilization
        167/272/548 90/95/99% latencies
        1.03203e+06 tps
    No FOU
      TCP_STREAM
        11.2% CPU utilization
        4636.05 Mbps
      TCP_RR
        59.51% CPU utilization
        232/346/489 90/95/99% latencies
        813199 tps

v2:
  - Removed encap IP tunnel ioctls, configuration is done by netlink
    only.
  - Don't export fou_create and fou_destroy, they are currently
    intended to be called within fou module only.
  - Filled in tunnel netlink structures and functions for new values.

v3:
  - Fixed change logs for some of the patches.
  - Remove inline from fou_gro_receive and fou_gro_complete, let
    compiler decide on these.

v4:
  - Don't need to cast void in fou_from_sock
  - Removed incorrect htons for port in fou_destroy
  - Some minor cleanup for readability
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago gre: Setup and TX path for gre/UDP foo-over-udp encapsulation
Tom Herbert [Wed, 17 Sep 2014 19:26:01 +0000 (12:26 -0700)]
gre: Setup and TX path for gre/UDP foo-over-udp encapsulation

Added netlink attrs to configure FOU encapsulation for GRE, netlink
handling of these flags, and properly adjust MTU for encapsulation.
ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU
encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation
Tom Herbert [Wed, 17 Sep 2014 19:26:00 +0000 (12:26 -0700)]
ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation

Add netlink handling for IP tunnel encapsulation parameters and
adjustment of MTU for encapsulation. ip_tunnel_encap is called
from ip_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago sit: Setup and TX path for sit/UDP foo-over-udp encapsulation
Tom Herbert [Wed, 17 Sep 2014 19:25:59 +0000 (12:25 -0700)]
sit: Setup and TX path for sit/UDP foo-over-udp encapsulation

Added netlink handling of IP tunnel encapsulation parameters and properly
adjusted MTU for encapsulation. Added ip_tunnel_encap call to
ipip6_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: Changes to ip_tunnel to support foo-over-udp encapsulation
Tom Herbert [Wed, 17 Sep 2014 19:25:58 +0000 (12:25 -0700)]
net: Changes to ip_tunnel to support foo-over-udp encapsulation

This patch changes IP tunnel to support (secondary) encapsulation,
Foo-over-UDP. Changes include:

1) Adding tun_hlen as the tunnel header length, encap_hlen as the
   encapsulation header length, and hlen becomes the grand total
   of these.
2) Added common netlink define to support FOU encapsulation.
3) Routines to perform FOU encapsulation.
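
The relationship between the three lengths in item 1 can be shown with a
short worked example. This is an illustrative sketch only: the
TunnelLens class and its mtu() helper are hypothetical, the 4 byte GRE
figure assumes a base GRE header without optional fields, and the MTU
arithmetic assumes a 20 byte outer IPv4 header with no options.

```python
from dataclasses import dataclass

IPV4_HDR_LEN = 20  # outer IPv4 header without options
UDP_HDR_LEN = 8    # FOU encapsulation header


@dataclass
class TunnelLens:
    tun_hlen: int    # tunnel protocol header, e.g. 4 for base GRE, 0 for IPIP
    encap_hlen: int  # secondary encapsulation, 8 for FOU, 0 when disabled

    @property
    def hlen(self) -> int:
        """Grand total of tunnel and encapsulation header lengths."""
        return self.tun_hlen + self.encap_hlen

    def mtu(self, link_mtu: int) -> int:
        """Payload MTU left after the outer IPv4 header and hlen."""
        return link_mtu - IPV4_HDR_LEN - self.hlen


gre_fou = TunnelLens(tun_hlen=4, encap_hlen=UDP_HDR_LEN)
ipip_fou = TunnelLens(tun_hlen=0, encap_hlen=UDP_HDR_LEN)
ipip_plain = TunnelLens(tun_hlen=0, encap_hlen=0)
```

With a 1500 byte link MTU, enabling FOU on a tunnel costs exactly the
8 bytes of encap_hlen compared with the same tunnel without FOU.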

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago fou: Add GRO support
Tom Herbert [Wed, 17 Sep 2014 19:25:57 +0000 (12:25 -0700)]
fou: Add GRO support

Implement fou_gro_receive and fou_gro_complete, and populate these
in the corresponding udp_offloads for the socket. Added ipproto to
udp_offloads and pass this from UDP to the fou GRO routine in proto
field of napi_gro_cb structure.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago fou: Support for foo-over-udp RX path
Tom Herbert [Wed, 17 Sep 2014 19:25:56 +0000 (12:25 -0700)]
fou: Support for foo-over-udp RX path

This patch provides a receive path for foo-over-udp. This allows
direct encapsulation of IP protocols over UDP. The bound destination
port is used to map to an IP protocol, and the XFRM framework
(udp_encap_rcv) is used to receive encapsulated packets. Upon
reception, the encapsulation header is logically removed (pointer
to transport header is advanced) and the packet is reinjected into
the receive path with the IP protocol indicated by the mapping.

Netlink is used to configure FOU ports. The configuration information
includes the port number to bind to and the IP protocol corresponding
to that port.

This should support GRE/UDP
(http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02),
as well as the other IP tunneling protocols (IPIP, SIT).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: Export inet_offloads and inet6_offloads
Tom Herbert [Wed, 17 Sep 2014 19:25:55 +0000 (12:25 -0700)]
net: Export inet_offloads and inet6_offloads

Want to be able to use these in foo-over-udp offloads, etc.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: sched: cls_u32: rcu can not be last node
John Fastabend [Wed, 17 Sep 2014 18:11:46 +0000 (11:11 -0700)]
net: sched: cls_u32: rcu can not be last node

tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
structure and pads the structure with tc_u32_key fields for each key.

 kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: sched: use __skb_queue_head_init() where applicable
Eric Dumazet [Wed, 17 Sep 2014 15:05:05 +0000 (08:05 -0700)]
net: sched: use __skb_queue_head_init() where applicable

pfifo_fast and htb use skb lists, without needing their spinlocks.
(They instead use the standard qdisc lock)

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to be consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago Merge branch 'bnx2x-next'
David S. Miller [Fri, 19 Sep 2014 20:31:13 +0000 (16:31 -0400)]
Merge branch 'bnx2x-next'

Yuval Mintz says:

====================
bnx2x: Support new Multi-function modes

This patch series adds support for 2 new Multi-function modes -
Unified Fabric Port [UFP] as well as nic partitioning 1.5 [NPAR1.5].

With the addition of the new multi-function modes, the series also
revises some of the storage-related multi-function macros.

[Do notice this series has several small issues with checkpatch]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago bnx2x: Add a fallback multi-function mode NPAR1.5
Yuval Mintz [Wed, 17 Sep 2014 13:24:38 +0000 (16:24 +0300)]
bnx2x: Add a fallback multi-function mode NPAR1.5

When using the new multi-function modes, it is possible that, due to an incompatible
configuration, management FW will fall back to an existing mode.

Notice that at the moment this fallback is exactly the same as the already
existing switch-independent multi-function mode, but we still use existing
infrastructure to hold this information [in case some small differences will
arise in the future].

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago bnx2x: New multi-function mode: UFP
Yuval Mintz [Wed, 17 Sep 2014 13:24:37 +0000 (16:24 +0300)]
bnx2x: New multi-function mode: UFP

Add support for a new multi-function mode based on the Unified Fabric Port
system specifications.
Support includes configuration of:
  1. Outer vlan tags.
  2. Bandwidth settings.
  3. Virtual link enable/disable.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago bnx2x: Changes with storage & MAC macros
Dmitry Kravkov [Wed, 17 Sep 2014 13:24:36 +0000 (16:24 +0300)]
bnx2x: Changes with storage & MAC macros

Rearrange macros to query for storage-only modes in different MF environment.
Improves the readability and maintainability of the code. E.g.:
- if (IS_MF_STORAGE_SD(bp) || IS_MF_FCOE_AFEX(bp))
+ if (IS_MF_STORAGE_ONLY(bp))

In addition, this removes the need for bnx2x_is_valid_ether_addr().

Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago Merge branch 'fec-next'
David S. Miller [Fri, 19 Sep 2014 20:27:13 +0000 (16:27 -0400)]
Merge branch 'fec-next'

Florian Fainelli says:

====================
net: phy: Broadcom BCM7xxx PHY workaround update

This patch set contains the change to of_phy_connect() that you have seen before,
this time with the full context of why it is useful and applicable here.

Due to some design decision, the internal PHY on Broadcom BCM7xxx chips
is not entirely self contained and does not report its internal revision
through MII_PHYSID2; that is left to external PHY designs.

This forces us to get the PHY revision from the GENET and SF2 switch drivers
because those two peripherals integrate such a PHY and do contain the PHY
revision in their registers.

The approach taken here is hopefully easy to extend to similar needs for
other chips as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: phy: bcm7xxx: utilize PHY revision in config_init
Florian Fainelli [Fri, 19 Sep 2014 20:07:56 +0000 (13:07 -0700)]
net: phy: bcm7xxx: utilize PHY revision in config_init

Now that the GENET and SF2 drivers have been updated to communicate
the revision of the BCM7xxx integrated PHY to us, utilize that
information in the config_init() callback to call into the appropriate
workaround function based on our revision.

While at it, we also print the revision and patch level to help debug
new chips.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: dsa: bcm_sf2: communicate integrated PHY revision to PHY driver
Florian Fainelli [Fri, 19 Sep 2014 20:07:55 +0000 (13:07 -0700)]
net: dsa: bcm_sf2: communicate integrated PHY revision to PHY driver

The integrated BCM7xxx PHY contains no useful revision information
in its MII_PHYSID2 bits 3:0; that information is instead contained in
the SWITCH_REG_PHY_REVISION register.

Read this register, store its value, and return it by implementing the
dsa_switch::get_phy_flags() callback accordingly. The register layout is
already matching what the BCM7xxx PHY driver is expecting to find.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: dsa: allow switch drivers to specify phy_device::dev_flags
Florian Fainelli [Fri, 19 Sep 2014 20:07:54 +0000 (13:07 -0700)]
net: dsa: allow switch drivers to specify phy_device::dev_flags

Some switch drivers (e.g. bcm_sf2) may have to communicate specific
workarounds or flags to the PHY device driver. Allow switch
drivers to be delegated that task by introducing a get_phy_flags()
callback which will do just that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: bcmgenet: communicate integrated PHY revision to PHY driver
Florian Fainelli [Fri, 19 Sep 2014 20:07:53 +0000 (13:07 -0700)]
net: bcmgenet: communicate integrated PHY revision to PHY driver

The integrated BCM7xxx PHY contains no useful revision information in
its MII_PHYSID2 bits 3:0; that information is instead contained in the
GENET hardware block.

We already read the GENET 32-bit revision register, so store the
integrated PHY revision in the driver private structure, and then
communicate this revision value to the PHY driver by overriding the
phy_flags value.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: bcmgenet: remove PHY_BRCM_100MBPS_WAR
Florian Fainelli [Fri, 19 Sep 2014 20:07:52 +0000 (13:07 -0700)]
net: bcmgenet: remove PHY_BRCM_100MBPS_WAR

Now that we have removed the need for the PHY_BRCM_100MBPS_WAR flag, we
can remove it from the GENET driver and the broadcom shared header file.
The PHY driver checks the PHY supported bitmask instead.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: phy: bcm7xxx: do not use PHY_BRCM_100MBPS_WAR
Florian Fainelli [Fri, 19 Sep 2014 20:07:51 +0000 (13:07 -0700)]
net: phy: bcm7xxx: do not use PHY_BRCM_100MBPS_WAR

There is no need for the PHY driver to check PHY_BRCM_100MBPS_WAR since
that is redundant with checking the PHY device supported features. Get
rid of that workaround flag.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: phy: broadcom: add helper for PHY revision and patch level
Florian Fainelli [Fri, 19 Sep 2014 20:07:50 +0000 (13:07 -0700)]
net: phy: broadcom: add helper for PHY revision and patch level

The Broadcom BCM7xxx internal PHYs do not contain any useful revision
information in the low 4 bits of their MII_PHYSID2 (MII register 3)
which could allow us to properly identify them.

As a result, we need the actual hardware block integrating these PHYs:
GENET or the SF2 switch to tell us what revision they are built with. To
assist with that, add two helper macros for fetching the PHY
revision and patch level from the struct phy_device::dev_flags.
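
As an illustration of what such helpers do, the sketch below unpacks a
revision and a patch level from a flags word. The bit layout used here
(revision in bits 15:8, patch level in bits 7:0) is an assumption for
the example and is not stated in this commit message; the function names
and the sample value are hypothetical as well.

```python
def phy_rev(dev_flags: int) -> int:
    """Assumed layout: PHY revision in bits 15:8 of dev_flags."""
    return (dev_flags >> 8) & 0xFF


def phy_patch(dev_flags: int) -> int:
    """Assumed layout: patch level in bits 7:0 of dev_flags."""
    return dev_flags & 0xFF


# Example flags word as a MAC or switch driver might hand it to the PHY driver.
flags = 0xE010
rev, patch = phy_rev(flags), phy_patch(flags)
```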

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago of: mdio: honor flags passed to of_phy_connect
Florian Fainelli [Fri, 19 Sep 2014 20:07:49 +0000 (13:07 -0700)]
of: mdio: honor flags passed to of_phy_connect

Commit f9a8f83b04e0 ("net: phy: remove flags argument from phy_{attach,
connect, connect_direct}") removed the flags argument to the PHY library
calls to: phy_{attach,connect,connect_direct}.

Most Device Tree aware drivers call of_phy_connect() with the flag
argument set to 0, but some of them might want to set a different value
there in order for the PHY driver to key a specific behavior based on
the phy_device::phy_flags value.

Allow such drivers to set custom phy_flags as part of the
of_phy_connect() call. Since of_phy_connect() does start the PHY state
machine, it will call into the PHY driver config_init() callback, which
is usually where a specific phy_flags value is important.

Fixes: f9a8f83b04e0 ("net: phy: remove flags argument from phy_{attach, connect, connect_direct}")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: add alloc_skb_with_frags() helper
Eric Dumazet [Wed, 17 Sep 2014 11:49:49 +0000 (04:49 -0700)]
net: add alloc_skb_with_frags() helper

Extract from sock_alloc_send_pskb() code building skb with frags,
so that we can reuse this in other contexts.

Intent is to use it from tcp_send_rcvq(), tcp_collapse(), ...

We also want to replace some skb_linearize() calls with a more reliable
strategy in pathological cases where we need to reduce number of frags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago tcp: do not fake tcp headers in tcp_send_rcvq()
Eric Dumazet [Wed, 17 Sep 2014 10:14:42 +0000 (03:14 -0700)]
tcp: do not fake tcp headers in tcp_send_rcvq()

Now that we no longer rely on having tcp headers for skbs in the receive queue,
tcp repair does not need to build fake ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago Merge branch 'udp-tunnel-common'
David S. Miller [Fri, 19 Sep 2014 19:57:46 +0000 (15:57 -0400)]
Merge branch 'udp-tunnel-common'

Andy Zhou says:

====================
Refactor vxlan and l2tp to use new common UDP tunnel APIs

This patch series adds a few more UDP tunnel APIs and refactors the current
UDP tunnel based protocols, vxlan and l2tp, to make use of the new APIs.

The added APIs are setup_udp_tunnel_sock(), udp_tunnel_xmit_skb() and
udp_tunnel_sock_release(). Their logic already exists in the
current vxlan and l2tp implementations; move it to common APIs to reduce
code duplication.

Also split udp_tunnel.c into net/ipv4/udp_tunnel.c and
net/ipv6/ip6_udp_tunnel.c to maintain proper IP protocol separation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions
Andy Zhou [Wed, 17 Sep 2014 00:31:19 +0000 (17:31 -0700)]
l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions

Simplify l2tp implementation using common UDP tunnel APIs.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.
Andy Zhou [Wed, 17 Sep 2014 00:31:18 +0000 (17:31 -0700)]
vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.

Simplify vxlan implementation using common UDP tunnel APIs.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago udp-tunnel: Add a few more UDP tunnel APIs
Andy Zhou [Wed, 17 Sep 2014 00:31:17 +0000 (17:31 -0700)]
udp-tunnel: Add a few more UDP tunnel APIs

Added a few more UDP tunnel APIs that can be shared by UDP based
tunnel protocol implementation. The main ones are highlighted below.

setup_udp_tunnel_sock() configures UDP listener socket for
receiving UDP encapsulated packets.

udp_tunnel_xmit_skb() and udp_tunnel6_xmit_skb() transmit skb
using UDP encapsulation.

udp_tunnel_sock_release() closes the UDP tunnel listener socket.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago udp_tunnel: Seperate ipv6 functions into its own file.
Andy Zhou [Wed, 17 Sep 2014 00:31:16 +0000 (17:31 -0700)]
udp_tunnel: Seperate ipv6 functions into its own file.

Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs
in udp_tunnel.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago Merge branch 'fec-next'
David S. Miller [Fri, 19 Sep 2014 19:36:54 +0000 (15:36 -0400)]
Merge branch 'fec-next'

Frank Li says:

====================
net: fec: add interrupt coalescence

Improve error handling when parsing the queue number.
Add the interrupt coalescence feature.

Change from v2 to v3
 - add error check in fec_enet_set_coalesce
 - fix a run time warning to get clock rate in interrupt
 - fix commit message use TKT number

Change from v1 to v2
 - fix indentation
 - use errata number instead of TKT
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: fec: Workaround for imx6sx enet tx hang when enable three queues
Fugang Duan [Tue, 16 Sep 2014 21:18:54 +0000 (05:18 +0800)]
net: fec: Workaround for imx6sx enet tx hang when enable three queues

When three queues are enabled on imx6sx enet and a tx performance
test is run with the iperf tool, tx hangs after some time.

Found that:
If uDMA is running, software setting TDAR may cause a tx hang.
If uDMA is idle, software setting TDAR doesn't cause a tx hang.

There is a TDAR race condition for multiQ when the software sets TDAR
and the uDMA clears TDAR simultaneously or in a small window (2-4 cycles).
This will cause the udma_tx and udma_tx_arbiter state machines to hang.
The issue exists in the i.MX6SX enet IP.

So, the workaround is to check the TDAR status four times: if TDAR has been
cleared by hardware, write TDAR; otherwise don't set TDAR.

The patch is only a workaround for the issue ERR007885.
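
The polling logic of the workaround can be modelled in user space as
follows. This is a sketch of the idea only, not the driver code: FakeTdar
is a hypothetical stand-in for the TDAR register that returns a scripted
sequence of read values, and kick_tx is a hypothetical helper name.

```python
class FakeTdar:
    """Stand-in for the TDAR register: yields scripted read values, counts writes."""

    def __init__(self, reads):
        self._reads = list(reads)
        self.writes = 0

    def read(self) -> int:
        # Once the script is exhausted, report the register as cleared.
        return self._reads.pop(0) if self._reads else 0

    def write(self) -> None:
        self.writes += 1


def kick_tx(reg, polls: int = 4) -> bool:
    """Write TDAR only if one of `polls` reads sees it cleared by hardware."""
    for _ in range(polls):
        if reg.read() == 0:
            reg.write()
            return True
    return False  # still set after all polls: leave TDAR alone


busy = FakeTdar([1, 1, 1, 1])   # uDMA keeps TDAR set for all four reads
idle = FakeTdar([1, 0])         # hardware clears TDAR by the second read
busy_kicked = kick_tx(busy)
idle_kicked = kick_tx(idle)
```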

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net:fec: increase DMA queue number
Fugang Duan [Tue, 16 Sep 2014 21:18:53 +0000 (05:18 +0800)]
net:fec: increase DMA queue number

When interrupt coalescing is enabled, 8 BDs are not enough.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: fec: add interrupt coalescence feature support
Fugang Duan [Tue, 16 Sep 2014 21:18:52 +0000 (05:18 +0800)]
net: fec: add interrupt coalescence feature support

i.MX6 SX supports the interrupt coalescence feature.
By default, init the interrupt coalescing frame count threshold and
timer threshold.

Supply the ethtool interfaces as below for user tuning to improve
enet performance:
rx_max_coalesced_frames
rx_coalesce_usecs
tx_max_coalesced_frames
tx_coalesce_usecs

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago net: fec: refine error handle of parser queue number from DT
Frank Li [Tue, 16 Sep 2014 21:18:51 +0000 (05:18 +0800)]
net: fec: refine error handle of parser queue number from DT

check tx and rx queue separately.
fix typo, "Invalidate" and "fail".
change pr_err to pr_warn.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago sparc: bpf_jit: add SKF_AD_PKTTYPE support to JIT
Alexei Starovoitov [Tue, 16 Sep 2014 19:35:35 +0000 (12:35 -0700)]
sparc: bpf_jit: add SKF_AD_PKTTYPE support to JIT

commit 233577a22089 ("net: filter: constify detection of pkt_type_offset")
allows us to implement simple PKTTYPE support in sparc JIT

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago ixgbe: remove IXGBE_FLAG_MSI(X)_CAPABLE flags
Jacob Keller [Wed, 3 Sep 2014 08:13:01 +0000 (08:13 +0000)]
ixgbe: remove IXGBE_FLAG_MSI(X)_CAPABLE flags

They were not used, and we don't need them, so we shouldn't bother with
keeping values in the flags field that could be misleading.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: add warnings for other disabled features without MSI-X support
Jacob Keller [Wed, 3 Sep 2014 08:13:00 +0000 (08:13 +0000)]
ixgbe: add warnings for other disabled features without MSI-X support

When we can't get MSI-X vectors, we disable a few features which require
MSI-X vectors. Print warnings just like we do when disabling DCB.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: use e_dev_warn instead of netif_printk
Jacob Keller [Wed, 3 Sep 2014 08:12:59 +0000 (08:12 +0000)]
ixgbe: use e_dev_warn instead of netif_printk

Again, we should not be directly using netif_printk, as we have our own
error print routines that we generate. In addition, instead of using an
early return we can just use the else block of this one line if
statement.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: use e_dev_warn instead of e_err for displaying warning
Jacob Keller [Wed, 3 Sep 2014 08:12:58 +0000 (08:12 +0000)]
ixgbe: use e_dev_warn instead of e_err for displaying warning

In this case, disabling DCB is not an error. We can still function, but
we just have to let the user know. In addition, since we call this
during probe before allocating our netdevice structure, we should use
e_dev_warn instead of e_warn.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: determine vector count inside ixgbe_acquire_msix_vectors
Jacob Keller [Wed, 3 Sep 2014 08:12:57 +0000 (08:12 +0000)]
ixgbe: determine vector count inside ixgbe_acquire_msix_vectors

Our calculated v_budget doesn't matter except if we allocate MSI-X
vectors. We shouldn't need to calculate this outside of the function, so
don't. Instead, only calculate it once we attempt to acquire MSI-X
vectors. This helps collocate all of the MSI-X vector code together.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: move msix_entries allocation into ixgbe_acquire_msix_vectors
Jacob Keller [Wed, 3 Sep 2014 08:12:56 +0000 (08:12 +0000)]
ixgbe: move msix_entries allocation into ixgbe_acquire_msix_vectors

We already have to kfree this value if we fail, and this is only part of
MSI-X mode, so we should simply allocate the value where we need it.
This is cleaner, and makes it a lot more obvious why we are freeing it
inside of ixgbe_acquire_msix_vectors.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: return integer from ixgbe_acquire_msix_vectors
Jacob Keller [Wed, 3 Sep 2014 08:12:55 +0000 (08:12 +0000)]
ixgbe: return integer from ixgbe_acquire_msix_vectors

Similar to how ixgbevf handles acquiring MSI-X vectors, we can return an
error code instead of relying on the flag being set. This makes it more
clear that we have failed to set up MSI-X mode, and also will make it
easier to consolidate MSI-X related code all into the single function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: use e_dev_warn instead of netif_printk
Jacob Keller [Wed, 3 Sep 2014 08:12:54 +0000 (08:12 +0000)]
ixgbe: use e_dev_warn instead of netif_printk

The netif_printk relies on our netdevice structure to be registered
already. We may call ixgbe_acquire_msix_vectors prior to registering our
netdevice, so we should not use the netdevice specific printk.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: Do not schedule an uninitialized workqueue entry
Mark Rustad [Sat, 9 Aug 2014 07:02:09 +0000 (07:02 +0000)]
ixgbe: Do not schedule an uninitialized workqueue entry

If a hardware Tx timestamp is requested, an uninitialized
workqueue entry may be scheduled, especially on an 82598 adapter.
Add a check for a PTP clock to avoid that. Also only apply the
unlikely to the first term of the conditional. That will make the
rest of the checks be in the cold path.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbe: remove useless bd_number from adapter struct
Ethan Zhao [Tue, 29 Jul 2014 09:40:09 +0000 (09:40 +0000)]
ixgbe: remove useless bd_number from adapter struct

bd_number is not useful anymore, so remove it from the adapter struct. If we
kept it, we would have to fix the boards-driven counter bug in ixgbe_remove() and
ixgbe_probe() only for a trivial debug purpose; other output is enough.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years ago ixgbevf: remove useless bd_number from struct ixgbevf_adapter
Ethan Zhao [Tue, 29 Jul 2014 09:44:01 +0000 (09:44 +0000)]
ixgbevf: remove useless bd_number from struct ixgbevf_adapter

It is useless and buggy, just remove it.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agonet: fec: fix build error at m68k platform
Frank Li [Tue, 16 Sep 2014 18:34:18 +0000 (02:34 +0800)]
net: fec: fix build error at m68k platform

reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout 4d494cdc92b3b9a0f5fb9e1560810fa27d5a0489
  make.cross ARCH=m68k  m5272c3_defconfig
  make.cross ARCH=m68k

drivers/net/ethernet/freescale/fec.h:262:0: warning: "FEC_R_DES_START" redefined
 #define FEC_R_DES_START(X) ((X == 1) ? FEC_R_DES_START_1 : \
 ^
drivers/net/ethernet/freescale/fec.h:158:0: note: this is the location of the previous definition
 #define FEC_R_DES_START  0x3d0 /* Receive descriptor ring */
 ^
drivers/net/ethernet/freescale/fec.h:265:0: warning: "FEC_X_DES_START" redefined
 #define FEC_X_DES_START(X) ((X == 1) ? FEC_X_DES_START_1 : \

...

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sched: cls_cgroup need tcf_exts_init in all cases
John Fastabend [Tue, 16 Sep 2014 07:33:42 +0000 (00:33 -0700)]
net: sched: cls_cgroup need tcf_exts_init in all cases

This ensures that tcf_exts_init() is called in all cases.

Fixes: 952313bd62589cae216a57 ("net: sched: cls_cgroup use RCU")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar...
David S. Miller [Tue, 16 Sep 2014 20:21:48 +0000 (16:21 -0400)]
Merge branch 'net_next_ovs' of git://git./linux/kernel/git/pshelar/openvswitch

Pravin B Shelar says:

====================
Open vSwitch

The following patches add the recirculation and hash actions to OVS.
The first patch removes a pointer to a stack object. The next three
patches do code restructuring that is required for the last patch.
The recirculation implementation was changed, according to comments
from David Miller, to avoid using recursive calls in OVS. It uses a
queue to record recirc actions, and each deferred recirc is executed
at the end of the current actions' execution.

v1-v2:
Changed subsystem name in subject to openvswitch
v2-v3:
Added patch to remove pkt_key pointer from skb->cb.
====================
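
The deferred recirculation described in the cover letter above can be sketched as a small queue that is drained after the current action list finishes. This is a minimal userspace illustration under stated assumptions, not the openvswitch code: the queue size, the type and every demo_* name are hypothetical, and a "pass" here only counts how often the action list ran.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FIFO_SIZE 8                /* hypothetical queue capacity */

/* Queue of recirculation requests recorded while actions execute. */
struct demo_fifo {
	int    recirc_id[DEMO_FIFO_SIZE];
	size_t head, tail;              /* consume at head, append at tail */
};

/* Record a deferred recirc; returns false when the queue is full. */
static bool demo_fifo_put(struct demo_fifo *f, int id)
{
	if (f->tail == DEMO_FIFO_SIZE)
		return false;
	f->recirc_id[f->tail++] = id;
	return true;
}

/* One pass over a packet's actions. Instead of calling itself for a
 * recirc action, it only records the request in the queue. */
static void demo_do_actions(struct demo_fifo *f, int recirc_id, int *passes)
{
	(*passes)++;
	if (recirc_id < 2)              /* ids 0 and 1 ask for another pass */
		(void)demo_fifo_put(f, recirc_id + 1);
}

/* Run the first pass, then drain the queue iteratively. Returns the
 * number of passes that were executed. */
static int demo_execute(int first_id)
{
	struct demo_fifo f = { .head = 0, .tail = 0 };
	int passes = 0;

	demo_do_actions(&f, first_id, &passes);
	while (f.head < f.tail)
		demo_do_actions(&f, f.recirc_id[f.head++], &passes);
	return passes;
}
```

The stack depth stays constant no matter how many times a packet recirculates, because every extra pass starts from the loop in demo_execute() rather than from inside the previous pass.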

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sched: cls_fw: add missing tcf_exts_init call in fw_change()
John Fastabend [Tue, 16 Sep 2014 06:31:42 +0000 (23:31 -0700)]
net: sched: cls_fw: add missing tcf_exts_init call in fw_change()

When allocating a new structure we also need to call tcf_exts_init
to initialize exts.
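
As a rough userspace sketch of that rule, the allocation helper below initializes the embedded extension state before the new object is returned. The types are hypothetical stand-ins (the real struct tcf_exts layout is not reproduced here), and the demo_* names and the action/police values are assumptions made for illustration.

```c
#include <stdlib.h>

/* Minimal doubly linked list head, standing in for a kernel list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

/* Hypothetical stand-in for the extension state embedded in a filter. */
struct demo_exts {
	struct demo_list actions;       /* must point at itself when empty */
	int action;
	int police;
};

static void demo_exts_init(struct demo_exts *e, int action, int police)
{
	e->actions.next = &e->actions;
	e->actions.prev = &e->actions;
	e->action = action;
	e->police = police;
}

static int demo_exts_is_empty(const struct demo_exts *e)
{
	return e->actions.next == &e->actions;
}

struct demo_filter {
	unsigned int     id;
	struct demo_exts exts;
};

/* Allocate a filter and initialize exts right away, so no later code
 * path can observe a zeroed (and therefore invalid) list head. */
static struct demo_filter *demo_filter_new(unsigned int id)
{
	struct demo_filter *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->id = id;
	demo_exts_init(&f->exts, 1, 2);
	return f;
}
```

A zero-filled list head has NULL next/prev pointers, so walking it without the init call would dereference NULL; that is why the call belongs next to the allocation.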

A follow up patch might be in order to remove some of this code
and do tcf_exts_assign(). With this we could remove the
tcf_exts_init/tcf_exts_change pattern for some of the classifiers.
As part of the future tcf_actions RCU series this will need to be
done. For now fix the call here.

Fixes: e35a8ee5993ba81fd6c0 ("net: sched: fw use RCU")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sched: cls_cgroup fix possible memory leak of 'new'
John Fastabend [Tue, 16 Sep 2014 06:31:17 +0000 (23:31 -0700)]
net: sched: cls_cgroup fix possible memory leak of 'new'

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   54996b529ab70ca1d6f40677cd2698c4f7127e87
commit: c7953ef23042b7c4fc2be5ecdd216aacff6df5eb [625/646] net: sched: cls_cgroup use RCU

net/sched/cls_cgroup.c:130 cls_cgroup_change() warn: possible memory leak of 'new'
net/sched/cls_cgroup.c:135 cls_cgroup_change() warn: possible memory leak of 'new'
net/sched/cls_cgroup.c:139 cls_cgroup_change() warn: possible memory leak of 'new'
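
Warnings of this kind usually point at early returns taken after an allocation. A common remedy is a single error label that frees the new object, sketched below in userspace C. This is not the cls_cgroup code: the struct, the demo_* names, the validation checks and the allocation counter are all hypothetical, and the old object is freed directly here rather than through any RCU mechanism.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_head {
	int handle;
};

/* Allocation counter, present only so a leak is observable in a test. */
static int demo_live_allocs;

/* Replace *slot with a freshly allocated head. Every failure after the
 * allocation leaves through errout, so 'new' is never leaked. */
static int demo_change(struct demo_head **slot, int handle, int opts_valid)
{
	struct demo_head *new;
	int err;

	new = malloc(sizeof(*new));
	if (!new)
		return -ENOBUFS;
	demo_live_allocs++;

	if (handle == 0) {
		err = -ENOENT;
		goto errout;            /* not a bare return: would leak 'new' */
	}
	if (!opts_valid) {
		err = -EINVAL;
		goto errout;
	}

	new->handle = handle;
	if (*slot) {
		free(*slot);
		demo_live_allocs--;
	}
	*slot = new;
	return 0;

errout:
	free(new);
	demo_live_allocs--;
	return err;
}
```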

Fixes: c7953ef23042b7c4fc2be5ecdd216aac ("net: sched: cls_cgroup use RCU")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sched: cls_u32 add missing rcu_assign_pointer and annotation
John Fastabend [Tue, 16 Sep 2014 06:30:49 +0000 (23:30 -0700)]
net: sched: cls_u32 add missing rcu_assign_pointer and annotation

Add the missing rcu_assign_pointer and the missing annotation for ht_up
in cls_u32.c.
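
For readers outside the kernel, the publish/read discipline behind rcu_assign_pointer can be approximated with C11 atomics. The sketch below is only an analogy under stated assumptions: it uses a release store and an acquire load (a stronger ordering than the kernel's dependency-ordered read), it does not model grace periods or sparse address-space checking, and the demo_* types are hypothetical stand-ins for the u32 classifier structures.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_hnode {
	int refcnt;
};

/* The pointer type itself marks ht_up as "shared, do not touch with a
 * plain load or store", loosely mirroring what an annotation on the
 * field lets a static checker enforce. */
struct demo_knode {
	_Atomic(struct demo_hnode *) ht_up;
};

/* Publish: everything written to *ht before this call is visible to
 * a reader that later observes the new pointer. */
static void demo_publish(struct demo_knode *n, struct demo_hnode *ht)
{
	atomic_store_explicit(&n->ht_up, ht, memory_order_release);
}

/* Read side: pairs with the release store above. */
static struct demo_hnode *demo_read(struct demo_knode *n)
{
	return atomic_load_explicit(&n->ht_up, memory_order_acquire);
}
```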

Caught by kbuild bot:

>> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
   net/sched/cls_u32.c:378:36:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:378:36:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
   net/sched/cls_u32.c:610:54:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:610:54:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
   net/sched/cls_u32.c:684:18:    expected struct tc_u_hnode [noderef] <asn:4>*ht_up
   net/sched/cls_u32.c:684:18:    got struct tc_u_hnode *[assigned] ht
>> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sched: fix unused cpu variable
John Fastabend [Tue, 16 Sep 2014 06:30:26 +0000 (23:30 -0700)]
net: sched: fix unused cpu variable

The kbuild test robot reported an unused variable cpu in cls_u32.c
after the patch below. This happens when the PERF and MARK config
variables are disabled.

The fix is to use separate variables for perf and mark
and to define the cpu variable inside the ifdef logic.
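
The pattern can be sketched in plain C as follows. It is an illustration only: DEMO_PERF and DEMO_MARK are hypothetical macros standing in for the config options, and the struct and function are not the cls_u32 ones. The loop variable is declared inside each conditional block, so a build with both macros undefined has no unused variable left over.

```c
#include <stddef.h>

#define DEMO_NR_CPUS 4                  /* hypothetical CPU count */

struct demo_knode {
	unsigned int val;
#ifdef DEMO_PERF
	unsigned long rcnt[DEMO_NR_CPUS];       /* per-CPU perf counters */
#endif
#ifdef DEMO_MARK
	unsigned int success[DEMO_NR_CPUS];     /* per-CPU mark counters */
#endif
};

/* Sum the base value plus whichever per-CPU counters are compiled in. */
static unsigned long demo_dump(const struct demo_knode *n)
{
	unsigned long total = n->val;

#ifdef DEMO_PERF
	{
		int cpu;        /* only exists when the perf code exists */

		for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
			total += n->rcnt[cpu];
	}
#endif
#ifdef DEMO_MARK
	{
		int cpu;        /* separate variable for the mark code */

		for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
			total += n->success[cpu];
	}
#endif
	return total;
}
```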

Fixes: 459d5f626da7 ("net: sched: make cls_u32 per cpu")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: fix a null pointer dereference in tcindex_set_parms()
WANG Cong [Mon, 15 Sep 2014 23:43:43 +0000 (16:43 -0700)]
net_sched: fix a null pointer dereference in tcindex_set_parms()

This patch fixes the following crash:

[   42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[   42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] PGD d2319067 PUD d4ffe067 PMD 0
[   42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603
[   42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000
[   42.200027] RIP: 0010:[<ffffffff817e3fc4>]  [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] RSP: 0018:ffff8800ce793898  EFLAGS: 00010202
[   42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000
[   42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8
[   42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001
[   42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238
[   42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620
[   42.200027] FS:  00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[   42.200027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0
[   42.200027] Stack:
[   42.200027]  ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000
[   42.200027]  ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000
[   42.200027]  ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0
[   42.200027] Call Trace:
[   42.200027]  [<ffffffff817e4169>] tcindex_change+0xdb/0xee
[   42.200027]  [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f
[   42.200027]  [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194
[   42.200027]  [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19
[   42.200027]  [<ffffffff8179cfe0>] ? __rtnl_unlock+0x17/0x17
[   42.200027]  [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b
[   43.462494]  [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a
[   43.462494]  [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148
[   43.462494]  [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27
[   43.462494]  [<ffffffff81778165>] sock_sendmsg+0x57/0x71
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff81152c06>] ? might_fault+0xa0/0xa4
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7
[   43.462494]  [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb
[   43.462494]  [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37
[   43.462494]  [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9
[   43.462494]  [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4
[   43.462494]  [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38
[   43.462494]  [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57
[   43.462494]  [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54
[   43.462494]  [<ffffffff81779012>] __sys_sendmsg+0x42/0x60
[   43.462494]  [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c
[   43.462494]  [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b

'p->h' could be NULL while 'cp->h' is always up to date.

Fixes: commit 331b72922c5f58d48fd ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: fix memory leak in cls_tcindex
WANG Cong [Mon, 15 Sep 2014 23:43:42 +0000 (16:43 -0700)]
net_sched: fix memory leak in cls_tcindex

Fixes: commit 331b72922c5f58d48fd ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoopenvswitch: Add recirc and hash action.
Andy Zhou [Tue, 16 Sep 2014 02:37:25 +0000 (19:37 -0700)]
openvswitch: Add recirc and hash action.

The recirc action allows a packet to reenter openvswitch processing.
Currently openvswitch looks up the flow for a received packet and
executes a set of actions on that packet. With the help of the recirc
action we can process/modify the packet and recirculate it back into
openvswitch for another pass.

The OVS hash action calculates a 5-tuple hash and stores it in the
flow-key hash. This can be used along with recirculation to distribute
packets among different ports for bond devices.
For example, OVS bonding can use the following actions:
Match on: bond flow; Action: hash, recirc(id)
Match on: recirc-id == id and hash lower bits == a;
          Action: output port_bond_a
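
The two-pass bonding example above can be sketched in userspace C. This is an illustration under stated assumptions, not the datapath code: the key layout, the demo_* names, the recirc id value and the two-port mask are hypothetical, and FNV-1a is used as a stand-in hash (the hash function OVS actually uses is not reproduced here).

```c
#include <stdint.h>

/* Hypothetical, trimmed-down flow key. */
struct demo_flow_key {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t  proto;
	uint32_t recirc_id;     /* 0 on the first pass */
	uint32_t hash;          /* filled in by the hash action */
};

#define DEMO_BOND_RECIRC_ID 1u  /* hypothetical id used by the bond flows */
#define DEMO_BOND_MASK      0x1u /* lower hash bit selects one of two ports */

/* Fold the low nbytes of v into an FNV-1a hash state. */
static uint32_t demo_mix(uint32_t h, uint32_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return h;
}

/* "hash" action: hash the 5-tuple and store it in the key. */
static void demo_action_hash(struct demo_flow_key *k)
{
	uint32_t h = 2166136261u;

	h = demo_mix(h, k->src_ip, 4);
	h = demo_mix(h, k->dst_ip, 4);
	h = demo_mix(h, k->src_port, 2);
	h = demo_mix(h, k->dst_port, 2);
	h = demo_mix(h, k->proto, 1);
	k->hash = h;
}

/* "recirc(id)" action: tag the key so the next pass can match on it. */
static void demo_action_recirc(struct demo_flow_key *k, uint32_t id)
{
	k->recirc_id = id;
}

/* Second-pass match: recirc-id == id and lower hash bits pick the port.
 * Returns the output port, or -1 when the key is not a bond recirc. */
static int demo_lookup_bond(const struct demo_flow_key *k, const int ports[2])
{
	if (k->recirc_id != DEMO_BOND_RECIRC_ID)
		return -1;
	return ports[k->hash & DEMO_BOND_MASK];
}
```

Packets of the same flow share a 5-tuple and therefore a hash, so they keep leaving through the same bond member, while different flows spread across the members.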

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
10 years agoopenvswitch: simplify sample action implementation
Andy Zhou [Tue, 16 Sep 2014 02:33:50 +0000 (19:33 -0700)]
openvswitch: simplify sample action implementation

The current sample() function implementation is more complicated
than necessary in its handling of the single user space action
optimization and skb reference counting. There are no functional changes.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
10 years agoopenvswitch: Use tun_key only for egress tunnel path.
Pravin B Shelar [Tue, 16 Sep 2014 02:28:44 +0000 (19:28 -0700)]
openvswitch: Use tun_key only for egress tunnel path.

Currently tun_key is used for passing tunnel information
on both the ingress and egress paths, which causes confusion. The
following patch removes its use on the ingress path and makes it an
egress-only parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
10 years agoopenvswitch: refactor ovs flow extract API.
Pravin B Shelar [Tue, 16 Sep 2014 02:20:31 +0000 (19:20 -0700)]
openvswitch: refactor ovs flow extract API.

OVS flow extract is called on the packet receive or packet
execute code path. The following patch defines a separate API
for extracting the flow-key in the packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
10 years agoopenvswitch: Remove pkt_key from OVS_CB
Pravin B Shelar [Tue, 16 Sep 2014 02:15:28 +0000 (19:15 -0700)]
openvswitch: Remove pkt_key from OVS_CB

OVS keeps a pointer to the packet key in skb->cb, but the packet key is
stored on the stack. This could make the code a bit tricky, so it is
better to get rid of the pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
10 years agonet: dsa: fix mii_bus to host_dev replacement
Florian Fainelli [Mon, 15 Sep 2014 21:48:08 +0000 (14:48 -0700)]
net: dsa: fix mii_bus to host_dev replacement

dsa_of_probe() still used cd->mii_bus instead of cd->host_dev when
building with CONFIG_OF=y. Fix this by making the replacement here as
well.

Fixes: b4d2394d01b ("dsa: Replace mii_bus with a generic host device")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>