WANG Cong [Fri, 11 Nov 2016 18:20:50 +0000 (10:20 -0800)]
net: fix sleeping for sk_wait_event()
Similar to commit
14135f30e33c ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs to be fixed too: release_sock() can block, and
after sleeping it sets the process state back to running, which breaks
the preceding prepare_to_wait().
Switch to the new wait API.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Nov 2016 03:41:25 +0000 (22:41 -0500)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.
Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:
nft add table netdev x
nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
nft add rule netdev x y drop
And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using -i
option. perf report shows nf_tables calls in its top 10:
17.30% kpktgend_0 [nf_tables] [k] nft_do_chain
15.75% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core
10.39% kpktgend_0 [nf_tables_netdev] [k] nft_do_chain_netdev
I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.
This rework contains more specifically, in strict order, these patches:
1) Remove compile-time debugging from core.
2) Remove obsolete comments that predate the rcu era. These days it is
well known that a Netfilter hook always runs under rcu_read_lock().
3) Remove threshold handling; this is used only by br_netfilter.
We already have specific code to handle this in br_netfilter,
so remove this code from the core path.
4) Deprecate NF_STOP, as this is only used by br_netfilter.
5) Place nf_state_hook pointer into xt_action_param structure, so
this structure fits into one single cacheline according to pahole.
This also implicitly affects nftables since it also relies on the
xt_action_param structure.
6) Move state->hook_entries into nf_queue entry. The hook_entries
pointer is only required by nf_queue(), so we can store this in the
queue entry instead.
7) use switch() statement to handle verdict cases.
8) Remove hook_entries field from nf_hook_state structure, this is only
required by nf_queue, so store it in nf_queue_entry structure.
9) Merge nf_iterate() into nf_hook_slow(), which results in a much
simpler and more readable function.
10) Handle NF_REPEAT away from the core. So far the only client is
nf_conntrack_in(), and we can restart the packet processing using a
simple goto to jump back there when TCP requires it.
This update required a second pass to fix fallout, fix from
Arnd Bergmann.
11) Set random seed from nft_hash when no seed is specified from
userspace.
12) Simplify nf_tables expression registration, in a much smarter way
to save lots of boiler plate code, by Liping Zhang.
13) Simplify layer 4 protocol conntrack tracker registration, from
Davide Caratti.
14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
to recent generalization of the socket infrastructure, from Arnd
Bergmann.
15) Then, the ipset batch from Jozsef, he describes it as follows:
* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
in skbinfo get/init helper functions.
* Use kmalloc() in comment extension helper instead of kzalloc()
because it is unnecessary to zero out the area just before
explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
across all set types.
* Count non-static extension memory into memsize calculation for
userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
they can be derived from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
one level of indentation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
by Muhammad Falak R Wani, individually for the set type families.
16) Remove useless connlabel field in struct netns_ct, patch from
Florian Westphal.
17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
{ip,ip6,arp}tables code that uses this.
====================
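As a rough illustration of items 7) and 9) above, the following
user-space sketch walks a list of hooks and dispatches on each verdict
with a switch(). It is not the kernel's nf_hook_slow(): toy_hook_fn,
toy_hook_slow() and the toy_* hooks are hypothetical names, the return
convention is simplified, and only the verdict values are taken from
include/uapi/linux/netfilter.h.

```c
#include <stddef.h>

/* Verdict codes, same numeric values as include/uapi/linux/netfilter.h. */
#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2
#define NF_QUEUE  3

/* Hypothetical stand-in for a registered hook: returns a verdict. */
typedef unsigned int (*toy_hook_fn)(void *pkt);

/*
 * Walk the hooks in order and handle each verdict in one switch().
 * Simplified return convention (an assumption of this sketch):
 *   1  every hook accepted, caller continues with the packet
 *   0  packet was consumed (stolen or queued)
 *  -1  packet was dropped
 */
static int toy_hook_slow(void *pkt, const toy_hook_fn *hooks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		switch (hooks[i](pkt)) {
		case NF_ACCEPT:
			break;		/* leaves the switch, next hook runs */
		case NF_DROP:
			return -1;
		case NF_QUEUE:
		case NF_STOLEN:
		default:
			return 0;
		}
	}
	return 1;
}

/* Trivial hooks used to exercise the loop. */
static unsigned int toy_accept(void *pkt) { (void)pkt; return NF_ACCEPT; }
static unsigned int toy_drop(void *pkt)   { (void)pkt; return NF_DROP; }
static unsigned int toy_steal(void *pkt)  { (void)pkt; return NF_STOLEN; }
```

With the iteration and the verdict handling in one function, there is
no separate iterator whose result has to be re-interpreted by the
caller.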
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 10 Nov 2016 11:10:29 +0000 (12:10 +0100)]
mlxsw: spectrum_router: Add FIB abort warning
Add a warning that the abort mechanism was triggered for device.
Also avoid going through the procedure if abort was already done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Nov 2016 03:36:42 +0000 (22:36 -0500)]
Merge branch 'dsa-mv88e6xxx-post-refactor-fixes'
Andrew Lunn says:
====================
dsa: mv88e6xxx: Fixes for port refactoring
The patches which refactored setting up the switch MACs introduced a
couple of regressions. The RGMII delays for a port can be set using
mechanisms other than phy-mode. Don't overwrite the delays unless
explicitly asked to. This broke my Armada 370 RD. Also, the mv88e6351
family supports setting RGMII delays, but is missing the necessary
entries in the ops structures to allow this.
These fixes are to patches currently in net-next. No need for stable
etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 10 Nov 2016 14:44:01 +0000 (15:44 +0100)]
net: dsa: mv88e6xxx: 6351 family also has RGMII delays
The recent refactoring of setting the MAC configuration broke setting
of RGMII delays, via the phy-mode, on the 6351 family. Add the missing
ops to the structure.
Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 10 Nov 2016 14:44:00 +0000 (15:44 +0100)]
net: dsa: mv88e6xxx: Don't modify RGMII delays when not RGMII mode
The RGMII mode delays can be set via strapping pins or EEPROM.
Don't change them unless explicitly asked to change them. The recent
refactoring of setting the MAC configuration changed this behaviour,
in that CPU and DSA ports have any pre-configured RGMII delays
removed. This breaks the Armada 370RD board. Restore the previous
behaviour, in that RGMII delays are only applied/removed when
explicitly asked for via a phy-mode of PHY_INTERFACE_MODE_RGMII*.
Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Fri, 11 Nov 2016 12:32:38 +0000 (13:32 +0100)]
netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test
Since commit
7926dbfa4bc1 ("netfilter: don't use
mutex_lock_interruptible()"), the function xt_find_table_lock can only
return NULL on an error. Simplify the call sites and update the
comment before the function.
The semantic patch that changes the code is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression t,e;
@@
t = \(xt_find_table_lock(...)\|
try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- ! IS_ERR_OR_NULL(t)
+ t
@@
expression t,e;
@@
t = \(xt_find_table_lock(...)\|
try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ !t
@@
expression t,e,e1;
@@
t = \(xt_find_table_lock(...)\|
try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ e1
... when any
// </smpl>
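The effect of the first two rules on a call site can be shown with a
user-space sketch. The helpers below re-create the error-pointer
convention of include/linux/err.h; toy_find_table() and struct
toy_table are hypothetical stand-ins for a lookup that, like
xt_find_table_lock() after 7926dbfa4bc1, reports failure only as NULL.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space models of IS_ERR() and IS_ERR_OR_NULL(). */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static int is_err_or_null(const void *p)
{
	return !p || is_err(p);
}

struct toy_table { int id; };
static struct toy_table the_table = { 42 };

/* Hypothetical lookup: returns the table or NULL, never an ERR_PTR(). */
static struct toy_table *toy_find_table(int found)
{
	return found ? &the_table : NULL;
}

/* Call site before the semantic patch. */
static int lookup_old(int found)
{
	struct toy_table *t = toy_find_table(found);

	if (is_err_or_null(t))
		return -1;
	return t->id;
}

/* Call site after the semantic patch: a plain NULL test is enough. */
static int lookup_new(int found)
{
	struct toy_table *t = toy_find_table(found);

	if (!t)
		return -1;
	return t->id;
}
```

Both variants behave identically as long as the lookup never returns an
error pointer, which is the precondition the commit message states.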
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Thu, 10 Nov 2016 13:17:01 +0000 (14:17 +0100)]
netfilter: conntrack: remove unused netns_ct member
Since commit
23014011ba420 ('netfilter: conntrack: support a fixed size of 128 distinct labels')
this isn't needed anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Philippe Reynes [Sat, 12 Nov 2016 22:16:51 +0000 (23:16 +0100)]
net: atheros: atl1e: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Fri, 11 Nov 2016 14:10:47 +0000 (16:10 +0200)]
net: ethernet: ti: davinci_cpdma: don't stop ctlr if it was stopped
No need to stop ctlr if it was already stopped; doing so can cause
timeout warnings. Steps to reproduce:
- ifconfig eth0 down
- ethtool -L eth0 rx 8 tx 8
- ethtool -L eth0 rx 1 tx 1
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Fri, 11 Nov 2016 13:45:24 +0000 (15:45 +0200)]
net: ethernet: ti: davinci_cpdma: fix fixed prio cpdma ctlr configuration
The dma ctlr is reset to 0 during a cpdma soft reset, thus the cpdma
ctlr cannot be configured after cpdma is stopped. So the content of
the cpdma ctlr needs to be restored during the off/on procedure. The
cpdma ctlr off/on procedure happens on interface down/up and when
changing the number of channels with ethtool. To avoid restoring the
content in many places, move the restore to cpdma_ctlr_start().
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 11 Nov 2016 10:22:53 +0000 (11:22 +0100)]
mlxsw: reg: Fix pwm_frequency field size in MFCR register
The field is 7 bits long. Fix it.
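A minimal sketch of why the declared width matters when a field is read
out of a register word. toy_field_get() is a hypothetical helper, not
the mlxsw register macro, and the narrower width of 6 used in the note
below is chosen only for illustration.

```c
#include <stdint.h>

/*
 * Extract a bit field of `width` bits starting at bit `shift`.
 * Assumes width < 32 so the mask computation does not overflow.
 */
static uint32_t toy_field_get(uint32_t reg, unsigned int shift,
			      unsigned int width)
{
	return (reg >> shift) & ((1u << width) - 1u);
}
```

With a width of 7 the value 0x7f is returned intact, while a width of 6
would silently truncate it to 0x3f.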
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Nov 2016 17:14:59 +0000 (12:14 -0500)]
genetlink: Make family a signed integer.
The idr_alloc(), idr_remove(), et al. routines all expect IDs to be
signed integers. Therefore make the genl_family member 'id' signed
too.
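One practical consequence of the signed convention, sketched in user
space: an allocator in the style of idr_alloc() returns a negative
errno on failure, and only a signed member lets the caller test for
that directly. toy_alloc_id() and the toy_family_* structures are
hypothetical and not part of genetlink.

```c
#include <errno.h>

/*
 * Hypothetical allocator following the idr_alloc() convention:
 * a non-negative ID on success, a negative errno on failure.
 */
static int toy_alloc_id(int next_free, int limit)
{
	return next_free < limit ? next_free : -ENOSPC;
}

struct toy_family_signed   { int id; };
struct toy_family_unsigned { unsigned int id; };

/* Signed member: the error is still visible after the assignment. */
static int toy_register_signed(struct toy_family_signed *f,
			       int next_free, int limit)
{
	f->id = toy_alloc_id(next_free, limit);
	return f->id < 0 ? f->id : 0;
}

/* Unsigned member: a failure turns into a very large "valid" ID. */
static void toy_register_unsigned(struct toy_family_unsigned *f,
				  int next_free, int limit)
{
	f->id = (unsigned int)toy_alloc_id(next_free, limit);
}
```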
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Thu, 10 Nov 2016 14:03:01 +0000 (15:03 +0100)]
net: phy: marvell: optimize logic for page changing during init
Instead of remembering if the page was changed, just compare the current
page to the saved one. This is easier and has the advantage of saving a
register write if the page was already restored.
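The idea can be sketched with a toy PHY model that counts page register
writes. struct toy_phy, toy_set_page() and toy_init() are hypothetical;
the page number 2 is arbitrary, and the real driver goes through phylib
register accessors rather than a struct field.

```c
/* Hypothetical PHY model: current page plus a count of page writes. */
struct toy_phy {
	int page;
	int writes;
};

static int toy_get_page(const struct toy_phy *p)
{
	return p->page;
}

static void toy_set_page(struct toy_phy *p, int page)
{
	p->page = page;
	p->writes++;
}

/*
 * Init path that may switch pages. Some sub-step may already have
 * restored the page itself (restore_early). Instead of carrying a
 * "page was changed" flag, compare the current page with the saved one.
 */
static void toy_init(struct toy_phy *p, int switch_page, int restore_early)
{
	int saved = toy_get_page(p);

	if (switch_page) {
		toy_set_page(p, 2);
		if (restore_early)
			toy_set_page(p, saved);
	}

	if (toy_get_page(p) != saved)
		toy_set_page(p, saved);
}
```

When the page has already been restored, the final comparison skips the
write that a flag-based version would still issue.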
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Nov 2016 05:56:28 +0000 (00:56 -0500)]
Merge branch 'amd-xgbe-updates'
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2016-11-10
This patch series is targeted at adding support for a new PCI version
of the hardware. As part of the new PCI device, there is a new PCS/PHY
interaction, ECC support, I2C sideband communication, SFP+ support and
more.
The following updates and fixes are included in this driver update series:
- Hardware workaround for possible incorrectly generated interrupts
during software reset
- Hardware workaround for Tx timestamp register access order
- Add support for a PCI version of the device
- Increase the Rx queue limit to take advantage of the increased number
of DMA channels that might be available
- Add support for a new DMA channel interrupt mode
- Add ECC support for the device memory
- Add support for using the integrated I2C controller for sideband
communication
- Expose the phylib phy_aneg_done() function so it can be called by the
driver
- Add support for SFP+ modules
- Add support for MDIO attached PHYs
- Add support for KR re-driver between the PCS/SerDes and an external
PHY
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:11:41 +0000 (17:11 -0600)]
amd-xgbe: Add support for a KR redriver
This patch provides support for the presence of a KR redriver chip in
between the device PCS and an external PHY. When a redriver chip is
present the device must perform clause 73 auto-negotiation in order to
set the redriver chip for the downstream connection.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:11:14 +0000 (17:11 -0600)]
amd-xgbe: Add support for MDIO attached PHYs
Use the phylib support in the kernel to communicate with and control an
MDIO attached PHY. Use the hardware's MDIO communication mechanism to
communicate with the PHY.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:58 +0000 (17:10 -0600)]
amd-xgbe: Add support for SFP+ modules
Add support for recognizing and using SFP+ modules directly. This includes
using the I2C support to read and interpret the information returned from
an SFP+ module and configuring things properly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:46 +0000 (17:10 -0600)]
net: phy: expose phy_aneg_done API for use by drivers
Make phy_aneg_done() available to drivers so that the result of the
auto-negotiation initiated by phy_start_aneg() can be determined.
Remove the local implementation of phy_aneg_done() from the Aeroflex
driver and use the phy library version.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:36 +0000 (17:10 -0600)]
amd-xgbe: Add I2C support for sideband communication
Add support to initialize and use the I2C controller within the hardware
in order to perform sideband communication, e.g. determine the SFP media
type that is installed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:26 +0000 (17:10 -0600)]
amd-xgbe: Add ECC status support for the device memory
Some versions of the amd-xgbe device are capable of reporting ECC error
information back to the driver. Add support to process, track and report
on this information.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:17 +0000 (17:10 -0600)]
amd-xgbe: Add support for new DMA interrupt mode
The current per channel DMA interrupt support is based on an edge
triggered interrupt that is not maskable. This results in having to call
the disable_irq/enable_irq functions in order to prevent interrupts
during napi processing. The hardware now has a way to configure the per
channel DMA interrupt that will allow for masking the interrupt which
prevents calling disable_irq/enable_irq now. This patch makes use of
this support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:05 +0000 (17:10 -0600)]
amd-xgbe: Allow for a greater number of Rx queues
Remove the call to netif_get_num_default_rss_queues() and replace it
with num_online_cpus() to allow for the possibility of using all of
the hardware DMA channels available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:09:55 +0000 (17:09 -0600)]
amd-xgbe: Add PCI device support
Add support for new PCI devices to the driver.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:09:45 +0000 (17:09 -0600)]
amd-xgbe: Add a workaround for Tx timestamp issue
Update the reading of the Tx timestamp to account for a hardware issue
on how the fields and interrupt are cleared. The "seconds" portion of
the timestamp should be read first, followed by the "nanoseconds" portion.
Reading the "nanoseconds" portion should clear the timestamp data and the
interrupt. Because of an issue with the hardware this order is reversed
and reading the "seconds" portion actually clears the timestamp. The code
currently follows this workaround, but to guard against future versions
where this is fixed add a field to the version data to indicate if the
workaround is required or not.
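The read ordering can be modelled in user space. The register model
below (struct toy_hw, read_sec(), read_nsec()) and the `workaround`
parameter are hypothetical; they only mimic the behaviour described
above, where on affected hardware the read of the "seconds" portion is
the one that clears the timestamp.

```c
#include <stdint.h>

/* Hypothetical timestamp registers. */
struct toy_hw {
	uint32_t sec;
	uint32_t nsec;
	int sec_read_clears;	/* 1 models the affected hardware */
};

static uint32_t read_sec(struct toy_hw *hw)
{
	uint32_t v = hw->sec;

	if (hw->sec_read_clears)
		hw->sec = hw->nsec = 0;
	return v;
}

static uint32_t read_nsec(struct toy_hw *hw)
{
	uint32_t v = hw->nsec;

	if (!hw->sec_read_clears)
		hw->sec = hw->nsec = 0;
	return v;
}

/*
 * Read the timestamp so that the clearing read comes last:
 * nanoseconds first when the workaround is needed, seconds first
 * otherwise. Returns the timestamp in nanoseconds.
 */
static uint64_t toy_get_tx_tstamp(struct toy_hw *hw, int workaround)
{
	uint64_t sec, nsec;

	if (workaround) {
		nsec = read_nsec(hw);
		sec = read_sec(hw);
	} else {
		sec = read_sec(hw);
		nsec = read_nsec(hw);
	}
	return sec * 1000000000ULL + nsec;
}
```

Using the documented order on affected hardware loses the nanoseconds,
because the first read already cleared them.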
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:09:29 +0000 (17:09 -0600)]
amd-xgbe: Guard against incorrectly generated interrupts
Due to a hardware issue, it is possible for interrupt events to be
incorrectly generated when performing a soft reset. To guard against
this, perform the soft reset twice.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Nov 2016 05:51:04 +0000 (00:51 -0500)]
Merge branch 'ovs-L3-encap'
Jiri Benc says:
====================
openvswitch: support for layer 3 encapsulated packets
At the core of this patch set is removing the assumption in Open vSwitch
datapath that all packets have an Ethernet header.
The implementation relies on the presence of pop_eth and push_eth actions
in datapath flows to facilitate adding and removing Ethernet headers as
appropriate. The construction of such flows is left up to user-space.
This series is based on work by Simon Horman, Lorand Jakab, Thomas Morin and
others. I kept Lorand's and Simon's s-o-b in the patches that are derived
from v11 to record their authorship of parts of the code.
Changes from v12 to v13:
* Addressed Pravin's feedback.
* Removed the GRE vport conversion patch; L3 GRE ports should be created by
rtnetlink instead.
Main changes from v11 to v12:
* The patches were restructured and split differently for easier review.
* They were rebased and adjusted to the current net-next. Especially MPLS
handling is different (and easier) thanks to the recent MPLS GSO rework.
* Several bugs were discovered and fixed. The most notable is fragment
handling: header adjustment for ARPHRD_NONE devices on tx needs to be done
after refragmentation, not before it. This required significant changes in
the patchset. Another one is stricter checking of attributes (match on L2
vs. L3 packet) at the kernel level.
* Instead of is_layer3 bool, a mac_proto field is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:24 +0000 (16:28 +0100)]
openvswitch: allow L3 netdev ports
Allow ARPHRD_NONE interfaces to be added to ovs bridge.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:23 +0000 (16:28 +0100)]
openvswitch: add Ethernet push and pop actions
It's not allowed to push Ethernet header in front of another Ethernet
header.
It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.
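The two rules can be sketched as a small validator that tracks whether
the packet currently has an Ethernet header. All names here are
hypothetical, and rejecting a pop when no Ethernet header is present is
an assumption of this sketch rather than something stated above.

```c
#include <errno.h>

enum toy_mac_proto { TOY_MAC_NONE, TOY_MAC_ETHERNET };

/* Hypothetical per-flow state tracked while validating actions. */
struct toy_pkt_state {
	enum toy_mac_proto mac_proto;
	int has_vlan;
};

/* push_eth: refused if the packet already has an Ethernet header. */
static int toy_validate_push_eth(struct toy_pkt_state *s)
{
	if (s->mac_proto != TOY_MAC_NONE)
		return -EINVAL;
	s->mac_proto = TOY_MAC_ETHERNET;
	return 0;
}

/* pop_eth: refused if a vlan tag is present, so L3 packets never carry one. */
static int toy_validate_pop_eth(struct toy_pkt_state *s)
{
	if (s->mac_proto != TOY_MAC_ETHERNET)	/* assumption: nothing to pop */
		return -EINVAL;
	if (s->has_vlan)
		return -EINVAL;
	s->mac_proto = TOY_MAC_NONE;
	return 0;
}
```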
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:22 +0000 (16:28 +0100)]
openvswitch: netlink: support L3 packets
Extend the ovs flow netlink protocol to support L3 packets. Packets without
OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the
OVS_KEY_ATTR_ETHERTYPE attribute is mandatory.
Push/pop vlan actions are only supported for Ethernet packets.
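A sketch of the attribute rules described above, with plain flags
standing in for the presence of netlink attributes. The function and
enum names are hypothetical, and the real parser checks far more than
this.

```c
#include <errno.h>

enum toy_mac_proto { TOY_MAC_PROTO_NONE, TOY_MAC_PROTO_ETHERNET };

/*
 * Classify a flow key: with the Ethernet attribute it is an L2 key,
 * without it an L3 key, and an L3 key must carry the ethertype.
 * Returns the mac_proto value or -EINVAL.
 */
static int toy_key_mac_proto(int has_ethernet_attr, int has_ethertype_attr)
{
	if (has_ethernet_attr)
		return TOY_MAC_PROTO_ETHERNET;
	if (!has_ethertype_attr)
		return -EINVAL;
	return TOY_MAC_PROTO_NONE;
}

/* Push/pop vlan actions are accepted only for Ethernet packets. */
static int toy_vlan_action_allowed(int mac_proto)
{
	return mac_proto == TOY_MAC_PROTO_ETHERNET;
}
```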
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:21 +0000 (16:28 +0100)]
openvswitch: add processing of L3 packets
Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).
Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:20 +0000 (16:28 +0100)]
openvswitch: support MPLS push and pop for L3 packets
Update Ethernet header only if there is one.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:19 +0000 (16:28 +0100)]
openvswitch: pass mac_proto to ovs_vport_send
We'll need it to alter packets sent to ARPHRD_NONE interfaces.
Change do_output() to use the actual L2 header size of the packet when
deciding on the minimum cutlen. The assumption here is that what matters is
not the output interface hard_header_len but rather the L2 header of the
particular packet. For example, ARPHRD_NONE tunnels that encapsulate
Ethernet should get at least the Ethernet header.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:18 +0000 (16:28 +0100)]
openvswitch: add mac_proto field to the flow key
Use a hole in the structure. We support only Ethernet so far and will add
support for L2-less packets shortly. We could use a bool to indicate
whether the Ethernet header is present or not but the approach with the
mac_proto field is more generic and occupies the same number of bytes in the
struct, while allowing later extensibility. It also makes the code in the
next patches more self explaining.
It would be nice to use ARPHRD_ constants but those are u16 which would be
a waste. Thus define our own constants.
Another upside of this is that we can overload this new field to also denote
whether the flow key is valid. This has the advantage that on
refragmentation, we don't have to reparse the packet but can rely on the
stored eth.type. This is especially important for the next patches in this
series - instead of adding another branch for L2-less packets before calling
ovs_fragment, we can just remove all those branches completely.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:17 +0000 (16:28 +0100)]
openvswitch: use hard_header_len instead of hardcoded ETH_HLEN
On tx, use hard_header_len while deciding whether to refragment or drop the
packet. That way, all combinations are calculated correctly:
* L2 packet going to L2 interface (the L2 header len is subtracted),
* L2 packet going to L3 interface (the L2 header is included in the packet
length),
* L3 packet going to L3 interface.
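The three cases reduce to one subtraction, sketched here in user space.
toy_fits_mtu() is a hypothetical helper; vlan handling and the other
details of the real tx path are left out.

```c
/*
 * Decide whether a packet fits the output device without
 * refragmentation. The device's hard_header_len is subtracted from the
 * packet length before comparing with the MTU, so an L3 device
 * (hard_header_len == 0) counts every byte of the packet.
 */
static int toy_fits_mtu(unsigned int pkt_len, unsigned int hard_header_len,
			unsigned int mtu)
{
	unsigned int len = pkt_len > hard_header_len ?
			   pkt_len - hard_header_len : 0;

	return len <= mtu;
}
```

For example, a 1514-byte Ethernet frame fits an Ethernet device with a
1500-byte MTU (14 bytes are subtracted), but not an L3 device with the
same MTU, where the Ethernet header counts as payload.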
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 9 Nov 2016 21:02:34 +0000 (22:02 +0100)]
bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path
Commit
67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() is failing from
mlx4_xdp_set().
We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
need to release again. Earlier in the call path, dev_change_xdp_fd()
itself holds a reference to the prog as well (hence the '- 1' in the
bpf_prog_add()), so a simple atomic_sub() is safe to use here. When
an error is propagated, then bpf_prog_put() is called eventually from
dev_change_xdp_fd()
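The reference accounting can be sketched with a plain counter. The
names are hypothetical and the counter is not atomic, unlike the
kernel's; the point is only that the error path gives back exactly the
references taken by the earlier add, leaving the caller's own reference
in place.

```c
#include <errno.h>

/* Hypothetical program object with a plain (non-atomic) refcount. */
struct toy_prog {
	int refcnt;
};

static void toy_prog_add(struct toy_prog *p, int n)
{
	p->refcnt += n;
}

static void toy_prog_sub(struct toy_prog *p, int n)
{
	p->refcnt -= n;
}

/*
 * The caller already holds one reference, so only rx_ring_num - 1
 * extra references are taken. If resource allocation fails, those
 * extra references are dropped again before returning the error.
 */
static int toy_xdp_set(struct toy_prog *p, int rx_ring_num, int alloc_fails)
{
	toy_prog_add(p, rx_ring_num - 1);

	if (alloc_fails) {
		toy_prog_sub(p, rx_ring_num - 1);
		return -ENOMEM;
	}
	return 0;
}
```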
Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 8 Nov 2016 13:16:05 +0000 (15:16 +0200)]
net: ethernet: ti: davinci_cpdma: free memory while channel destroy
Memory is not freed during channel create/destroy operations. It was
assumed that the memory would be freed on driver removal, but a channel
can be created and destroyed many times while changing the number of
channels with ethtool.
Based on net-next/master
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 16:45:50 +0000 (11:45 -0500)]
Merge branch 'hns-fixes'
Salil Mehta says:
====================
Bug fixes & Code improvements in HNS driver
This patch-set introduces some bug fixes and code improvements.
These have been identified during internal review or testing of
the driver by internal Hisilicon teams.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Wed, 9 Nov 2016 18:14:01 +0000 (18:14 +0000)]
net: hns: add the support to add/remove the ucast entry to/from table
This patch adds support for adding unicast entries to the table
and removing them from it.
Reported-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Wed, 9 Nov 2016 18:14:00 +0000 (18:14 +0000)]
net: hns: add multicast tcam table clear
There is no clear operation before adding a new multicast tcam table,
so the tcam table will overflow when more entries are added.
Reported-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:59 +0000 (18:13 +0000)]
net: hns: modify tcam table of mask_key
Packets with a wrong mac address (differing only in the last bit) can be
received on Big-endian with the current definition of mask_key. Modify
it so that Big-endian is supported and works correctly.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:58 +0000 (18:13 +0000)]
net: hns: modify tcam table of mac mc-entry
The current definition of mac_mc_entry is only suitable for
Little-endian. Modify the tcam table of mac mc-entry
to support both Little-endian and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:57 +0000 (18:13 +0000)]
net: hns: modify tcam table of mac mc-port
The current tcam table code for adding or deleting a mac mc-port
supports only Little-endian. This patch makes it support both
Little-endian and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:56 +0000 (18:13 +0000)]
net: hns: modify table index to get mac entry
Big-endian is not supported by the current definition of table index to get
mac entry. It needs to be modified to support both Little-endian
and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:55 +0000 (18:13 +0000)]
net: hns: modify tcam table of mac uc-entry
The current definition of mac_uc_entry is only suitable for
Little-endian. Modify the tcam table of mac uc-entry
to support both Little-endian and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:54 +0000 (18:13 +0000)]
net: hns: modify tcam table and set mac key
The current definition of dsaf_drv_tbl_tcam_key is only suitable for
Little-endian. If data is stored in Big-endian, this may lead to
the data being misinterpreted. Shift operations make it work correctly
on both Big-endian and Little-endian.
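A sketch of the technique: composing a key from explicit shifts yields
the same numeric value on any host, whereas a bit-field or byte-overlay
struct depends on the host's layout. The key layout below (struct
toy_tcam_key and the positions of the vlan and port fields) is
hypothetical and is not the driver's actual format.

```c
#include <stdint.h>

/* Hypothetical 64-bit key split into two 32-bit words. */
struct toy_tcam_key {
	uint32_t high;
	uint32_t low;
};

/*
 * Assumed layout for this sketch:
 *   high = mac[0..3]
 *   low  = mac[4] | mac[5] | 12-bit vlan | 4-bit port
 * Every field is placed with shifts, so the result does not depend on
 * how the compiler lays out bit-fields or on host byte order.
 */
static struct toy_tcam_key toy_make_key(const uint8_t mac[6], uint16_t vlan,
					uint8_t port)
{
	struct toy_tcam_key k;

	k.high = ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
		 ((uint32_t)mac[2] << 8) | (uint32_t)mac[3];
	k.low = ((uint32_t)mac[4] << 24) | ((uint32_t)mac[5] << 16) |
		((uint32_t)(vlan & 0xfff) << 4) | (uint32_t)(port & 0xf);
	return k;
}
```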
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:53 +0000 (18:13 +0000)]
net: hns: modify buffer format of cpu data to le64
Hardware ring buffer data is stored in Little-endian. Thus cpu data
should be converted to Little-endian.
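What the conversion has to achieve can be sketched portably in user
space. toy_cpu_to_le64() is a hypothetical stand-in for the kernel's
cpu_to_le64(); it produces a value whose in-memory bytes are in
little-endian order on any host.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return a 64-bit value whose memory representation holds `v` with the
 * least significant byte first, regardless of host byte order.
 */
static uint64_t toy_cpu_to_le64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}
```

On a little-endian host this is the identity; on a big-endian host it
swaps the bytes, which is the case the descriptor code has to handle.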
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daode Huang [Wed, 9 Nov 2016 18:13:52 +0000 (18:13 +0000)]
net: hns: fix to intimate the link-status change by adding LF/RF method
In the current scenario, when the interface is disabled we reset the XGMAC
RX/TX functionality. This operation does not affect the PHY layer/SFP,
which still appears UP to the remote end (this behaviour is unlike GMAC).
As a result, the remote end keeps sending packets, which get partly
processed by XMAC and dropped. Since they are only partly processed, they
appear as errored packets in the packet counter statistics.
This patch fixes this behaviour and adds local-fault and remote-fault
functionality which can be used to notify the remote peer whenever
the state of the interface changes. This patch also removes the
existing hns_dsaf_xge_core_srst_by_port function which was being used
to reset the RX/TX functionality at XGE Core.
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:51 +0000 (18:13 +0000)]
net: hns: modify ethtool statistics value error
This patch fixes the gmac_rx_filt_pkt and gmac_rx_octets_total_filt
statistics values. The two statistics were inconsistent with the
registers: they were the wrong way round.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Jun He <hjat2005@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qianqian Xie [Wed, 9 Nov 2016 18:13:50 +0000 (18:13 +0000)]
net: hns: delete redundant macro definition
This patch deletes redundant macro definitions in the hns drivers.
It also changes the include relationships between the .h files to make
the layering clearer.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Weiwei Deng <dengweiwei@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daode Huang [Wed, 9 Nov 2016 18:13:49 +0000 (18:13 +0000)]
net: hns: bug fix about restart auto-negotiation
When auto-negotiation is off and duplex is set to half, running
"ethtool -r ethX" on a port with a PHY causes the port to stop working.
Restarting auto-negotiation should be forbidden when auto-negotiation
is off. This patch adds that condition.
Reported-by: Jinchuang Tian <tianjinchuang1@huawei.com>
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daode Huang [Wed, 9 Nov 2016 18:13:48 +0000 (18:13 +0000)]
net: hns: set default mac pause time to 0xffff
The default MAC pause time is set to 0xff, which is too short for
pausing. This patch changes it to the maximum value, 0xffff.
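For a sense of scale: IEEE 802.3 expresses the pause time in quanta of
512 bit times, so the actual duration depends on the link speed. The
sketch below converts a quanta count to nanoseconds. The helper name
pause_time_ns is hypothetical, and the 10 Gb/s link speed used in the
note that follows is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAUSE_QUANTUM_BITS 512ULL	/* one pause quantum = 512 bit times */

/* Duration of a pause request in nanoseconds at the given link speed. */
static uint64_t pause_time_ns(uint16_t quanta, uint64_t link_bps)
{
	assert(link_bps > 0);
	return (uint64_t)quanta * PAUSE_QUANTUM_BITS * 1000000000ULL / link_bps;
}
```

At an assumed 10 Gb/s, pause_time_ns(0xff, 10000000000ULL) is 13056 ns
(about 13 us), while pause_time_ns(0xffff, 10000000000ULL) is 3355392 ns
(about 3.4 ms).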
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Wed, 9 Nov 2016 18:13:47 +0000 (18:13 +0000)]
net: hns: fix for promisc mode in HNS driver
If promisc mode is set while there is traffic, the service NIC causes
the system to halt. We reserve the last 6 TCAM entries for the 6 ports.
If promisc mode is enabled, we configure the corresponding TCAM entry
for fuzzy matching and mark it valid; otherwise we mark the TCAM entry
invalid.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Wed, 9 Nov 2016 18:13:46 +0000 (18:13 +0000)]
net: hns: add fuzzy match of tcam table for hns
Since there are not enough TCAM table entries for VLAN and multicast
addresses, HNSv2 needs to support fuzzy matching of TCAM tables.
To add fuzzy matching of TCAM, we add a property to mask the bits to
be fuzzy matched.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Wed, 9 Nov 2016 18:13:45 +0000 (18:13 +0000)]
Doc: hisi: hns adds mc-mac-mask property
Since there are not enough TCAM table entries for every VLAN and
multicast address, HNS needs to support fuzzy matching of TCAM tables.
This adds the property to mask the bits to be fuzzy matched, so update
the bindings document.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kbuild test robot [Sat, 15 Oct 2016 01:13:14 +0000 (09:13 +0800)]
netfilter: ipset: hash: fix boolreturn.cocci warnings
net/netfilter/ipset/ip_set_hash_ipmac.c:70:8-9: WARNING: return of 0/1 in function 'hash_ipmac4_data_list' with return type bool
net/netfilter/ipset/ip_set_hash_ipmac.c:178:8-9: WARNING: return of 0/1 in function 'hash_ipmac6_data_list' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
CC: Tomasz Chilinski <tomasz.chilinski@chilan.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 11:32:07 +0000 (12:32 +0100)]
netfilter: ipset: use setup_timer() and mod_timer().
Use setup_timer() instead of init_timer(), being the preferred way
of setting up a timer.
Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
active timer (if the timer is inactive it will be activated).
Use setup_timer() and mod_timer() to setup and arm a timer, making the
code compact and easier to read.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tomasz Chilinski [Thu, 5 May 2016 05:21:26 +0000 (07:21 +0200)]
netfilter: ipset: hash:ipmac type support added to ipset
Introduce the hash:ipmac type.
Signed-off-by: Tomasz Chiliński <tomasz.chilinski@chilan.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Wed, 4 Nov 2015 08:44:29 +0000 (09:44 +0100)]
netfilter: ipset: Fix reported memory size for hash:* types
The calculation of the full allocated memory did not take
into account the size of the base hash bucket structure at some
places.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Mon, 2 Nov 2015 19:27:58 +0000 (20:27 +0100)]
netfilter: ipset: Collapse same condition body to a single one
The set full case (with net_ratelimit()-ed pr_warn()) is already
handled, simply jump there.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Fri, 26 Jun 2015 13:13:18 +0000 (15:13 +0200)]
netfilter: ipset: Make struct htype per ipset family
Before this patch, struct htype was created when ip_set_hash_gen.h
was first included, and it was common to both the IPv4 and IPv6
set variants.
Make struct htype per ipset family and use NLEN to make
nets array fixed size to simplify struct htype allocation.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Tue, 11 Oct 2016 05:25:00 +0000 (07:25 +0200)]
netfilter: ipset: Optimize hash creation routine
Exit as early as possible on error and use RCU_INIT_POINTER(),
as the set is not visible at creation time.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Fri, 26 Jun 2015 09:16:28 +0000 (11:16 +0200)]
netfilter: ipset: Make sure element data size is a multiple of u32
Data for hashing is required to be an array of u32. Make sure that
the element data size is always a multiple of u32.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 11:24:10 +0000 (12:24 +0100)]
netfilter: ipset: Make NLEN compile time constant for hash types
Hash types define HOST_MASK before inclusion of ip_set_hash_gen.h
and the only place where NLEN needed to be calculated at runtime
is *_create() method.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 11:18:06 +0000 (12:18 +0100)]
netfilter: ipset: Simplify mtype_expire() for hash types
Remove one level of indentation by using continue while
iterating over the elements in a bucket.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 11:12:25 +0000 (12:12 +0100)]
netfilter: ipset: Remove redundant mtype_expire() arguments
Remove the redundant parameters nets_length and dsize, because
they can be derived from other parameters.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 11:05:34 +0000 (12:05 +0100)]
netfilter: ipset: Count non-static extension memory for userspace
Non-static (i.e. comment) extension was not counted into the memory
size. A new internal counter is introduced for this. In the case of
the hash types the sizes of the arrays are counted there as well so
that we can avoid scanning the whole set when just the header data
is requested.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Mon, 10 Oct 2016 20:07:41 +0000 (22:07 +0200)]
netfilter: ipset: Add element count to all set types header
It is better to list the set elements for all set types, thus the
header information is uniform. Element counts are therefore added
to the bitmap and list types.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Eric B Munson [Mon, 10 Oct 2016 19:59:21 +0000 (21:59 +0200)]
netfilter: ipset: Add element count to hash headers
It would be useful for userspace to query the size of an ipset hash,
however, this data is not exposed to userspace outside of counting the
number of member entries. This patch uses the attribute
IPSET_ATTR_ELEMENTS to indicate the size in the header that is
exported to userspace. This field is then printed by the userspace
tool for hashes.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Josh Hunt <johunt@akamai.com>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Mon, 10 Oct 2016 19:52:51 +0000 (21:52 +0200)]
netfilter: ipset: Regroup ip_set_put_extensions and add extern
Cleanup: group ip_set_put_extensions and ip_set_get_extensions
together and add missing extern.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Mon, 10 Oct 2016 19:44:32 +0000 (21:44 +0200)]
netfilter: ipset: Separate memsize calculation code into dedicated function
Hash types already have their memsize calculation code in separate
functions. Clean up and do the same for *bitmap* and *list* sets.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Mon, 10 Oct 2016 19:34:56 +0000 (21:34 +0200)]
netfilter: ipset: Split extensions into separate files
Cleanup to separate all extensions into individual files.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Wed, 6 May 2015 05:27:28 +0000 (07:27 +0200)]
netfilter: ipset: Use kmalloc() in comment extension helper
Allocate memory with kmalloc() rather than kzalloc(): the string
is immediately initialized so it is unnecessary to zero out
the allocated memory area.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Tue, 5 May 2015 15:13:28 +0000 (17:13 +0200)]
netfilter: ipset: Improve skbinfo get/init helpers
Use struct ip_set_skbinfo in struct ip_set_ext instead of open
coded fields and assign structure members in get/init helpers
instead of copying members one by one. Explicitly note that
struct ip_set_skbinfo must be padded to prevent non-aligned
access in the extension blob.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 10:31:03 +0000 (11:31 +0100)]
netfilter: ipset: Headers file cleanup
Group counter helper functions together.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 10:24:15 +0000 (11:24 +0100)]
netfilter: ipset: Mark some helper args as const.
Mark some of the helpers arguments as const.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Jozsef Kadlecsik [Thu, 10 Nov 2016 10:17:25 +0000 (11:17 +0100)]
netfilter: ipset: Remove extra whitespaces in ip_set.h
Remove unnecessary whitespaces.
Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Eric Dumazet [Wed, 9 Nov 2016 19:24:22 +0000 (11:24 -0800)]
tcp: remove unaligned accesses from tcp_get_info()
After commit 6ed46d1247a5 ("sock_diag: align nlattr properly when
needed"), tcp_get_info() gets 64bit aligned memory, so we can avoid
the unaligned helpers.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 03:15:28 +0000 (22:15 -0500)]
Merge tag 'batadv-next-for-davem-20161108-v2' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
pull request for net-next: batman-adv 2016-11-08 v2
This feature and cleanup patchset includes the following changes:
- netlink and code cleanups by Sven Eckelmann (3 patches)
- Cleanup and minor fixes by Linus Luessing (3 patches)
- Speed up multicast update intervals, by Linus Luessing
- Avoid (re)broadcast in meshes for some easy cases,
by Linus Luessing
- Clean up tx return state handling, by Sven Eckelmann (6 patches)
- Fix some special mac address handling cases, by Sven Eckelmann
(3 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 02:20:01 +0000 (21:20 -0500)]
Merge branch 'PHC-freq-fine-tuning'
Richard Cochran says:
====================
PHC frequency fine tuning
This series expands the PTP Hardware Clock subsystem by adding a
method that passes the frequency tuning word to the drivers
without dropping the low order bits. Keeping those bits is useful for
drivers whose frequency resolution is higher than 1 ppb.
The appended script (below) runs a simple demonstration of the
improvement. This test needs two Intel i210 PCIe cards installed in
the same PC, with their SDP0 pins connected by copper wire. Measuring
the estimated offset (from the ptp4l servo) and the true offset (from
the PPS) over one hour yields the following statistics.
|        |   Est. Before |    Est. After |   True Before |    True After |
|--------+---------------+---------------+---------------+---------------|
| min    | -5.200000e+01 | -1.600000e+01 | -3.100000e+01 | -1.000000e+00 |
| max    | +5.700000e+01 | +2.500000e+01 | +8.500000e+01 | +4.000000e+01 |
| pk-pk: | +1.090000e+02 | +4.100000e+01 | +1.160000e+02 | +4.100000e+01 |
| mean   | +6.472222e-02 | +1.277778e-02 | +2.422083e+01 | +1.826083e+01 |
| stddev | +1.158006e+01 | +4.581982e+00 | +1.207708e+01 | +4.981435e+00 |
Here the numbers are in units of nanoseconds, and the ~20 nanosecond PPS
offset is due to input/output delays on the i210's external interface
logic.
With the series applied, both the peak to peak error and the standard
deviation improve by a factor of more than two. These two graphs show
the improvement nicely.
http://linuxptp.sourceforge.net/fine-tuning/fine-est.png
http://linuxptp.sourceforge.net/fine-tuning/fine-tru.png
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 8 Nov 2016 21:49:18 +0000 (22:49 +0100)]
ptp: dp83640: Use the high resolution frequency method.
The dp83640 has a frequency resolution of about 0.029 ppb.
This patch lets users of the device benefit from the
increased frequency resolution when tuning the clock.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 8 Nov 2016 21:49:17 +0000 (22:49 +0100)]
ptp: igb: Use the high resolution frequency method.
The 82580 and related devices offer a frequency resolution of about
0.029 ppb. This patch lets users of the device benefit from the
increased frequency resolution when tuning the clock.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 8 Nov 2016 21:49:16 +0000 (22:49 +0100)]
ptp: Introduce a high resolution frequency adjustment method.
The internal PTP Hardware Clock (PHC) interface limits the resolution for
frequency adjustments to one part per billion. However, some hardware
devices allow finer adjustment, and making use of the increased resolution
improves synchronization measurably on such devices.
This patch adds an alternative method that allows finer frequency tuning
by passing the scaled ppm value to PHC drivers. This value comes from
user space, and it has a resolution of about 0.015 ppb. We also deprecate
the older method, anticipating its removal once existing drivers have been
converted over.
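The resolution figure can be checked with a small userspace sketch. It
assumes that "scaled ppm" means parts per million with a 16-bit binary
fraction, as in the adjtimex() frequency field. The function names are
hypothetical, and the truncating conversion only illustrates how whole-ppb
granularity loses the low order bits; it is not the kernel's conversion
code.

```c
#include <assert.h>
#include <stdint.h>

#define SCALED_PPM_FRAC_BITS 16	/* ppm with a 16-bit binary fraction */

/* Whole-ppb conversion: the fractional part of a ppb is truncated away. */
static int64_t scaled_ppm_to_ppb_trunc(int64_t scaled_ppm)
{
	/* Guard against overflow in the multiplication below. */
	assert(scaled_ppm > -(1LL << 40) && scaled_ppm < (1LL << 40));
	return scaled_ppm * 1000 / (1LL << SCALED_PPM_FRAC_BITS);
}

/* Size of one scaled-ppm step expressed in ppb: 1000 / 2^16. */
static double scaled_ppm_step_ppb(void)
{
	return 1000.0 / (double)(1LL << SCALED_PPM_FRAC_BITS);
}
```

One step is 1000 / 65536 ppb, roughly 0.0153 ppb. A request of 100 scaled
ppm units (about 1.53 ppb) becomes 1 ppb after truncation, so roughly a
third of the requested adjustment never reaches the hardware.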
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 8 Nov 2016 19:07:28 +0000 (11:07 -0800)]
net: napi_hash_add() is no longer exported
There are no more users outside net/core/dev.c, so
napi_hash_add() can now be static.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 8 Nov 2016 19:06:53 +0000 (11:06 -0800)]
bnxt_en: do not call napi_hash_add()
This is automatically done from netif_napi_add(), and we want to not
export napi_hash_add() anymore in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Tue, 8 Nov 2016 15:40:28 +0000 (16:40 +0100)]
bpf: Remove unused but set variables
Remove the unused but set variables min_set and max_set in
adjust_reg_min_max_vals to fix the following warning when building with
'W=1':
kernel/bpf/verifier.c:1483:7: warning: variable ‘min_set’ set but not used [-Wunused-but-set-variable]
There is no warning about max_set being unused, but since it is only
used in the assignment of min_set it can be removed as well.
They were introduced in commit 484611357c19 ("bpf: allow access into map
value arrays") but seem to have never been used.
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 8 Nov 2016 15:24:03 +0000 (17:24 +0200)]
tc_act: Remove tcf_act macro
The tcf_act macro referenced a nonexistent field and was not used in the
kernel source.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 Nov 2016 01:40:13 +0000 (20:40 -0500)]
Merge branch 'ipv6-sr'
David Lebrun says:
====================
net: add support for IPv6 Segment Routing
v5:
- Check SRH validity when adding a new route with lwtunnels and
when setting an IPV6_RTHDR socket option.
- Check that hdr->segments_left is not out of bounds when processing
an SR-enabled packet.
- Add __ro_after_init attribute to seg6_genl_policy structure.
- Add CONFIG_IPV6_SEG6_INLINE option to enable or disable
direct header insertion.
v4:
- Change @cleanup in ipv6_srh_rcv() from int to bool
- Move checksum helper functions into header file
- Add common definition for SR TLVs
- Add comments for HMAC computation algorithm
- Use rhashtable to store HMAC infos instead of linked list
- Remove packed attribute for struct sr6_tlv_hmac
- Use dst cache only if CONFIG_DST_CACHE is enabled
v3:
- Fix compilation for CONFIG_IPV6={n,m}
v2:
- Remove packed attribute from sr6 struct and replaced unaligned
16-bit flags with two 8-bit flags.
- SR code now included by default. Option CONFIG_IPV6_SEG6_HMAC
exists for HMAC support (which requires crypto dependencies).
- Replace "hidden" calls to mutex_{un,}lock to direct calls.
- Fix reverse xmas tree coding style.
- Fix cast-from-void*'s.
- Update skb->csum to account for SR modifications.
- Add dst_cache in seg6_output.
Segment Routing (SR) is a source routing paradigm, architecturally
defined in draft-ietf-spring-segment-routing-09 [1]. The IPv6 flavor of
SR is defined in draft-ietf-6man-segment-routing-header-02 [2].
The main idea is that an SR-enabled packet contains a list of segments,
which represent mandatory waypoints. Each waypoint is called a segment
endpoint. The SR-enabled packet is routed normally (e.g. shortest path)
between the segment endpoints. A node that inserts an SRH into a packet
is called an ingress node, and a node that is the last segment endpoint
is called an egress node.
From an IPv6 viewpoint, an SR-enabled packet contains an IPv6 extension
header, which is a Routing Header type 4, defined as follows:
struct ipv6_sr_hdr {
__u8 nexthdr;
__u8 hdrlen;
__u8 type;
__u8 segments_left;
__u8 first_segment;
__u8 flag_1;
__u8 flag_2;
__u8 reserved;
struct in6_addr segments[0];
};
The first 4 bytes of the SRH are consistent with the Routing Header
definition in RFC 2460. The type is set to `4' (SRH).
Each segment is encoded as an IPv6 address. The segments are encoded in
reverse order: segments[0] is the last segment of the path, and
segments[first_segment] is the first segment of the path.
segments[segments_left] points to the currently active segment and
segments_left is decremented at each segment endpoint.
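The reverse ordering and the segments_left bookkeeping described above can
be sketched in userspace as follows. The struct mirrors the fields quoted
above but uses a fixed-capacity array for simplicity; srh_sketch, srh_build
and srh_advance are hypothetical names, not kernel functions. The hdrlen
value follows the generic extension header rule (length in 8-octet units,
not counting the first 8 octets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

#define SRH_SKETCH_MAX_SEGS 8	/* fixed capacity, a simplification */

/* Mirrors the fields of struct ipv6_sr_hdr quoted above. */
struct srh_sketch {
	uint8_t nexthdr;
	uint8_t hdrlen;		/* 8-octet units, minus the first 8 octets */
	uint8_t type;		/* 4 = SRH */
	uint8_t segments_left;
	uint8_t first_segment;
	uint8_t flag_1;
	uint8_t flag_2;
	uint8_t reserved;
	struct in6_addr segments[SRH_SKETCH_MAX_SEGS];
};

/* Fill an SRH from a path given in travel order (path[0] is visited first). */
static void srh_build(struct srh_sketch *srh, const struct in6_addr *path,
		      size_t n)
{
	size_t i;

	assert(n >= 1 && n <= SRH_SKETCH_MAX_SEGS);
	memset(srh, 0, sizeof(*srh));
	srh->type = 4;
	srh->hdrlen = (uint8_t)(2 * n);	/* (8 + 16 * n) / 8 - 1 */
	srh->first_segment = (uint8_t)(n - 1);
	srh->segments_left = (uint8_t)(n - 1);
	for (i = 0; i < n; i++)		/* reverse order: last hop at index 0 */
		srh->segments[n - 1 - i] = path[i];
}

/*
 * Segment endpoint step: decrement segments_left and return the new active
 * segment, or NULL when the packet has already reached its last segment.
 */
static const struct in6_addr *srh_advance(struct srh_sketch *srh)
{
	if (srh->segments_left == 0)
		return NULL;
	srh->segments_left--;
	return &srh->segments[srh->segments_left];
}
```

With a three-segment path, the first hop ends up in segments[2], each call
to srh_advance() moves the active segment one index closer to zero, and the
last hop is segments[0].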
There are two ways for a packet to receive an SRH; we call them
encap mode and inline mode. In encap mode, the packet is encapsulated
in an outer IPv6 header that contains the SRH. The inner (original) packet
is not modified. A virtual tunnel is thus created between the ingress node
(the node that encapsulates) and the egress node (the last segment of the path).
Once an encapsulated SR packet reaches the egress node, the node decapsulates
the packet and performs a routing decision on the inner packet. This kind of
SRH insertion is intended for routers that encapsulate in-transit
packets.
The second SRH insertion method, the inline mode, acts by directly inserting
the SRH right after the IPv6 header of the original packet. For this method,
if a particular flag (SR6_FLAG_CLEANUP) is set, then the penultimate segment
endpoint must strip the SRH from the packet before forwarding it to the last
segment endpoint. This insertion method is intended for end hosts;
however, it is also used for in-transit packets by some industry actors.
Note that directly inserting extension headers may break several mechanisms
such as Path MTU Discovery, IPSec AH, etc. For this reason, this insertion
method is only available if CONFIG_IPV6_SEG6_INLINE is enabled.
Finally, the SRH may contain TLVs after the segments list. Several types of
TLVs are defined, but we currently consider only the HMAC TLV. This TLV is
a response to the deprecation of RH0 and makes it possible to ensure the
authenticity and integrity of the SRH. The HMAC text contains the flags, the
first_segment index, the full list of segments, and the source address of the
packet. While SR is intended mostly for use within a single administrative
domain, the HMAC TLV makes it possible to verify SR packets coming from an
untrusted source.
This patch series implements support for the IPv6 flavor of SR and is
logically divided into the following components:
(1) Data plane support (patch 01). This patch adds a function
in net/ipv6/exthdrs.c to handle the Routing Header type 4.
It enables the kernel to act as a segment endpoint, by supporting
the following operations: decrementation of the segments_left field,
cleanup flag support (removal of the SRH if we are the penultimate
segment endpoint) and decapsulation of the inner packet as an egress
node.
(2) Control plane support (patches 02..03 and 07..09). These patches make it
possible to insert an SRH on locally emitted and/or forwarded packets, both
with encap mode and with inline mode. The SRH insertion is controlled through
the lightweight tunnels mechanism. Furthermore, patch 08 enables the
applications to insert an SRH on a per-socket basis, through the
setsockopt() system call. The mechanism to specify a per-socket
Routing Header was already defined for RH0 and no special modification
was performed on this side. However, the code to actually push the RH
onto the packets had to be adapted for the SRH specifications.
(3) HMAC support (patches 04..06). These patches add support for
HMAC TLV verification in the data plane and HMAC generation in
the control plane. Two hashing algorithms are supported
(SHA-1 as legacy and SHA-256 as required by the IETF draft), but
additional algorithms can be easily supported by simply adding an
entry into an array.
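The "entry into an array" design in item (3) can be pictured with a small
table-driven lookup. This is only a sketch: the descriptor type, the
numeric identifiers and the function names are hypothetical and do not
reproduce the structures added by the patches. The digest sizes are the
standard ones for SHA-1 (20 bytes) and SHA-256 (32 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; identifiers and field names are illustrative. */
struct hmac_algo_sketch {
	uint8_t id;
	const char *name;
	size_t digest_len;	/* raw digest size in bytes */
};

static const struct hmac_algo_sketch hmac_algos[] = {
	{ 1, "hmac(sha1)",   20 },
	{ 2, "hmac(sha256)", 32 },
	/* Supporting another hash would mean adding one entry here. */
};

/* Look up an algorithm by id; returns NULL for an unknown id. */
static const struct hmac_algo_sketch *hmac_algo_find(uint8_t id)
{
	size_t i;

	for (i = 0; i < sizeof(hmac_algos) / sizeof(hmac_algos[0]); i++)
		if (hmac_algos[i].id == id)
			return &hmac_algos[i];
	return NULL;
}

/* Digest length for an algorithm id that the caller has already validated. */
static size_t hmac_digest_len(uint8_t id)
{
	const struct hmac_algo_sketch *algo = hmac_algo_find(id);

	assert(algo != NULL);	/* programming error: id was not validated */
	return algo->digest_len;
}
```

Keeping the per-algorithm data in one table means the verification and
generation paths need no changes when an algorithm is added.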
[1] https://tools.ietf.org/html/draft-ietf-spring-segment-routing-09
[2] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:22 +0000 (14:59 +0100)]
ipv6: sr: add documentation file for per-interface sysctls
This patch adds documentation for some SR-related per-interface
sysctls.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:21 +0000 (14:59 +0100)]
ipv6: sr: add support for SRH injection through setsockopt
This patch adds support for per-socket SRH injection with the setsockopt
system call through the IPPROTO_IPV6, IPV6_RTHDR options.
The SRH is pushed through the ipv6_push_nfrag_opts function.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:20 +0000 (14:59 +0100)]
ipv6: add source address argument for ipv6_push_nfrag_opts
This patch prepares for insertion of SRH through setsockopt().
The new source address argument is used when an HMAC field is
present in the SRH, which must be filled. The HMAC signature
process requires the source address as input text.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:19 +0000 (14:59 +0100)]
ipv6: sr: add calls to verify and insert HMAC signatures
This patch enables the verification of the HMAC signature for transiting
SR-enabled packets, and its insertion on encapsulated/injected SRH.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:59:18 +0000 (14:59 +0100)]
ipv6: sr: implement API to control SR HMAC structure
This patch provides an implementation of the genetlink commands
to associate a given HMAC key identifier with a hashing algorithm
and a secret.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:42 +0000 (14:57 +0100)]
ipv6: sr: add core files for SR HMAC support
This patch adds the necessary functions to compute and check the HMAC signature
of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and
hmac(sha256).
In order to avoid dynamic memory allocation for each HMAC computation,
a per-cpu ring buffer is allocated for this purpose.
A new per-interface sysctl called seg6_require_hmac is added, allowing a
user-defined policy for processing HMAC-signed SR-enabled packets.
A value of -1 means that the HMAC field will always be ignored.
A value of 0 means that if an HMAC field is present, its validity will
be enforced (the packet is dropped if the signature is incorrect).
Finally, a value of 1 means that any SR-enabled packet that does not
contain an HMAC signature or whose signature is incorrect will be dropped.
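The three seg6_require_hmac values described above amount to the following
decision, shown here as a userspace sketch. The function name
hmac_policy_accept and its boolean parameters are hypothetical; only the
-1/0/1 semantics come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the packet may be processed, false if it must be dropped. */
static bool hmac_policy_accept(int require_hmac, bool has_hmac,
			       bool hmac_valid)
{
	assert(require_hmac >= -1 && require_hmac <= 1);

	if (require_hmac == -1)		/* HMAC field always ignored */
		return true;
	if (!has_hmac)			/* unsigned: accepted unless required */
		return require_hmac == 0;
	return hmac_valid;		/* signed: must verify for both 0 and 1 */
}
```

The only difference between the values 0 and 1 is the treatment of packets
that carry no HMAC at all: 0 accepts them, 1 drops them.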
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:41 +0000 (14:57 +0100)]
ipv6: sr: add support for SRH encapsulation and injection with lwtunnels
This patch creates a new type of interfaceless lightweight tunnel (SEG6),
enabling the encapsulation and injection of SRH within locally emitted
packets and forwarded packets.
From a configuration viewpoint, a seg6 tunnel would be configured as follows:
ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0
Any packet whose destination address is fc00::1 would thus be encapsulated
within an outer IPv6 header containing the SRH with three segments, and would
actually be routed to the first segment of the list. If `mode inline' was
specified instead of `mode encap', then the SRH would be directly inserted
after the IPv6 header without outer encapsulation.
The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is enabled. This
feature was made configurable because direct header insertion may break
several mechanisms such as PMTUD or IPSec AH.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:40 +0000 (14:57 +0100)]
ipv6: sr: add code base for control plane support of SR-IPv6
This patch adds the necessary hooks and structures to provide support
for SR-IPv6 control plane, essentially the Generic Netlink commands
that will be used for userspace control over the Segment Routing
kernel structures.
The genetlink commands provide control over two different structures:
tunnel source and HMAC data. The tunnel source is the source address
that will be used by default when encapsulating packets into an
outer IPv6 header + SRH. If the tunnel source is set to :: then an
address of the outgoing interface will be selected as the source.
The HMAC commands currently just return ENOTSUPP and will be implemented
in a future patch.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:39 +0000 (14:57 +0100)]
ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)
Implement minimal support for processing of SR-enabled packets
as described in
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02.
This patch implements the following operations:
- Intermediate segment endpoint: incrementation of active segment and rerouting.
- Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
and routing of inner packet.
- Cleanup flag support for SR-inlined packets: removal of SRH if we are the
penultimate segment endpoint.
A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
packets. Default is deny.
This patch does not provide support for HMAC-signed packets.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 8 Nov 2016 13:31:38 +0000 (14:31 +0100)]
net: mii: report 0 for unknown lp_advertising
The newly introduced mii_ethtool_get_link_ksettings function sets
lp_advertising to an uninitialized value when BMCR_ANENABLE is not
set:
drivers/net/mii.c: In function 'mii_ethtool_get_link_ksettings':
drivers/net/mii.c:224:2: error: 'lp_advertising' may be used uninitialized in this function [-Werror=maybe-uninitialized]
As documented in include/uapi/linux/ethtool.h, the value is
expected to be zero when we don't know it, so let's initialize
it to that.
Fixes: bc8ee596afe8 ("net: mii: add generic function to support ksetting support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Beulich [Tue, 8 Nov 2016 07:45:53 +0000 (00:45 -0700)]
xen-netback: prefer xenbus_scanf() over xenbus_gather()
For single items being collected this should be preferred as being more
typesafe (as the compiler can check format string and to-be-written-to
variable match) and more efficient (requiring one less parameter to be
passed).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Mon, 7 Nov 2016 06:51:23 +0000 (14:51 +0800)]
igmp: Document sysctl force_igmp_version
There are some differences between force_igmp_version and force_mld_version.
Add documentation to make users aware of them.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>