David S. Miller [Mon, 12 Jan 2015 02:40:07 +0000 (21:40 -0500)]
Merge branch 'irda-next'
Chunyan Zhang says:
====================
irda: Use ktime_t instead of timeval
This patch-set removes all uses of timeval and uses ktime_t instead where
needed, since 32-bit time types will break in the year 2038.
This patch-set also uses the ktime_xxx functions accordingly.
e.g.
* Used ktime_get to get the current time instead of do_gettimeofday.
* And, used ktime_us_delta to get the elapsed time directly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunyan Zhang [Thu, 8 Jan 2015 04:01:32 +0000 (12:01 +0800)]
irda: vlsi_ir: Replace timeval with ktime_t
The vlsi ir driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.
This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.
This patch uses ktime_us_delta to get the elapsed time in microseconds,
and uses div_s64_rem to split it into seconds and microseconds
for printing.
This patch also changes the function 'vlsi_hard_start_xmit' to do the
same thing as the other drivers, that is, passing the remaining time
to udelay() instead of looping until enough time has passed.
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunyan Zhang [Thu, 8 Jan 2015 04:01:31 +0000 (12:01 +0800)]
irda: stir4200: Replace timeval with ktime_t
The stir4200 driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.
This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.
This patch uses ktime_us_delta to get the elapsed time in microseconds.
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunyan Zhang [Thu, 8 Jan 2015 04:01:30 +0000 (12:01 +0800)]
irda: nsc-ircc: Replace timeval with ktime_t
The nsc ircc driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.
This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.
This patch uses ktime_us_delta to get the elapsed time, so it no longer
needs to check for overflow, because ktime_us_delta returns the time
difference in microseconds.
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
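To illustrate why the overflow check mentioned in the nsc-ircc message above becomes unnecessary, here is a minimal sketch. The struct tv type and both helpers are hypothetical; the first helper is an assumption about the general shape of microsecond-field arithmetic, not a quote from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timeval. */
struct tv {
	long sec;
	long usec;
};

/*
 * Old-style delta: subtract only the microsecond fields, then correct
 * the wrap at a second boundary. Only meaningful when less than one
 * second has elapsed.
 */
static long usec_field_delta(struct tv now, struct tv then)
{
	long diff = now.usec - then.usec;

	assert(now.usec >= 0 && now.usec < 1000000);
	assert(then.usec >= 0 && then.usec < 1000000);
	if (diff < 0)
		diff += 1000000;
	return diff;
}

/* New-style delta: one signed 64-bit subtraction on nanosecond timestamps. */
static int64_t ns_delta_us(int64_t now_ns, int64_t then_ns)
{
	return (now_ns - then_ns) / 1000;
}
```

With 0.2 s elapsed across a second boundary both helpers agree, but with 1.2 s elapsed the field-based helper still reports 200000 while the 64-bit delta reports 1200000.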
Chunyan Zhang [Thu, 8 Jan 2015 04:01:29 +0000 (12:01 +0800)]
irda: irda-usb: Replace timeval with ktime_t
The irda usb driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.
This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.
This patch uses ktime_us_delta to get the elapsed time, so it no longer
needs to check for overflow, because ktime_us_delta returns the time
difference in microseconds.
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunyan Zhang [Thu, 8 Jan 2015 04:01:28 +0000 (12:01 +0800)]
irda: ali-ircc: Replace timeval with ktime_t
The ali ircc driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.
This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.
This patch uses ktime_us_delta to get the elapsed time, so it no longer
needs to check for overflow, because ktime_us_delta returns the time
difference in microseconds.
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chunyan Zhang [Thu, 8 Jan 2015 04:01:27 +0000 (12:01 +0800)]
irda: Removed all unused timeval variables
In the files au1k_ir.c and via-ircc.h, there were two unused members of
timeval type; this commit removes that unneeded code.
In three other files, the rx_time member is only ever written, never read,
so it is removed entirely as well.
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Jan 2015 23:53:46 +0000 (18:53 -0500)]
Merge branch 'sti_drivers'
Peter Griffin says:
====================
Fix sti drivers which mix reg address spaces
A v2 of this old series, incorporating Arnd's and Lee's feedback from v1.
Following on from Arnd's comments about the picophy driver here
https://lkml.org/lkml/2014/11/13/161, this series fixes the
remaining upstreamed drivers for STI, which are mixing address spaces
in the reg property. We do this in a way similar to the keystone
and bcm7445 platforms, by having a sysconfig phandle / offset pair
(where only one register is required), or a phandle / integer array
(where multiple offsets in the same bank are needed).
This series breaks DT compatibility! But the platform support
is WIP and only being used by the few developers who are upstreaming
support for it. I've made each change to the driver / dt doc / dt
file as a single atomic commit so the kernel will remain bisectable.
This series then also enables the picophy driver, and adds back in
the ehci/ohci dt nodes for stih410 which make use of the picophy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:12 +0000 (15:04 +0000)]
stmmac: dwmac-sti: Pass sysconfig register offset via syscon dt property.
Based on Arnd's review comments here https://lkml.org/lkml/2014/11/13/161,
we should not be mixing address spaces in the reg property like this driver
currently does. This patch updates the driver, dt docs and also the existing
dt nodes to pass the sysconfig offset in the syscon dt property.
This patch breaks DT compatibility! But this platform is considered WIP,
and is only used by a few developers who are upstreaming support for it.
This change has been done as a single atomic commit to ensure it is
bisectable.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:11 +0000 (15:04 +0000)]
ARM: multi_v7_defconfig: Enable stih407 usb picophy
This patch enables the picoPHY usb phy which is used by
the usb2 and usb3 host controllers when controlling usb2/1.1
devices. It is found in stih407 family SoCs from STMicroelectronics.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:10 +0000 (15:04 +0000)]
ARM: STi: DT: STiH410: Add DT nodes for the ehci and ohci usb controllers.
This patch adds the DT nodes for the extra ehci and ohci usb controllers
on the stih410 SoC.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:09 +0000 (15:04 +0000)]
ARM: STi: DT: STiH410: Add usb2 picophy dt nodes
This patch adds the dt nodes for the extra usb2 picophys found on
the stih410.
These two picophys are used in conjunction with the extra ehci/ohci usb
controllers also found on the stih410 SoC.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:08 +0000 (15:04 +0000)]
ARM: STi: DT: STiH407: Add usb2 picophy dt nodes
This patch adds the dt nodes for the usb2 picophy found on the stih407
device family. It is used on stih407 by the dwc3 usb3 controller when
controlling usb2/1.1 devices.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:07 +0000 (15:04 +0000)]
phy: miphy365x: Pass sysconfig register offsets via syscfg dt property.
Based on Arnd's review comments here https://lkml.org/lkml/2014/11/13/161,
update the miphy365 phy driver to access sysconfig register offsets via
the syscfg dt property.
This is because the reg property should not be mixing address spaces
like it does currently for miphy365. This change also aligns us
with how other platforms such as keystone and bcm7445 pass their syscon
offsets via DT.
This patch breaks DT compatibility, but this platform is considered WIP,
and is only used by a few developers who are upstreaming support for it.
This change has been done as a single atomic commit to ensure it is
bisectable.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Griffin [Wed, 7 Jan 2015 15:04:06 +0000 (15:04 +0000)]
phy: phy-stih407-usb: Pass sysconfig register offsets via syscfg property.
Based on Arnd's review comments here https://lkml.org/lkml/2014/11/13/161,
update the phy driver to not use the reg property to access the sysconfig
register offsets.
This is because other PHYs (miphy28, miphy365) have a combination of
memory mapped registers and sysconfig control regs, and we shouldn't
be mixing address spaces in the reg property. In addition we would
ideally like the sysconfig offsets to be passed via DT in a uniform way.
This new method will also allow us to support devices which have sysconfig
registers in different banks more easily, and it is also analogous to how
the keystone and bcm7445 platforms pass their syscon offsets in DT.
This breaks DT compatibility, but this platform is considered WIP, and
is only used by a few developers who are upstreaming support for it.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Jan 2015 04:14:32 +0000 (20:14 -0800)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- remove useless return in void functions
- remove unused member 'primary_iface' from 'struct orig_node'
- improve existing kernel doc
- fix several checkpatch complaints
- ensure socket's control block is cleared for received skbs
- add missing DEBUG_FS dependency to BATMAN_ADV_DEBUG symbol
Signed-off-by: David S. Miller <davem@davemloft.net>
Praveen Madhavan [Wed, 7 Jan 2015 13:46:28 +0000 (19:16 +0530)]
csiostor:firmware upgrade fix
This patch removes the older means of upgrading firmware using the MAJOR version
and adds a newer interface version checking mechanism.
Please apply this patch on net-next since it depends on previous commits.
Signed-off-by: Praveen Madhavan <praveenm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Wed, 7 Jan 2015 12:39:53 +0000 (10:39 -0200)]
Revert "ARM: dts: imx6qdl: enable FEC magic-packet feature"
As
456062b3ec6f ("ARM: imx: add FEC sleep mode callback function") has been
reverted, also revert the dts part.
This reverts commit
07b4d2dda0c00f56248 ("ARM: dts: imx6qdl: enable FEC
magic-packet feature").
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Wed, 7 Jan 2015 12:39:52 +0000 (10:39 -0200)]
Revert "ARM: imx: add FEC sleep mode callback function"
i.MX platform maintainer Shawn Guo is not happy with this commit, as
explained below [1]:
"The GPR difference between SoCs can be encoded in device tree as well.
It's pointless to repeat the same code pattern for every single
platform, that need to set up GPR bits for enabling magic packet wake
up, while the only difference is the register and bit offset.
The platform code will become quite messy and unmaintainable if every
device driver dump their GPR register setup code into platform.
Sorry, but it's NACK from me."
This reverts commit
456062b3ec6f5b9 ("ARM: imx: add FEC sleep mode callback
function").
[1] http://www.spinics.net/lists/netdev/msg310922.html
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 7 Jan 2015 09:49:49 +0000 (10:49 +0100)]
r8169: add support for xmit_more
Delay update of hw tail descriptor if we know that another skb is going
to be sent.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
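The idea in the r8169 message above can be sketched as follows. This is a minimal model, not the r8169 code: struct tx_ring, tx_queue and the simulated doorbell counter are all hypothetical, and the "more" flag stands in for the stack's hint that another skb will follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical TX ring; not the r8169 data structures. */
struct tx_ring {
	unsigned int sw_tail;         /* next free slot, owned by software */
	unsigned int hw_tail;         /* last tail value written to the device */
	unsigned int doorbell_writes; /* number of simulated register writes */
};

/* Queue one packet; write the tail register only when no more packets follow. */
static void tx_queue(struct tx_ring *ring, bool more)
{
	assert(ring != NULL);
	ring->sw_tail = (ring->sw_tail + 1) % RING_SIZE;
	if (!more) {
		ring->hw_tail = ring->sw_tail;
		ring->doorbell_writes++;
	}
}
```

Queuing three packets with more set on the first two costs a single register write instead of three. A real driver would likely also need to flush the tail when its queue is about to stop; that case is left out of this sketch.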
David S. Miller [Fri, 9 Jan 2015 03:47:19 +0000 (19:47 -0800)]
Merge branch 'rhashtable-next'
Ying Xue says:
====================
Involve rhashtable_lookup_insert routine
The series aims to introduce rhashtable_lookup_insert() to guarantee
that the lookup and insertion of an object into the hash table are
performed atomically, so that rhashtable users need not introduce an
extra lock around search and insertion. The tipc socket is the first
user to benefit from this enhancement.
v2 changes:
- fix the issue of waking up the worker thread under the wrong condition in
patch #2, which was pointed out by Thomas.
- move a comment from rhashtable_insert() to rhashtable_wakeup_worker()
according to Thomas's suggestion in patch #2.
- indent the third line of the condition statement in
rhashtable_wakeup_worker() to the inner bracket in patch #2.
- drop patch #3 of the v1 series
- fix an issue of being unable to remove an object from the hash table in
a certain special case in patch #4.
- add a new patch #5 to avoid unnecessary wakeups of the worker queue
thread
- add a new patch #6 to initialize the atomic "nelems" variable
- adjust the "nelem_hint" value from 256 to 192 to avoid unnecessarily
shrinking the hash table in the beginning phase in patch #7.
v1 changes:
Before rhashtable_lookup_insert() is introduced, the following
optimizations need to be done first:
- simplify rhashtable_lookup by reusing rhashtable_lookup_compare()
- introduce rhashtable_wakeup_worker() to further reduce duplicated
code in patch #2
- fix an issue in patch #3
- introduce rhashtable_lookup_insert(). In this version, we first
use rhashtable_lookup() to search for a duplicate key in both the old and
new bucket tables; second, we introduce another __rhashtable_insert() helper
function to reduce the duplicated code between rhashtable_insert()
and rhashtable_lookup_insert().
- add patch #5 to the series as it depends on the above patches. In
this version, no change is made compared with its previous version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 7 Jan 2015 05:41:58 +0000 (13:41 +0800)]
tipc: convert tipc reference table to use generic rhashtable
As the tipc reference table is statically allocated, the memory it
requests at stack initialization is quite big even though the
maximum port number is currently restricted to 8191; moreover,
that number has already become insufficient in practice. But if the
maximum number of ports were raised to its theoretical value of 2^32, the
memory consumed would reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.
If the tipc reference table is converted to the generic rhashtable, both
of the above disadvantages are resolved: the new resizable hash table
avoids locking on lookup, and less memory is required initially; for
example, 256 hash bucket slots are requested at the beginning instead of
allocating all 8191 slots as in the old mode. The hash table will
grow if the entries exceed 75% of the table size, up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but never below the minimum table size of 256.
This patch also converts ref_table_lock to a separate mutex to protect
hash table mutations on the write side. Lastly, it defers the release of
the socket reference using call_rcu() to allow using an RCU read-side
protected call to rhashtable_lookup().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
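The resizing policy described in the tipc message above (grow above 75%, shrink below 30%, table size bounded between 256 and 1M) can be sketched as two decision functions. The names should_grow and should_shrink and the exact integer arithmetic are assumptions made for illustration; they are not taken from the rhashtable source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_SIZE 256u
#define MAX_SIZE (1u << 20)	/* 1M buckets */

/* Grow when entries exceed 75% of the table, unless already at the maximum. */
static bool should_grow(size_t nelems, size_t size)
{
	assert(size >= MIN_SIZE && size <= MAX_SIZE);
	return size < MAX_SIZE && nelems > size / 4 * 3;
}

/* Shrink when usage falls below 30%, unless already at the minimum. */
static bool should_shrink(size_t nelems, size_t size)
{
	assert(size >= MIN_SIZE && size <= MAX_SIZE);
	return size > MIN_SIZE && nelems < size * 3 / 10;
}
```

Checking the size bounds inside the decision functions means a caller never schedules a resize that could not take effect.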
Ying Xue [Wed, 7 Jan 2015 05:41:57 +0000 (13:41 +0800)]
rhashtable: initialize atomic nelems variable
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 7 Jan 2015 05:41:56 +0000 (13:41 +0800)]
rhashtable: avoid unnecessary wakeup for worker queue
Move the checks for whether the hash table size exceeds its maximum
threshold or has reached its minimum threshold from the resizing
functions to the resizing decision functions, avoiding unnecessary
wakeups of the worker queue thread.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 7 Jan 2015 05:41:55 +0000 (13:41 +0800)]
rhashtable: future table needs to be traversed when remove an object
When removing an object from the hash table, we currently only traverse
the old bucket table to check whether the object exists. If the object is
not found in it, we try again. But in the second search loop, we still
search for the object in the old table instead of the future table. As a
result, the object may not be removed from the hash table, especially when
resizing is in progress and the object is stored only in the
future table.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
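A minimal sketch of the corrected removal order described above, assuming a drastically simplified model: each bucket table is a flat array of integer keys, and the names struct table, table_remove and remove_key are hypothetical. Locking and RCU are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP 8

/* Hypothetical bucket table reduced to a flat array of keys. */
struct table {
	int keys[CAP];
	size_t n;
};

/* Remove key from one table; returns true if it was present. */
static bool table_remove(struct table *t, int key)
{
	assert(t != NULL && t->n <= CAP);
	for (size_t i = 0; i < t->n; i++) {
		if (t->keys[i] == key) {
			t->keys[i] = t->keys[--t->n];	/* fill the hole with the last key */
			return true;
		}
	}
	return false;
}

/* Try the current table first, then the future table if a resize is in progress. */
static bool remove_key(struct table *tbl, struct table *future_tbl, int key)
{
	if (table_remove(tbl, key))
		return true;
	if (future_tbl != tbl)
		return table_remove(future_tbl, key);
	return false;
}
```

The bug described in the message corresponds to passing the old table to the second search as well, which would leave a key that lives only in the future table in place.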
Ying Xue [Wed, 7 Jan 2015 05:41:54 +0000 (13:41 +0800)]
rhashtable: involve rhashtable_lookup_insert routine
Introduce a new function called rhashtable_lookup_insert() which makes
lookup and insertion atomic under bucket lock protection, so that we
avoid introducing an extra lock when we search for and insert an object
into the hash table.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 7 Jan 2015 05:41:53 +0000 (13:41 +0800)]
rhashtable: introduce rhashtable_wakeup_worker helper function
Introduce the rhashtable_wakeup_worker() helper function to reduce
duplicated code at the places where the worker is woken up.
In addition, as long as both the "future_tbl" and "tbl" bucket table
pointers point to the same bucket array, we should try to wake up
the resizing worker thread; otherwise, the work of resizing the hash
table is not finished yet. However, currently we wake up the worker
thread only when the two pointers point to different bucket arrays.
This is wrong, so this patch fixes that issue as well.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 7 Jan 2015 05:41:52 +0000 (13:41 +0800)]
rhashtable: optimize rhashtable_lookup routine
Define an internal compare function and the relevant compare argument,
and then use rhashtable_lookup_compare() to look up the key in the
hash table, reducing duplicated code between rhashtable_lookup()
and rhashtable_lookup_compare().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
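The refactoring described above, expressing a plain key lookup through a general compare-callback lookup, can be sketched like this. The flat table, struct entry and the function names are hypothetical and only mirror the idea, not the rhashtable API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat table of fixed-size keys; not the rhashtable layout. */
struct entry {
	char key[16];
	int value;
};

typedef bool (*compare_fn)(const struct entry *e, const void *arg);

/* General lookup: return the first entry the caller's predicate accepts. */
static struct entry *lookup_compare(struct entry *tbl, size_t n,
				    compare_fn compare, const void *arg)
{
	assert(compare != NULL);
	for (size_t i = 0; i < n; i++)
		if (compare(&tbl[i], arg))
			return &tbl[i];
	return NULL;
}

/* Internal predicate: plain key equality. */
static bool key_equal(const struct entry *e, const void *arg)
{
	return strcmp(e->key, (const char *)arg) == 0;
}

/* Key lookup expressed through the general routine, so the scan exists once. */
static struct entry *lookup(struct entry *tbl, size_t n, const char *key)
{
	return lookup_compare(tbl, n, key_equal, key);
}
```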
David S. Miller [Fri, 9 Jan 2015 03:39:18 +0000 (19:39 -0800)]
Merge branch 'cxgb4-next'
Hariprasad Shenai says:
====================
Add support for few debugfs entries
This patch series adds support for devlog, cim_la, cim_qcfg and mps_tcam
debugfs entries.
The patch series is created against the 'net-next' tree
and includes patches for the cxgb4 driver.
We have included all the maintainers of the respective drivers. Kindly review the
changes and let us know of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:03 +0000 (08:48 +0530)]
cxgb4: Add support for mps_tcam debugfs
Debug log to get the MPS TCAM table
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:02 +0000 (08:48 +0530)]
cxgb4: Add support for cim_qcfg entry in debugfs
Adds debug log to get cim queue config
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:01 +0000 (08:48 +0530)]
cxgb4: Add support for cim_la entry in debugfs
The CIM LA captures the embedded processor’s internal state. Optionally, it can
also trace the flow of data in and out of the embedded processor. Therefore, the
CIM LA output contains detailed information of what code the embedded processor
executed prior to the CIM LA capture.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:00 +0000 (08:48 +0530)]
cxgb4: Add support for devlog
Add support for device log entry in debugfs
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 6 Jan 2015 23:45:32 +0000 (15:45 -0800)]
doc: fix the compile error of txtimestamp.c
Vinson reported:
HOSTCC Documentation/networking/timestamping/txtimestamp
Documentation/networking/timestamping/txtimestamp.c:64:8: error:
redefinition of ‘struct in6_pktinfo’
struct in6_pktinfo {
^
In file included from /usr/include/arpa/inet.h:23:0,
from Documentation/networking/timestamping/txtimestamp.c:33:
/usr/include/netinet/in.h:456:8: note: originally defined here
struct in6_pktinfo
^
After we sync with libc header, we don't need this ugly hack any more.
Reported-by: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 6 Jan 2015 23:45:31 +0000 (15:45 -0800)]
ipv6: fix redefinition of in6_pktinfo and ip6_mtuinfo
Both netinet/in.h and linux/ipv6.h define these two structs;
if we include both of them, we get:
/usr/include/linux/ipv6.h:19:8: error: redefinition of ‘struct in6_pktinfo’
struct in6_pktinfo {
^
In file included from /usr/include/arpa/inet.h:22:0,
from txtimestamp.c:33:
/usr/include/netinet/in.h:524:8: note: originally defined here
struct in6_pktinfo
^
In file included from txtimestamp.c:40:0:
/usr/include/linux/ipv6.h:24:8: error: redefinition of ‘struct ip6_mtuinfo’
struct ip6_mtuinfo {
^
In file included from /usr/include/arpa/inet.h:22:0,
from txtimestamp.c:33:
/usr/include/netinet/in.h:531:8: note: originally defined here
struct ip6_mtuinfo
^
So similarly to what we did for in6_addr, we need to sync with
libc header on their definitions.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
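The general technique for letting two headers share one struct definition can be sketched in a single file. Everything here is hypothetical (the LIBC_IN_H_INCLUDED guard, the UAPI_DEF_PKTINFO_DEMO switch and struct pktinfo_demo); it only illustrates the guard pattern, not the actual macros used by the kernel or libc headers.

```c
#include <assert.h>
#include <stddef.h>

/* --- pretend this part is the libc header --- */
#define LIBC_IN_H_INCLUDED 1	/* hypothetical include guard */
struct pktinfo_demo {		/* hypothetical struct name */
	unsigned char addr[16];
	int ifindex;
};

/* --- pretend this part is a compatibility header --- */
#ifdef LIBC_IN_H_INCLUDED
#define UAPI_DEF_PKTINFO_DEMO 0	/* libc already defined the struct */
#else
#define UAPI_DEF_PKTINFO_DEMO 1
#endif

/* --- pretend this part is the kernel UAPI header --- */
#if UAPI_DEF_PKTINFO_DEMO
struct pktinfo_demo {
	unsigned char addr[16];
	int ifindex;
};
#endif

/* Uses the struct to show that exactly one definition is visible. */
static size_t pktinfo_demo_size(void)
{
	struct pktinfo_demo info = { {0}, 0 };

	assert(sizeof(info.addr) == 16);
	return sizeof(info);
}
```

Removing the guard around the second definition reproduces a "redefinition of struct" error of the kind quoted in the message.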
Markus Pargmann [Sat, 29 Nov 2014 18:07:46 +0000 (19:07 +0100)]
batman-adv: Kconfig, Add missing DEBUG_FS dependency
BATMAN_ADV_DEBUG is using debugfs files for the debugging log. So it
depends on DEBUG_FS which is missing as dependency in the Kconfig file.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Simon Wunderlich [Tue, 30 Dec 2014 01:20:14 +0000 (02:20 +0100)]
batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Antonio Quartulli [Sun, 2 Nov 2014 10:29:56 +0000 (11:29 +0100)]
batman-adv: fix misspelled words
Reported-by: checkpatch
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Martin Hundebøll [Wed, 17 Sep 2014 06:56:19 +0000 (08:56 +0200)]
batman-adv: clear control block of received socket buffers
Since other network components (and some drivers) use the control block
provided in skb's, the network coding feature might wrongly assume that
an SKB has been decoded, and thus not try to code it with another packet
again. This happens for instance when batman-adv is running on a bridge device.
Fix this by clearing the control block for every received SKB.
Introduced by
3c12de9a5c756b23fe7c9ab332474ece1568914c
("batman-adv: network coding - code and transmit packets if possible")
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
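A minimal model of the fix described above, assuming a mock buffer type: struct fake_skb, struct coding_cb, rx_clear_cb and looks_decoded are hypothetical names, and the 48-byte scratch area merely mimics a per-packet control block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a socket buffer with a scratch control block. */
struct fake_skb {
	unsigned char cb[48];
};

/* Hypothetical per-layer view of the control block. */
struct coding_cb {
	unsigned char decoded;
};

/* Wipe whatever earlier layers left in the control block. */
static void rx_clear_cb(struct fake_skb *skb)
{
	assert(skb != NULL);
	memset(skb->cb, 0, sizeof(skb->cb));
}

/* Interpret the control block the way the coding layer would. */
static bool looks_decoded(const struct fake_skb *skb)
{
	struct coding_cb view;

	memcpy(&view, skb->cb, sizeof(view));
	return view.decoded != 0;
}
```

Stale bytes from another layer make looks_decoded() report true for a packet that was never decoded; clearing the block on receive removes that false positive.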
Antonio Quartulli [Mon, 1 Sep 2014 12:37:29 +0000 (14:37 +0200)]
batman-adv: checkpatch - remove unnecessary parentheses
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Mon, 1 Sep 2014 12:37:28 +0000 (14:37 +0200)]
batman-adv: checkpatch - Please don't use multiple blank lines
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Mon, 1 Sep 2014 12:37:27 +0000 (14:37 +0200)]
batman-adv: checkpatch - Please use a blank line after declarations
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Mon, 1 Sep 2014 12:37:26 +0000 (14:37 +0200)]
batman-adv: checkpatch - No space is necessary after a cast
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Antonio Quartulli [Mon, 1 Sep 2014 12:37:25 +0000 (14:37 +0200)]
batman-adv: checkpatch - else is not generally useful after a break or return
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Martin Hundebøll [Tue, 15 Jul 2014 07:41:08 +0000 (09:41 +0200)]
batman-adv: kernel doc fixes for main.{c, h}
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Martin Hundebøll [Tue, 15 Jul 2014 07:41:07 +0000 (09:41 +0200)]
batman-adv: kernel doc fix for distributed-arp-table.h
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Martin Hundebøll [Tue, 15 Jul 2014 07:41:06 +0000 (09:41 +0200)]
batman-adv: kernel doc fixes for bridge_loop_avoidance.c
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Martin Hundebøll [Tue, 15 Jul 2014 07:41:05 +0000 (09:41 +0200)]
batman-adv: kernel doc fixes for bat_iv_ogm.c
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Simon Wunderlich [Wed, 16 Jul 2014 10:23:10 +0000 (12:23 +0200)]
batman-adv: remove obsolete variable primary_iface from orig_node
This variable became obsolete when changing to the new bonding mechanism
based on the multi interface optimization. Since it's not used anywhere,
remove it.
Reported-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Antonio Quartulli [Mon, 21 Jul 2014 08:03:59 +0000 (10:03 +0200)]
batman-adv: avoid useless return in void functions
Cc: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
David S. Miller [Wed, 7 Jan 2015 03:29:20 +0000 (22:29 -0500)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Wed, 7 Jan 2015 01:48:14 +0000 (17:48 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Just a pile of random fixes, including:
1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu.
2) MDI{,X} eeprom check in e100 driver is reversed, from John W.
Linville.
3) Missing error return assignments in several ethernet drivers, from
Julia Lawall.
4) Altera TSE device doesn't come back up after ifconfig down/up
sequence, fix from Kostya Belezko.
5) Add more cases to the check for whether the qmi_wwan device has a
bogus MAC address and needs to be assigned a random one. From
Kristian Evensen.
6) Fix interrupt hangs in CPSW, from Felipe Balbi.
7) Implement ndo_features_check in r8152 so that the stack doesn't
feed GSO packets which are outside of the chip's capabilities.
From Hayes Wang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
qla3xxx: don't allow never end busy loop
xen-netback: fixing the propagation of the transmit shaper timeout
r8152: support ndo_features_check
batman-adv: fix potential TT client + orig-node memory leak
batman-adv: fix multicast counter when purging originators
batman-adv: fix counter for multicast supporting nodes
batman-adv: fix lock class for decoding hash in network-coding.c
batman-adv: fix delayed foreign originator recognition
batman-adv: fix and simplify condition when bonding should be used
Revert "mac80211: Fix accounting of the tailroom-needed counter"
net: ethernet: cpsw: fix hangs with interrupts
enic: free all rq buffs when allocation fails
qmi_wwan: Set random MAC on devices with buggy fw
openvswitch: Consistently include VLAN header in flow and port stats.
tcp: Do not apply TSO segment limit to non-TSO packets
Altera TSE: Add missing phydev
net/mlx4_core: Fix error flow in mlx4_init_hca()
net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
qlcnic: Fix return value in qlcnic_probe()
net: axienet: fix error return code
...
Linus Torvalds [Wed, 7 Jan 2015 01:39:31 +0000 (17:39 -0800)]
Merge tag 'for-linus-3' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI fixlet from Corey Minyard:
"Fix a compile warning"
* tag 'for-linus-3' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: Fix compile warning with tv_usec
Feng Kan [Tue, 6 Jan 2015 22:41:33 +0000 (15:41 -0700)]
net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
This adds support for the APM X-Gene ethernet driver to use the ACPI table to derive
ethernet driver parameters.
Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Tue, 6 Jan 2015 21:17:53 +0000 (23:17 +0200)]
qla3xxx: don't allow never end busy loop
The counter variable wasn't incremented at all, so the loop may get stuck under
certain circumstances.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
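The class of bug described above can be illustrated with a bounded polling loop. This is a generic sketch, not the qla3xxx code: poll_ready and the two helper callbacks are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*ready_fn)(void *ctx);

/* Poll until ready or until max_tries attempts; returns tries used, or -1 on timeout. */
static int poll_ready(ready_fn ready, void *ctx, int max_tries)
{
	int tries = 0;

	assert(ready != NULL && max_tries > 0);
	while (tries < max_tries) {
		tries++;	/* the increment that bounds the loop */
		if (ready(ctx))
			return tries;
	}
	return -1;
}

/* Hypothetical device that never becomes ready. */
static bool never_ready(void *ctx)
{
	(void)ctx;
	return false;
}

/* Hypothetical device that becomes ready on the third poll. */
static bool ready_on_third(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) >= 3;
}
```

Without the increment, a device that never reports ready would keep the loop spinning forever; with it, the caller gets a timeout result it can handle.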
Andy Fleming [Sun, 4 Jan 2015 09:36:02 +0000 (17:36 +0800)]
net/fsl: Add mEMAC MDIO support to XGMAC MDIO
The Freescale mEMAC supports operating at 10/100/1000/10G, and
its associated MDIO controller is likewise capable of operating
both Clause 22 and Clause 45 MDIO buses. It is nearly identical
to the MDIO controller on the XGMAC, so we just modify that
driver.
Portions of this driver developed by:
Sandeep Singh <sandeep@freescale.com>
Roy Zang <tie-fei.zang@freescale.com>
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ed Swierk [Sat, 3 Jan 2015 01:27:56 +0000 (17:27 -0800)]
ethtool: Extend ethtool plugin module eeprom API to phylib
This patch extends the ethtool plugin module eeprom API to support cards
whose phy support is delegated to a separate driver.
The handlers for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPROM call the
module_info and module_eeprom functions if the phy driver provides them;
otherwise the handlers call the equivalent ethtool_ops functions provided
by network drivers with built-in phy support.
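The dispatch described above can be sketched roughly as follows. This is a
userspace Python model under stated assumptions, not the kernel code: the
PhyDriver class and get_module_info() helper are hypothetical stand-ins for
the phy driver callbacks and the ETHTOOL_GMODULEINFO handler.

```python
class PhyDriver:
    """Hypothetical stand-in for a phy driver; module_info is optional."""

    def __init__(self, module_info=None):
        self.module_info = module_info


def get_module_info(phy_driver, ethtool_ops_module_info):
    """Prefer the phy driver's handler, else use the network driver's one."""
    if phy_driver is not None and phy_driver.module_info is not None:
        return phy_driver.module_info()
    # Network drivers with built-in phy support provide the handler
    # through their ethtool_ops instead.
    return ethtool_ops_module_info()
```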
Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 6 Jan 2015 22:05:40 +0000 (14:05 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"Revert a potential seek_data/hole regression which shows up when using
ext4 to handle ext3 file systems, plus two minor bug fixes"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: remove spurious KERN_INFO from ext4_warning call
Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
ext4: prevent online resize with backup superblock
Linus Torvalds [Tue, 6 Jan 2015 21:00:05 +0000 (13:00 -0800)]
mm: propagate error from stack expansion even for guard page
Jay Foad reports that the address sanitizer test (asan) sometimes gets
confused by a stack pointer that ends up being outside the stack vma
that is reported by /proc/maps.
This happens due to an interaction between RLIMIT_STACK and the guard
page: when we do the guard page check, we ignore the potential error
from the stack expansion, which effectively results in a missing guard
page, since the expected stack expansion won't have been done.
And since /proc/maps explicitly ignores the guard page (commit
d7824370e263: "mm: fix up some user-visible effects of the stack guard
page"), the stack pointer ends up being outside the reported stack area.
This is the minimal patch: it just propagates the error. It also
effectively makes the guard page part of the stack limit, which in turn
means that the actual real stack is one page less than the stack limit.
Let's see if anybody notices. We could teach acct_stack_growth() to
allow an extra page for a grow-up/grow-down stack in the rlimit test,
but I don't want to add more complexity if it isn't needed.
Reported-and-tested-by: Jay Foad <jay.foad@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 6 Jan 2015 19:24:49 +0000 (14:24 -0500)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- ensure bonding is used (if enabled) for packets coming in the soft
interface
- fix race condition to avoid orig_nodes being deleted right after
being added
- avoid false positive lockdep splats by assigning lockclass to
the proper hashtable lock objects
- avoid miscounting of multicast 'disabled' nodes in the network
- fix memory leak in the Global Translation Table in case of
originator interval change
Signed-off-by: David S. Miller <davem@davemloft.net>
Palik, Imre [Tue, 6 Jan 2015 15:44:44 +0000 (16:44 +0100)]
xen-netback: fixing the propagation of the transmit shaper timeout
Since e9ce7cb6b107 ("xen-netback: Factor queue-specific data into queue struct"),
the transmit shaper timeout is always set to 0. The value the user sets via
xenbus is never propagated to the transmit shaper.
This patch fixes the issue.
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shrikrishna Khare [Tue, 6 Jan 2015 17:20:15 +0000 (09:20 -0800)]
Driver: Vmxnet3: Make Rx ring 2 size configurable
Rx ring 2 size can be configured by adjusting rx-jumbo parameter
of ethtool -G.
Signed-off-by: Ramya Bolla <bollar@vmware.com>
Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Jan 2015 18:29:27 +0000 (13:29 -0500)]
Merge tag 'mac80211-for-davem-2015-01-06' of git://git./linux/kernel/git/jberg/mac80211
Here's just a single fix - a revert of a patch that broke the
p54 and cw2100 drivers (arguably due to bad assumptions there.)
Since this affects kernels since 3.17, I decided to revert for
now and we'll revisit this optimisation properly for -next.
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Tue, 6 Jan 2015 09:41:58 +0000 (17:41 +0800)]
r8152: support ndo_features_check
Support ndo_features_check to avoid:
- the transport offset is more than the hw limitation when using hw checksum.
- the skb->len of a GSO packet is more than the limitation.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Tue, 6 Jan 2015 13:26:13 +0000 (14:26 +0100)]
arm_arch_timer: include clocksource.h directly
This driver makes use of the clocksource code. Previously it had only
included the proper header indirectly, but that chain was inadvertently
broken by 74d23cc "time: move the timecounter/cyclecounter code into its
own file."
This patch fixes the issue by including clocksource.h directly.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Tue, 6 Jan 2015 12:01:46 +0000 (17:31 +0530)]
cxgb4: Add PCI device ID for new T5 adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Sat, 13 Dec 2014 22:32:15 +0000 (23:32 +0100)]
batman-adv: fix potential TT client + orig-node memory leak
This patch fixes a potential memory leak which can occur once an
originator times out. On timeout the corresponding global translation table
entry might not get purged correctly. Furthermore, the unpurged TT
entry will cause its orig-node to leak, too. This can additionally prevent
the new multicast optimization feature from kicking in, because the
counter is then bogus.
In detail: The batadv_tt_global_entry->orig_list holds the reference to
the orig-node. Usually this reference is released after
BATADV_PURGE_TIMEOUT through: _batadv_purge_orig()->
batadv_purge_orig_node()->batadv_update_route()->_batadv_update_route()->
batadv_tt_global_del_orig() which purges this global tt entry and
releases the reference to the orig-node.
However, if between two batadv_purge_orig_node() calls the orig-node
timeout grew to 2*BATADV_PURGE_TIMEOUT then this call path isn't
reached. Instead the corresponding orig-node is removed from the
originator hash in _batadv_purge_orig(), the batadv_update_route()
part is skipped and won't be reached anymore.
Fix the issue by moving batadv_tt_global_del_orig() out of the rcu
callback.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Thu, 30 Oct 2014 04:40:47 +0000 (05:40 +0100)]
batman-adv: fix multicast counter when purging originators
When purging an orig_node we should only decrease the counter tracking the
number of nodes without multicast optimizations support if it was
increased through this orig_node before.
A not yet quite initialized orig_node (meaning it did not have its turn
in the mcast-tvlv handler so far) which gets purged would not adhere to
this and will lead to a counter imbalance.
Fix this by adding a check whether the orig_node is mcast-initialized
before decreasing the counter in the mcast-orig_node-purging routine.
Introduced by 60432d756cf06e597ef9da511402dd059b112447
("batman-adv: Announce new capability via multicast TVLV")
Reported-by: Tobias Hachmer <tobias@hachmer.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Thu, 30 Oct 2014 04:40:46 +0000 (05:40 +0100)]
batman-adv: fix counter for multicast supporting nodes
A miscounting of nodes having multicast optimizations enabled can lead
to multicast packet loss in the following scenario:
If the first OGM a node receives from another one has no multicast
optimizations support (no multicast tvlv) then we fail to
increase the counter. This potentially leads to the wrong assumption
that we could safely use multicast optimizations.
Fix this by increasing the counter if the initial OGM has the
multicast TVLV unset, too.
Introduced by 60432d756cf06e597ef9da511402dd059b112447
("batman-adv: Announce new capability via multicast TVLV")
Reported-by: Tobias Hachmer <tobias@hachmer.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Martin Hundebøll [Tue, 11 Nov 2014 15:22:23 +0000 (16:22 +0100)]
batman-adv: fix lock class for decoding hash in network-coding.c
batadv_hash_set_lock_class() is called with the wrong hash table as first
argument (probably due to a copy-paste error), which leads to false
positives when running with lockdep.
Introduced-by: 612d2b4fe0a1ff2f8389462a6f8be34e54124c05
("batman-adv: network coding - save overheard and tx packets for decoding")
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Linus Lüssing [Thu, 30 Oct 2014 05:23:40 +0000 (06:23 +0100)]
batman-adv: fix delayed foreign originator recognition
Currently it can happen that the reception of an OGM from a new
originator is not being accepted. More precisely it can happen that
an originator struct gets allocated and initialized
(batadv_orig_node_new()), even the TQ gets calculated and set correctly
(batadv_iv_ogm_calc_tq()) but still the periodic orig_node purging
thread will decide to delete it if it has a chance to jump between
these two function calls.
This is because batadv_orig_node_new() initializes the last_seen value
to zero and its caller (batadv_iv_ogm_orig_get()) makes it visible to
other threads by adding it to the hash table already.
batadv_iv_ogm_calc_tq() will set the last_seen variable to the correct,
current time a few lines later but if the purging thread jumps in between
that it will think that the orig_node timed out and will wrongly
schedule it for deletion already.
If the purging interval is the same as the originator interval (which is
the default: 1 second), then this game can continue for several rounds
until the random OGM jitter added enough difference between these
two (in tests, two to about four rounds seemed common).
Fix this by initializing the last_seen variable of an orig_node
to the current time before adding it to the hash table.
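The race can be illustrated with a small userspace Python model. It is only
a sketch: the timeout value, the dictionary standing in for the orig_node
and the helper names are assumptions for illustration, not the batman-adv
code.

```python
PURGE_TIMEOUT = 200.0  # seconds; assumed value for this model


def is_timed_out(last_seen, now, timeout=PURGE_TIMEOUT):
    """Purge decision: the node is stale if it was not seen recently."""
    return now - last_seen > timeout


def new_orig_node(now, init_last_seen):
    """Create a node as it looks when it becomes visible in the hash table."""
    # Before the fix last_seen started at zero and was only set to the
    # current time a few lines later, after the node was already visible.
    return {"last_seen": now if init_last_seen else 0.0}
```

If the purging thread inspects the node right after it was added, the
zero-initialized variant already looks expired, while the variant that
starts at the current time does not.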
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Simon Wunderlich [Wed, 13 Aug 2014 12:26:56 +0000 (14:26 +0200)]
batman-adv: fix and simplify condition when bonding should be used
The current condition actually does NOT consider bonding when the
interface the packet came in from is the soft interface, which is the
opposite of what it should do (and the comment describes). Fix that and
slightly simplify the condition.
Reported-by: Ray Gibson <booray@gmail.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
David S. Miller [Tue, 6 Jan 2015 03:55:28 +0000 (22:55 -0500)]
Merge branch 'rt_cong_ctrl'
Daniel Borkmann says:
====================
net: allow setting congctl via routing table
This is the second part of our work and allows for setting the congestion
control algorithm via routing table. For details, please see individual
patches.
Since patch 1 is a bug fix, we suggest applying patch 1 to net, and then
merging net into net-next, for example, and following up with the remaining
feature patches wrt dependencies.
Joint work with Florian Westphal, suggested by Hannes Frederic Sowa.
Patch for iproute2 is available under [1], but will be reposted along
with the man-page update when this set hits net-next.
[1] http://patchwork.ozlabs.org/patch/418149/
Thanks!
v2 -> v3:
- Added module auto-loading as suggested by David Miller, thanks!
- Added patch 2 for handling possible sleeps in fib6
- While working on this, we discovered a bug, hence fix in patch 1
- Added auto-loading to patch 4
- Rebased, retested, rest the same.
v1 -> v2:
- Very sorry, I noticed I had decnet disabled during testing.
Added missing header include in decnet, rest as is.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 5 Jan 2015 22:57:48 +0000 (23:57 +0100)]
net: tcp: add per route congestion control
This work adds the possibility to define a per route/destination
congestion control algorithm. Generally, this opens up the possibility
for a machine with different links to enforce specific congestion
control algorithms with optimal strategies for each of them based
on their network characteristics, even transparently for a single
application listening on all links.
For our specific use case, this additionally facilitates deployment
of DCTCP, for example, applications can easily serve internal
traffic/dsts in DCTCP and external one with CUBIC. Other scenarios
would also allow for utilizing e.g. long living, low priority
background flows for certain destinations/routes while still being
able for normal traffic to utilize the default congestion control
algorithm. We also thought about a per netns setting (where different
defaults are possible), but given it's actually a link specific
property, we argue that a per route/destination setting is the most
natural and flexible.
The administrator can utilize this through ip-route(8) by appending
"congctl [lock] <name>", where <name> denotes the name of a
congestion control algorithm and the optional lock parameter allows
to enforce the given algorithm so that applications in user space
would not be allowed to overwrite that algorithm for that destination.
The dst metric lookups are being done when a dst entry is already
available in order to avoid a costly lookup and still before the
algorithms are being initialized, thus overhead is very low when the
feature is not being used. While the client side would need to drop
the current reference on the module, on server side this can actually
even be avoided as we just got a flat-copied socket clone.
Joint work with Florian Westphal.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 5 Jan 2015 22:57:47 +0000 (23:57 +0100)]
net: tcp: add RTAX_CC_ALGO fib handling
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.
While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided exposing this implementation detail to user space.
Instead, the netlink attribute exchanged with user space carries the
actual congestion control algorithm name, similar to the
setsockopt(2) API, in order to allow for maximum flexibility,
even for 3rd party modules.
It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
it should have been stored in RTAX_FEATURES instead, we first thought
about reusing it for the congestion control key, but it brings more
complications and/or confusion than worth it.
Joint work with Florian Westphal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 5 Jan 2015 22:57:46 +0000 (23:57 +0100)]
net: tcp: add key management to congestion control
This patch adds necessary infrastructure to the congestion control
framework for later per route congestion control support.
For a per route congestion control possibility, our aim is to store
a unique u32 key identifier into dst metrics, which can then be
mapped into a tcp_congestion_ops struct. We argue that having a
RTAX key entry is the most simple, generic and easy way to manage,
and also keeps the memory footprint of dst entries lower on 64 bit
than with storing a pointer directly, for example. Having a unique
key id also allows for decoupling actual TCP congestion control
module management from the FIB layer, i.e. we don't have to care
about expensive module refcounting inside the FIB at this point.
We first thought of using an IDR store for the realization, which
takes over dynamic assignment of unused key space and also performs
the key to pointer mapping in RCU. While doing so, we stumbled upon
the issue that due to the nature of dynamic key distribution, it
just so happens, arguably in very rare occasions, that excessive
module loads and unloads can lead to a possible reuse of previously
used key space. Thus, previously stale keys in the dst metric are
now being reassigned to a different congestion control algorithm,
which might lead to unexpected behaviour. One way to resolve this
would have been to walk FIBs on the actually rare occasion of a
module unload and reset the metric keys for each FIB in each netns,
but that's just very costly.
Therefore, we argue a better solution is to reuse the unique
congestion control algorithm name member and map that into u32 key
space through jhash. For that, we split the flags attribute (as it
currently uses 2 bits only anyway) into two u32 attributes, flags
and key, so that we can keep the cacheline boundary of 2 cachelines
on x86_64 and cache the precalculated key at registration time for
the fast path. On average we might expect 2 - 4 modules being loaded,
worst case perhaps 15, so a key collision possibility is extremely
low, and guaranteed collision-free on LE/BE for all in-tree modules.
Overall this results in much simpler code, and all without the
overhead of an IDR. Due to the deterministic nature, modules can
now be unloaded, the congestion control algorithm for a specific
but unloaded key will fall back to the default one, and on module
reload time it will switch back to the expected algorithm
transparently.
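A rough userspace Python model of the name-to-key scheme follows. It is a
sketch under stated assumptions: FNV-1a is used purely as a stand-in for
jhash, and the CongestionRegistry class and its methods are hypothetical
names, not kernel interfaces.

```python
def name_to_key(name):
    """Map an algorithm name to a u32 key (FNV-1a stand-in, not jhash)."""
    h = 0x811C9DC5
    for byte in name.encode("ascii"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


class CongestionRegistry:
    """Hypothetical registry mapping u32 keys to loaded algorithms."""

    def __init__(self, default_name):
        self.default_name = default_name
        self._by_key = {}

    def register(self, name):
        key = name_to_key(name)  # computed once, at registration time
        if key in self._by_key:
            raise ValueError("key collision for " + name)
        self._by_key[key] = name
        return key

    def unregister(self, name):
        self._by_key.pop(name_to_key(name), None)

    def lookup(self, key):
        # A key whose module is not loaded falls back to the default.
        return self._by_key.get(key, self.default_name)
```

Because the key depends only on the name, a key stored in a route keeps
its meaning across an unload and reload of the module.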
Joint work with Florian Westphal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 5 Jan 2015 22:57:45 +0000 (23:57 +0100)]
net: tcp: refactor reinitialization of congestion control
We can just move this to an extra function and make the code
a bit more readable, no functional change.
Joint work with Florian Westphal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 5 Jan 2015 22:57:44 +0000 (23:57 +0100)]
net: fib6: convert cfg metric to u32 outside of table write lock
Do the nla validation earlier, outside the write lock.
This is needed by followup patch which needs to be able to call
request_module (which can sleep) if needed.
Joint work with Daniel Borkmann.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 5 Jan 2015 22:57:43 +0000 (23:57 +0100)]
net: fib6: fib6_commit_metrics: fix potential NULL pointer dereference
When IPv6 host routes with metrics attached are being added, we fetch
the metrics store from the dst via COW through dst_metrics_write_ptr(),
added through commit e5fd387ad5b3.
One remaining problem here is that we actually call into inet_getpeer()
and may end up allocating/creating a new peer from the kmemcache, which
may fail.
Example trace from perf probe (inet_getpeer:41) where create is 1:
ip 6877 [002] 4221.391591: probe:inet_getpeer: (ffffffff8165e293)
85e294 inet_getpeer.part.7 (<- kmem_cache_alloc())
85e578 inet_getpeer
8eb333 ipv6_cow_metrics
8f10ff fib6_commit_metrics
Therefore, a check for NULL on the return of dst_metrics_write_ptr()
is necessary here.
Joint work with Florian Westphal.
Fixes: e5fd387ad5b3 ("ipv6: do not overwrite inetpeer metrics prematurely")
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Sokolowski [Mon, 5 Jan 2015 17:29:21 +0000 (17:29 +0000)]
net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined
Add a check for whether the call to ndo_dflt_fdb_dump is needed.
Some drivers (e.g. qlcnic or macvlan) that define their own
ndo_fdb_dump do not expect ndo_dflt_fdb_dump to be called
unconditionally. Other drivers define their own ndo_fdb_dump
and don't want ndo_dflt_fdb_dump to be called at all.
At the same time it is desirable to call the default dump
function on a bridge device.
Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
Add extra checking in br_fdb_dump to avoid duplicate entries
as now filter_dev can be NULL.
The following filtering tests were run before the change and
after the patch was applied, to make sure the results are the
same and the patch doesn't break the filtering algorithm.
[root@localhost ~]# cd /root/iproute2-3.18.0/bridge
[root@localhost bridge]# modprobe dummy
[root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
[root@localhost bridge]# brctl addbr br0
[root@localhost bridge]# brctl addif br0 dummy0
[root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
[root@localhost bridge]# # show all
[root@localhost bridge]# ./bridge fdb show
33:33:00:00:00:01 dev p2p1 self permanent
01:00:5e:00:00:01 dev p2p1 self permanent
33:33:ff:ac:ce:32 dev p2p1 self permanent
33:33:00:00:02:02 dev p2p1 self permanent
01:00:5e:00:00:fb dev p2p1 self permanent
33:33:00:00:00:01 dev p7p1 self permanent
01:00:5e:00:00:01 dev p7p1 self permanent
33:33:ff:79:50:53 dev p7p1 self permanent
33:33:00:00:02:02 dev p7p1 self permanent
01:00:5e:00:00:fb dev p7p1 self permanent
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by bridge
[root@localhost bridge]# ./bridge fdb show br br0
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by port
[root@localhost bridge]# ./bridge fdb show brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]# # filter by port + bridge
[root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]#
Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Jan 2015 03:44:56 +0000 (22:44 -0500)]
Merge branch 'ip_cmsg_csum'
Tom Herbert says:
====================
ip: Support checksum returned in cmsg
This patch set allows the packet checksum for a datagram socket
to be returned in csum data in recvmsg. This allows userspace
to implement its own checksum over the data, for instance if an
IP tunnel were implemented in user space, the inner checksum
could be validated.
Changes in this patch set:
- Move checksum conversion to inet_sock from udp_sock. This
generalizes checksum conversion for use with other protocols.
- Move IP cmsg constants to a header file and make processing
of the flags more efficient in ip_cmsg_recv
- Return checksum value in cmsg. This is specifically the unfolded
32 bit checksum of the full packet starting from the first byte
returned in recvmsg
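What an unfolded 32 bit checksum means can be sketched in user space as
below. This is an illustrative Python model, not the kernel's csum code:
it sums big-endian 16 bit words, and the function names are only borrowed
for readability.

```python
def csum_partial(data):
    """Unfolded 32 bit ones' complement sum over big-endian 16 bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFFFFFF) + (total >> 32)  # end-around carry
    return total


def csum_fold(total):
    """Fold a 32 bit sum to 16 bits and return its complement."""
    total = (total & 0xFFFF) + (total >> 16)
    total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

User space receiving the unfolded value can fold it itself; data that
carries a correct checksum folds to zero.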
Tested: Wrote a little server to get checksums in cmsg for UDP and
verified correct checksum is returned.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 5 Jan 2015 21:56:17 +0000 (13:56 -0800)]
ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and providing the checksum
to user space.
ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 5 Jan 2015 21:56:16 +0000 (13:56 -0800)]
ip: Add offset parameter to ip_cmsg_recv
Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and providing the checksum
to user space.
ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 5 Jan 2015 21:56:15 +0000 (13:56 -0800)]
ip: IP cmsg cleanup
Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that
they can be referenced in other source files.
Restructure ip_cmsg_recv to not go through flags using shift, check
for flags by 'and'. This eliminates both the shift and a conditional
per flag check.
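The two ways of walking the flags can be compared in a small Python model.
This is a sketch only: the three IP_CMSG_* names stand for the constants
mentioned above, but the bit values and handler labels used here are
assumptions of the example.

```python
IP_CMSG_PKTINFO = 1 << 0
IP_CMSG_TTL = 1 << 1
IP_CMSG_TOS = 1 << 2

HANDLERS = [
    (IP_CMSG_PKTINFO, "pktinfo"),
    (IP_CMSG_TTL, "ttl"),
    (IP_CMSG_TOS, "tos"),
]


def recv_by_shift(flags):
    """Old style: test bit 0, then shift and check for zero per flag."""
    out = []
    for _, name in HANDLERS:
        if flags & 1:
            out.append(name)
        flags >>= 1
        if flags == 0:
            break
    return out


def recv_by_mask(flags):
    """New style: test each flag by 'and'; only set flags cost extra work."""
    out = []
    for bit, name in HANDLERS:
        if flags & bit:
            out.append(name)
            flags &= ~bit
            if not flags:
                break
    return out
```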
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 5 Jan 2015 21:56:14 +0000 (13:56 -0800)]
ip: Move checksum convert defines to inet
Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 6 Jan 2015 00:04:21 +0000 (01:04 +0100)]
netlink: Warn on unordered or illegal nla_nest_cancel() or nlmsg_cancel()
Calling nla_nest_cancel() in a different order than the nesting was
built up can lead to negative offsets being calculated which
results in skb_trim() being called with an underflowed unsigned
int. Warn if mark < skb->data as it's definitely a bug.
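The underflow can be shown with plain integer arithmetic. The Python model
below is a sketch: trim_len() and cancel_is_valid() are hypothetical
helpers that mimic the unsigned int truncation, not netlink functions.

```python
U32_MASK = 0xFFFFFFFF


def trim_len(mark, data):
    """Length handed to the trim call: (mark - data) as an unsigned int."""
    return (mark - data) & U32_MASK


def cancel_is_valid(mark, data):
    """The condition the new warning checks: mark must not precede data."""
    return mark >= data
```

With mark 16 bytes before data, the computed length wraps to a huge
value instead of going negative.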
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 6 Jan 2015 01:05:20 +0000 (17:05 -0800)]
Linux 3.19-rc3
Linus Torvalds [Mon, 5 Jan 2015 22:49:02 +0000 (14:49 -0800)]
Merge tag 'powerpc-3.19-3' of git://git./linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
- Wire up sys_execveat(). Tested on 32 & 64 bit.
- Fix for kdump on LE systems with cpus hot unplugged.
- Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this
broke other platforms, we'll do a proper fix for 3.20.
* tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
powerpc/kdump: Ignore failure in enabling big endian exception during crash
powerpc: Wire up sys_execveat() syscall
Linus Torvalds [Mon, 5 Jan 2015 22:31:20 +0000 (14:31 -0800)]
Merge tag 'please-pull-syscall' of git://git./linux/kernel/git/aegl/linux
Pull ia64 fixlet from Tony Luck:
"Add execveat syscall"
* tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Enable execveat syscall for ia64
David S. Miller [Mon, 5 Jan 2015 21:34:53 +0000 (16:34 -0500)]
Merge branch 'cxgb4-next'
Hariprasad Shenai says:
====================
RDMA/cxgb4/cxgb4vf/csiostor: Cleanup register defines
This series continues to clean up all the macros/register defines related to
SGE, PCIE, MC, MA, TCAM, MAC, etc that are defined in t4_regs.h and the
affected files.
Will post another 1 or 2 series so that we can cover all the macros so that
they all follow the same style to be consistent.
The patch series is created against the 'net-next' tree
and includes patches for the cxgb4, cxgb4vf, iw_cxgb4 and csiostor drivers.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 5 Jan 2015 11:00:47 +0000 (16:30 +0530)]
cxgb4/cxgb4vf/csiostor: Cleanup PL, XGMAC, SF and MC related register defines
This patch cleans up all PL, XGMAC and SF related macros/register defines
that are defined in t4_regs.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 5 Jan 2015 11:00:46 +0000 (16:30 +0530)]
cxgb4/csiostor: Cleanup TP, MPS and TCAM related register defines
This patch cleans up all TP, MPS and TCAM related macros/register defines
that are defined in t4_regs.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 5 Jan 2015 11:00:45 +0000 (16:30 +0530)]
cxgb4/cxg4vf/csiostor: Cleanup MC, MA and CIM related register defines
This patch cleans up all MC, MA and CIM related macros/register defines that are
defined in t4_regs.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 5 Jan 2015 11:00:44 +0000 (16:30 +0530)]
cxgb4/cxgb4vf/csiostor: Cleanup SGE and PCI related register defines
This patch cleans up the remaining SGE related macros/register defines and all PCI
related ones that are defined in t4_regs.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Mon, 5 Jan 2015 11:00:43 +0000 (16:30 +0530)]
RDMA/cxgb4/cxgb4vf/csiostor: Cleanup SGE register defines
This patch cleans up all SGE related macros/register defines that are
defined in t4_regs.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 5 Jan 2015 10:48:34 +0000 (05:48 -0500)]
be2net: support TX batching using skb->xmit_more flag
This patch uses skb->xmit_more flag to batch TX requests.
TX is flushed either when xmit_more is false or there is
no more space in the TXQ.
Skyhawk-R and BEx chips require an even number of wrbs to be posted.
So, when a batch of TX requests is accumulated, the last header wrb
may need to be fixed with an extra dummy wrb.
This patch refactors the be_xmit() routine as a sequence of be_xmit_enqueue()
and be_xmit_flush() calls. The Tx completion code is also
updated to be able to unmap/free a batch of skbs rather than a single
skb.
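The enqueue/flush split described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
driver code: struct txq, TXQ_LEN, xmit_enqueue(), xmit_flush() and xmit()
are hypothetical names, the ring is reduced to plain counters, and TX
completions (which would return wrbs to the ring) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TXQ_LEN 8u /* hypothetical ring size, counted in wrbs */

/* Simplified model of a TX queue; none of these fields mirror be2net. */
struct txq {
	unsigned int used;      /* wrbs currently occupying the ring */
	unsigned int pending;   /* wrbs queued since the last flush */
	unsigned int doorbells; /* number of flushes that notified the hardware */
};

/*
 * Queue a packet that needs 'wrbs' entries. One slot is always kept in
 * reserve so that a later flush can add a dummy wrb without overflowing.
 */
static bool xmit_enqueue(struct txq *q, unsigned int wrbs)
{
	if (q->used + wrbs + 1 > TXQ_LEN)
		return false;
	q->used += wrbs;
	q->pending += wrbs;
	return true;
}

/* Notify the hardware once for the whole batch, padding to an even count. */
static void xmit_flush(struct txq *q)
{
	if (!q->pending)
		return;
	if (q->pending & 1) { /* odd batch: account for one dummy wrb */
		q->used++;
		q->pending++;
	}
	assert(q->used <= TXQ_LEN);
	q->doorbells++;
	q->pending = 0;
}

/*
 * Transmit entry point of the model: always try to enqueue, but flush only
 * when the stack signals the end of a batch (xmit_more is false) or when
 * the ring cannot take even a one-wrb packet plus the reserved slot.
 */
static bool xmit(struct txq *q, unsigned int wrbs, bool xmit_more)
{
	bool queued = xmit_enqueue(q, wrbs);
	bool full = q->used + 2 > TXQ_LEN;

	if (!xmit_more || full || !queued)
		xmit_flush(q);
	return queued;
}
```

In this model, three packets sent with xmit_more set on the first two
produce a single hardware notification, and an odd batch grows by one
wrb at flush time.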
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krzysztof Kozlowski [Mon, 5 Jan 2015 09:02:31 +0000 (10:02 +0100)]
at86rf230: Constify struct regmap_config
The regmap_config struct may be const because it is not modified by the
driver and regmap_init() accepts a pointer to const.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Luck [Mon, 5 Jan 2015 19:25:19 +0000 (11:25 -0800)]
[IA64] Enable execveat syscall for ia64
See commit
51f39a1f0cea1cacf8c787f652f26dfee9611874
syscalls: implement execveat() system call
Signed-off-by: Tony Luck <tony.luck@intel.com>
Johannes Berg [Mon, 5 Jan 2015 09:28:49 +0000 (10:28 +0100)]
Revert "mac80211: Fix accounting of the tailroom-needed counter"
This reverts commit
ca34e3b5c808385b175650605faa29e71e91991b.
It turns out that the p54 and cw1200 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.
Cc: stable@vger.kernel.org
Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter")
Reported-by: Christopher Chavez <chrischavez@gmx.us>
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Debugged-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ying Xue [Sun, 4 Jan 2015 07:24:35 +0000 (15:24 +0800)]
list_nulls: fix missing header
Fix the build error below:
include/linux/list_nulls.h: In function ‘hlist_nulls_del’:
include/linux/list_nulls.h:84:13: error: ‘LIST_POISON2’ undeclared (first use in this function)
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>