git.openwrt.org Git - openwrt/staging/blogic.git/log

wil6210: prevent parallel suspend and dump collection

Suspend and crash dump operations can happen simultaneously
in case there is a FW assert during the suspend procedure
or when SSR calls all the devices crashdump callbacks.

To prevent that, a new flag is added, indicating that the
dumps collection is in progress, in order to allow the
suspend/reset decline if the dumps collection already started.

Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

wil6210: set platform features based on FW capabilities

In some cases the platform should be aware of the FW capabilities
to decide which feature to enable.
For example, FW can control the external REF clock for power saving.
Driver should notify the platform about that, to allow platform
power management optimization.

Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

wil6210: add platform capabilities bitmap

Add get_capa callback to platform ops to allow reading the platform
capabilities.
Supported capabilities:
- Keeping 11ad connection during suspend
- T_POWER_ON 0 support
- Usage of external clock

Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

wil6210: support 40bit DMA addresses

Add the option to support 40bit addresses since some platforms
may not support 48bits but support 40bits

Signed-off-by: Lazar Alexei <qca_ailizaro@qca.qualcomm.com>
Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

wil6210: support Scheduled scan

Add support for sched_scan_start/stop by sending PNO commands to FW.
Driver reports max_sched_scan_reqs and invokes
cfg80211_sched_scan_results upon receiving WMI_SCHED_SCAN_RESULT_EVENTID
from FW.

Signed-off-by: Dedy Lansky <qca_dlansky@qca.qualcomm.com>
Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

ath10k: update copyright year

Update year for Qualcomm Atheros, Inc. copyrights.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add memory dump support QCA988X

Copy two regions of registers and bigger DRAM region to the dump file.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add memory dump support for QCA6174/QCA9377

Add memory dump to the firmware crash data file which is provided to user space
via devcoredump interface. This makes it easier for firmware engineers to debug
firmware crashes.

Due to increased memory consumption the memory dump is disabled by default. To
enable it make sure that bit 3 is set in coredump_mask module parameter:

modprobe ath10k_core coredump_mask=0xffffffff

When RAMDUMP is enabled a buffer for the dump is allocated with vmalloc during
device probe. The actual memory layout is different in hardware versions and
the layouts are defined in coredump.c. The memory is split to regions and, to
get even finegrained control of what to copy, the region can split to smaller
sections as not all registers are readable (which could cause the whole system
to stall).

Signed-off-by: Alan Liu <alanliu@qca.qualcomm.com>
[kvalo@qca.qualcomm.com: refactoring and cleanup]
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add coredump_mask module parameter

For memory dump support (it consumes quite a lot of memory) we need to control
what is exactly stored to the crash dump. Add a module parameter call
coredump_mask to do that. It's a bit mask of these values:

enum ath10k_fw_crash_dump_type {
ATH10K_FW_CRASH_DUMP_REGISTERS = 0,
ATH10K_FW_CRASH_DUMP_CE_DATA = 1,

ATH10K_FW_CRASH_DUMP_MAX,
};

For example, if we only want to store CE_DATA we would enable bit 2:

modprobe ath10k_core coredump_mask=0x2

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: detach coredump.c from debug.c

Now coredump is totally separate from debug.c and doesn't depend on
CONFIG_ATH10K_DEBUGFS anymore, only on CONFIG_DEV_COREDUMP. Also remove
leftovers from the removed debugfs file support.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: refactor firmware crashdump code to coredump.c

In preparation to add RAM dump support. No functional changes, only moving code
and renaming function names.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: remove deprecated fw_crash_dump debugfs file

The fw_crash_dump file was deprecated by commmit 727000e6af34 ("ath10k: support
dev_coredump for crash dump") in v4.11 in favor of dev_coredump interface,
remove it now for good. Everyone should use dev_coredump now.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: advertise TDLS wider bandwidth support for 5GHz

Enable TDLS wider bandwidth support for 5GHz based on firmware wmi capabilities.

This patch is required for chipset QCA9888. Tested with firmware version
10.4-3.5.1-00018.

Signed-off-by: Balaji Pothunoori <bpothuno@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add fw feature flag for non-bmi firmware load

HL1.0 firmware is not loaded via bmi. The bmi specific
code should not be executed for HL1.0

Add fw feature flag for non bmi targets and skip the bmi
specific code for non bmi targets.

Signed-off-by: Rakesh Pillai <pillair@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add debug mask for SNOC bus type

WCN3990 target uses SNOC bus.
Add debug mask for SNOC bus type.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Rakesh Pillai <pillair@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add SNOC bus type for WCN3990 target

WCN3990 is integrated chipset which uses system NOC.
Add SNOC bus type and related definitions.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Rakesh Pillai <pillair@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add support for 64 bit ce descriptor

WCN3990 CE descriptor uses 64bit address for
src/dst ring buffer. It has extended field for toeplitz
hash result, which is being used for HW assisted
hash results.

To accommodate WCN3990 descriptor, define new CE
descriptor for extended addressing mode and related
methods to handle the descriptor data.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Use dma_addr_t for ce buffers to support 64bit target

CE send and receive API's are using u32 ring address, which
truncates the address for target with 64bit addressing range.
Use dma_addr_t for ce buffers to support target with extended
addressing range.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add paddrs_ring_64 support for 64bit target

paddrs_ring_64 holds the physical device address of the
rx buffers that host SW provides for the MAC HW to fill.
Since this field is used in rx ring setup and rx ring
replenish in rx data path. Define separate methods
for handling 64 bit ring paddr and attach them dynamically
based on target_64bit hw param flag. Use u64 type
while popping paddr from the rx hash table for 64bit target.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add hw param for rx ring size support

WCN3990 uses larger ring size in comparison to existing
ring size value.
Add rx ring size hw param for supporting different rx ring
size across multiple target.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add support for htt_data_tx_desc_64 descriptor

WCN3990 target uses 64 bit frags_paddr in htt tx descriptor,
which holds the physical address of SKB fragments in tx data path.

In order to support 64 bit bit frags_paddr in htt tx descriptor, define
htt_data_tx_desc_64 descriptor and ath10k_htt_tx_64 method for handling
tx data path with new descriptor fields.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add support for 64 bit HTT frag descriptor

WCN3990 target uses 64 bit frag descriptor and more
fields in TSO flag.
Add support for 64 bit HTT frag descriptor.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add support for 64 bit htt rx ring cfg

WCN3900 target uses 64bit rx_ring_base_paddr and
fw_idx_shadow_reg_paddr fields in HTT rx ring cfg message.
These address points to the memory region where remote
ring empty buffers are allocated.
In order to add 64 bit htt rx ring cfg, define separate
64 bit htt rx ring cfg message and attach it in runtime
based on target_64bit hw param flag.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add support for 64 bit HTT in-order indication msg

WCN3990 target use 64bit msdu address in htt in-order
indication message. Add support for 64 bit msdu address in
HTT_T2H_MSG_TYPE_RX_IN_ORD_PADDR_IND message.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Update rx descriptor for WCN3990 target

WCN3990 rx descriptor uses different offset of msdu start, msdu end,
ppdu end, rx pkt end and rx frag info.
To accommodate different offsets, define respective fields in
rx descriptor of WCN3990 target.

Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: Add hw param for 64-bit address support

WCN3990 target supports 37-bit addressing mode. In order
to accommodate extended address support, add hw param to
indicate if the target supports addressing above 32-bits.

Signed-off-by: Rakesh Pillai <pillair@qti.qualcomm.com>
Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

wil6210: fix build warnings without CONFIG_PM

The #ifdef checks are hard to get right, in this case some functions
should have been left inside a CONFIG_PM_SLEEP check as seen by this
message:

drivers/net/wireless/ath/wil6210/pcie_bus.c:489:12: error: 'wil6210_pm_resume' defined but not used [-Werror=unused-function]
drivers/net/wireless/ath/wil6210/pcie_bus.c:484:12: error: 'wil6210_pm_suspend' defined but not used [-Werror=unused-function]

Using an __maybe_unused is easier here, so I'm replacing all the
other #ifdef in this file as well for consistency.

Fixes: 94162666cd51 ("wil6210: run-time PM when interface down")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: wmi: remove redundant integer fc

Variable fc is being assigned but never used, so remove it. Cleans
up the clang warning:

warning: Value stored to 'fc' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

Merge branch 'nfp-flower-add-Geneve-tunnel-support'

Simon Horman says:

====================
nfp: flower: add Geneve tunnel support

John Hurley says:

This patchset adds support for offloading the encap and decap of Geneve
tunnels to the NFP. In both cases, specifying well known port 6081 is a
requirement for rule offload.

Geneve firmware support has been recently added, so the patchset includes
the reading of a fw symbol that defines a bitmap of newly supported
features. Geneve will only be offloaded if the fw supports it. The new
symbol is added in fw r5646.

Geneve option fields are not supported as either a match or an action due
there current exclussion from TC flower. Because Geneve (as both a match
and action) behaves the same as other udp tunnels such as VXLAN, generic
functions are created that handle both Geneve and VXLAN. It is anticapated
that these functions will be modified to support options in future
patches.

The removal of an unused variable 'tun_dst_mask' is included as a separate
patch here. This does not affect functionality.

Also included are modifications to the test framework to check that the
new encap and decap features are functioning correctly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: compile Geneve encap actions

Generate rules for the NFP to encapsulate packets in Geneve tunnels. Move
the vxlan action code to generic udp tunnel actions and use core code for
both vxlan and Geneve.

Only support outputting to well known port 6081. Setting tunnel options
is not supported yet.

Only attempt to offload if the fw supports Geneve.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: compile Geneve match fields

Compile Geneve match fields for offloading to the NFP. The addition of
Geneve overflows the 8 bit key_layer field, so apply extended metadata to
the match cmsg allowing up to 32 more key_layer fields.

Rather than adding new Geneve blocks, move the vxlan code to generic ipv4
udp tunnel structs and use these for both vxlan and Geneve.

Matches are only supported when specifically mentioning well known port
6081. Geneve tunnel options are not yet included in the match.

Only offload Geneve if the fw supports it - include check for this.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: read extra feature support from fw

Extract the _abi_flower_extra_features symbol from the fw which gives a 64
bit bitmap of new features (on top of the flower base support) that the fw
can offload. Store this bitmap in the priv data associated with each app.
If the symbol does not exist, set the bitmap to 0.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: remove unused tun_mask variable

The tunnel dest IP is required for separate offload to the NFP. It is
already verified that a dest IP must be present and must be an exact
match in the flower rule. Therefore, we can just extract the IP from the
generated offload rule and remove the unused mask variable. The function
is then no longer required to return the IP separately.

Because tun_dst is localised to tunnel matches, move the declaration to
the tunnel if branch.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: RSS table is 4k for T6

RSS table is 4k for T6 and later cards, add check for the
same.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: properly check for empty skb array on error path

First, the check of &q->ring.queue against NULL is wrong, it
is always false. We should check the value rather than the address.

Secondly, we need the same check in pfifo_fast_reset() too,
as both ->reset() and ->destroy() are called in qdisc_destroy().

Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Include header descriptor support for ARP packets

In recent tests with new adapters, it was discovered that ARP
packets were not being properly processed. This patch adds
support for ARP packet headers to be passed to backing adapters,
if necessary.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ibmvnic-Fix-and-increase-maximum-TX-RX-queues'

Thomas Falcon says:

====================
ibmvnic: Fix and increase maximum TX/RX queues

This series renames IBMVNIC_MAX_TX_QUEUES to IBMVNIC_MAX_QUEUES since
it is used to allocate both RX and TX queues. The value is also increased
to accommodate newer hardware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Increase maximum number of RX/TX queues

Increase the number of queues allocated to accommodate recent
network adapter inclusions on the IBM vNIC platform.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Rename IBMVNIC_MAX_TX_QUEUES to IBMVNIC_MAX_QUEUES

This value denotes the maximum number of TX queues but is used
to allocate both RX and TX queues.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2017-12-18' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

The drivers/net/wireless/intel/iwlwifi/pcie/drv.c conflict was
resolved using a diff provided by Kalle in his pull request.

Kalle Valo says:

====================
wireless-drivers-next patches for 4.16

A bigger pull request this time, the most visible change being the new
driver mt76. But there's also Kconfig refactoring in ath9k and ath10k,
work beginning in iwlwifi to have rate scaling in firmware/hardware,
wcn3990 support getting closer in ath10k and lots of smaller changes.

mt76

* a new driver for MT76x2e, a 2x2 PCIe 802.11ac chipset by MediaTek

ath10k

* enable multiqueue support for all hw using mac80211 wake_tx_queue op

* new Kconfig option ATH10K_SPECTRAL to save RAM

* show tx stats on QCA9880

* new qcom,ath10k-calibration-variant DT entry

* WMI layer support for wcn3990

ath9k

* new Kconfig option ATH9K_COMMON_SPECTRAL to save RAM

wcn36xx

* hardware scan offload support

wil6210

* run-time PM support when interface is down

iwlwifi

* initial work for rate-scaling offload

* Support for new FW API version 36

* Rename the temporary hw name A000 to 22000

ssb

* make SSB a menuconfig to ease disabling it all

mwl8k

* enable non-DFS 5G channels 149-165
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Report tid start range correctly for T6

For T6, tid start range should be read from
LE_DB_ACTIVE_TABLE_START_INDEX_A register.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2017-12-18

Here's the first bluetooth-next pull request for the 4.16 kernel.

- hci_ll: multiple cleanups & fixes
- Remove Gustavo Padovan from the MAINTAINERS file
- Support BLE Adversing while connected (if the controller can do it)
- DT updates for TI chips
- Various other smaller cleanups & fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ks8851: Support DT-provided MAC address

Allow the boot loader to specify the MAC address in the device tree
to override the EEPROM, or in case no EEPROM is present.

Cc: Ben Dooks <ben@simtec.co.uk>
Cc: Tristram Ha <tristram.ha@micrel.com>
Cc: David J. Choi <david.choi@micrel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bcm63xx_enet-remove-mac_id-usage'

Jonas Gorski says:

====================
bcm63xx_enet: remove mac_id usage

This patchset aims at reducing the platform device id number usage with
the target of making it eventually possible to probe the driver through OF.

Runtested on BCM6358.

Since the patches touch mostly net/, they should go through net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bcm63xx_enet: use platform device id directly for miibus name

Directly use the platform device for generating the miibus name. This
removes the last user of bcm_enet_priv::mac_id and we can remove the
field.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcm63xx_enet: remove pointless mac_id check

Enabling the ephy clock for mac 1 is harmless, and the actual usage of
the ephy is not restricted to mac 0, so we might as well remove the
check.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcm63xx_enet: use platform data for dma channel numbers

To reduce the reliance on device ids, pass the dma channel numbers to
the enet devices as platform data.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcm63xx_enet: just use "enet" as the clock name

Now that we have the individual clocks available as "enet" we
don't need to rely on the device id for them anymore.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-speedup-vxlan-geneve-tunnel-dismantle'

Haishuang Yan says:

====================
net: speedup geneve/vxlan tunnels dismantle

This patch series add batching to vxlan/geneve tunnels so that netns
dismantles are less costly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: speedup geneve tunnels dismantle

Since we now hold RTNL lock in geneve_exit_net, it's better batch them
to speedup geneve tunnel dismantle.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: speedup vxlan tunnels dismantle

Since we now hold RTNL lock in vxlan_exit_net, it's better to batch them
to speedup vxlan tunnels dismantle.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: remove duplicate structure member in xmit

Since both first_tx_ctx and tx_skb are the head of tx ctx, it not
necessary to use two structure members to statically indicate
the head of tx ctx. So first_tx_ctx is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-NETIF_F_GRO_HW'

Michael Chan says:

====================
Introduce NETIF_F_GRO_HW

Introduce NETIF_F_GRO_HW feature flag and convert drivers that support
hardware GRO to use the new flag.

v5:
- Documentation changes requested by Alexander Duyck.
- bnx2x changes requested by Manish Chopra to enable LRO by default, and
disable GRO_HW if disable_tpa module parameter is set.

v4:
- more changes requested by Alexander Duyck:
- check GRO_HW/GRO dependency in drivers's ndo_fix_features().
- Reverse the order of RXCSUM and GRO_HW dependency check in
netdev_fix_features().
- No propagation in netdev_disable_gro_hw().

v3:
- Let driver's ndo_fix_features() disable NETIF_F_LRO when NETIF_F_GRO_HW
is set instead of doing it in common netdev_fix_features().

v2:
- NETIF_F_GRO_HW flag propagation between upper and lower devices not
required (see patch 1).
- NETIF_F_GRO_HW depends on NETIF_F_GRO and NETIF_F_RXCSUM.
- Add dev_disable_gro_hw() to disable GRO_HW for generic XDP.
- Use ndo_fix_features() on all 3 drivers to drop GRO_HW when it is not
supported
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Use NETIF_F_GRO_HW.

Advertise NETIF_F_GRO_HW and set edev->gro_disable according to the
feature flag. Add qede_fix_features() to drop NETIF_F_GRO_HW if
XDP is running or MTU does not support GRO_HW or GRO is not set.
qede_change_mtu() also checks and disables GRO_HW if MTU is not
supported.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Acked-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Use NETIF_F_GRO_HW.

Advertise NETIF_F_GRO_HW and turn on TPA_MODE_GRO when NETIF_F_GRO_HW
is set.  Disable NETIF_F_GRO_HW in bnx2x_fix_features() if the MTU
does not support TPA_MODE_GRO or GRO is not set.  bnx2x_change_mtu() also
needs to disable NETIF_F_GRO_HW if the MTU does not support it.

Original parameter disable_tpa will continue to disable LRO and GRO_HW.

Preserve the original behavior of enabling LRO by default.  User has
to run ethtool -K to explicitly enable GRO_HW.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Use NETIF_F_GRO_HW.

Advertise NETIF_F_GRO_HW in hw_features if hardware GRO is supported.
In bnxt_fix_features(), disable GRO_HW and LRO if current hardware
configuration does not allow it.  GRO_HW depends on GRO.  GRO_HW is
also mutually exclusive with LRO.  XDP setup will now rely on
bnxt_fix_features() to turn off aggregation.  During chip init, turn on
or off hardware GRO based on NETIF_F_GRO_HW in features flag.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Disable GRO_HW when generic XDP is installed on a device.

Hardware should not aggregate any packets when generic XDP is installed.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce NETIF_F_GRO_HW.

Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
GRO.  With this flag, we can now independently turn on or off hardware
GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
control hardware GRO and so it cannot be independently turned on or
off without affecting GRO.

Hardware GRO (just like GRO) guarantees that packets can be re-segmented
by TSO/GSO to reconstruct the original packet stream.  Logically,
GRO_HW should depend on GRO since it a subset, but we will let
individual drivers enforce this dependency as they see fit.

Since NETIF_F_GRO is not propagated between upper and lower devices,
NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
words, a lower device can independent have GRO/GRO_HW enabled or disabled
and no feature propagation is required.  This will preserve the current
GRO behavior.  This can be changed later if we decide to propagate GRO/
GRO_HW/RXCSUM from upper to lower devices.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Hide unused variable when !CONFIG_PROC_FS.

When CONFIG_PROC_FS is disabled, we will not use the prot_inuse
counter. This adds an #ifdef to hide the variable definition in
that case. This is not a bugfix. But we can save bytes when there
are many network namespace.

Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Move the socket inuse to namespace.

In some case, we want to know how many sockets are in use in
different _net_ namespaces. It's a key resource metric.

This patch add a member in struct netns_core. This is a counter
for socket-inuse in the _net_ namespace. The patch will add/sub
counter in the sk_alloc, sk_clone_lock and __sk_free.

This patch will not counter the socket created in kernel.
It's not very useful for userspace to know how many kernel
sockets we created.

The main reasons for doing this are that:

1. When linux calls the 'do_exit' for process to exit, the functions
'exit_task_namespaces' and 'exit_task_work' will be called sequentially.
'exit_task_namespaces' may have destroyed the _net_ namespace, but
'sock_release' called in 'exit_task_work' may use the _net_ namespace
if we counter the socket-inuse in sock_release.

2. socket and sock are in pair. More important, sock holds the _net_
namespace. We counter the socket-inuse in sock, for avoiding holding
_net_ namespace again in socket. It's a easy way to maintain the code.

Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: Change the netns_core member name.

Change the member name will make the code more readable.
This patch will be used in next patch.

Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Simplify PCIe Completion Timeout setting

Simplify PCIe Completion Timeout setting by using the
pcie_capability_clear_and_set_word() interface. No functional change
intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'erspan-a-couple-fixes'

William Tu says:

====================
net: erspan: a couple fixes

Haishuang Yan reports a couple of issues (wrong return value,
pskb_may_pull) on erspan V1. Since erspan V2 is in net-next,
this series fix the similar issues on v2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: erspan: reload pointer after pskb_may_pull

pskb_may_pull() can change skb->data, so we need to re-load pkt_md
and ershdr at the right place.

Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support")
Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre")
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: erspan: fix wrong return value

If pskb_may_pull return failed, return PACKET_REJECT
instead of -ENOMEM.

Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support")
Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre")
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfp-phylink-fixes'

Russell King says:

====================
More SFP/phylink fixes

This series fixes a few more bits with sfp/phylink, particularly
confusion with the right way to test for the RTNL mutex being
held, a change in 2016 to the mdiobus_scan() behaviour that wasn't
noticed, and a fix for reading module EEPROMs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: fix locking asserts

Use ASSERT_RTNL() rather than WARN_ON(!lockdep_rtnl_is_held()) which
stops working when lockdep fires, and we end up with lots of warnings.

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: fix EEPROM reading in the case of non-SFF8472 SFPs

The EEPROM reading was trying to read from the second EEPROM address
if we requested the last byte from the SFF8079 EEPROM, which caused a
failure when the second EEPROM is not present. Discovered with a
S-RJ01 SFP module. Fix this.

Fixes: 73970055450e ("sfp: add SFP module support")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: fix non-detection of PHY

The detection of a PHY changed in commit e98a3aabf85f ("mdio_bus: don't
return NULL from mdiobus_scan()") which now causes sfp to print an
error message. Update for this change.

Fixes: 73970055450e ("sfp: add SFP module support")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ncsi: Don't take any action on HNCDSC AEN

The current HNCDSC handler takes the status flag from the AEN packet and
will update or change the current channel based on this flag and the
current channel status.

However the flag from the HNCDSC packet merely represents the host link
state. While the state of the host interface is potentially interesting
information it should not affect the state of the NCSI link. Indeed the
NCSI specification makes no mention of any recommended action related to
the host network controller driver state.

Update the HNCDSC handler to record the host network driver status but
take no other action.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-meson-gxl-clean-up-and-improvements'

Jerome Brunet says:

====================
net: phy: meson-gxl: clean-up and improvements

This patchset adds defines for the control registers and helpers to access
the banked registers. The goal being to make it easier to understand what
the driver actually does.
Then CONFIG_A6 settings is removed since this statement was without effect
Finally interrupt support is added, speeding things up a little

This series has been tested on the libretech-cc and khadas VIM

Changes since v2 [0]:
Drop LPA corruption fix which has been merged through net. Apart from this,
series remains the same.

[0]: https://lkml.kernel.org/r/20171207142715.32578-1-jbrunet@baylibre.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: join the authors

Following previous changes, join the other authors of this driver and
take the blame with them

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: add interrupt support

Enable interrupt support in meson-gxl PHY driver

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: leave CONFIG_A6 untouched

The PHY performs just as well when left in its default configuration and
it makes senses because this poke gets reset just after init.

According to the documentation, all registers in the Analog/DSP bank are
reset when there is a mode switch from 10BT to 100BT. The bank is also
reset on power down and soft reset, so we will never see the value which
may have been set by the bootloader.

In the end, we have used the default configuration so far and there is no
reason to change now. Remove CONFIG_A6 poke to make this clear.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: use genphy_config_init

Use the generic init function to populate some of the phydev
structure fields

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: add read and write helpers for banked registers

Add read and write helpers to manipulate banked registers on this PHY
This helps clarify the settings applied to these registers and what the
driver actually does

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: define control registers

Define registers and bits in meson-gxl PHY driver to make a bit
more human friendly. No functional change.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: check phy_write return value

Always check phy_write return values. Better to be safe than sorry

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-Medford2'

Edward Cree says:

====================
sfc: Initial X2000-series (Medford2) support

Basic PCI-level changes to support X2000-series NICs.
Also fix unexpected-PTP-event log messages, since the timestamp format has
been changed in these NICs and that causes us to fail to probe PTP (but we
still get the PPS events).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: populate the timer reload field

The timer mode register now has a separate field for the reload value.
Since we always use this timer with the reload (for interrupt moderation)
we set this to the same as the initial value.

Previous hardware ignores this field, so we can safely set these bits
on all hardware that uses this register.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: update EF10 register definitions

The RX_L4_CLASS field has shrunk from 3 bits to 2 bits. The upper
bit was never used in previous hardware, so we can use the new
definition throughout.

The TSO OUTER_IPID field was previously spelt differently from the
external definitions.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: improve PTP error reporting

Log a message if PTP probing fails; if we then, unexpectedly, get PTP
events, only log a message for the first one on each device.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add Medford2 (SFC9250) PCI Device IDs

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: support VI strides other than 8k

Medford2 can also have 16k or 64k VI stride. This is reported by MCDI in
GET_CAPABILITIES, which fortunately is called before the driver does
anything sensitive to the VI stride (such as accessing or even allocating
VIs past the zeroth).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: make mem_bar a function rather than a constant

Support using BAR 0 on SFC9250, even though the driver doesn't bind to such
devices yet.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-18

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Allow arbitrary function calls from one BPF function to another BPF function.
   As of today when writing BPF programs, __always_inline had to be used in
   the BPF C programs for all functions, unnecessarily causing LLVM to inflate
   code size. Handle this more naturally with support for BPF to BPF calls
   such that this __always_inline restriction can be overcome. As a result,
   it allows for better optimized code and finally enables to introduce core
   BPF libraries in the future that can be reused out of different projects.
   x86 and arm64 JIT support was added as well, from Alexei.

2) Add infrastructure for tagging functions as error injectable and allow for
   BPF to return arbitrary error values when BPF is attached via kprobes on
   those. This way of injecting errors generically eases testing and debugging
   without having to recompile or restart the kernel. Tags for opting-in for
   this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.

3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
   call for XDP programs. First part of this work adds handling of BPF
   capabilities included in the firmware, and the later patches add support
   to the nfp verifier part and JIT as well as some small optimizations,
   from Jakub.

4) The bpftool now also gets support for basic cgroup BPF operations such
   as attaching, detaching and listing current BPF programs. As a requirement
   for the attach part, bpftool can now also load object files through
   'bpftool prog load'. This reuses libbpf which we have in the kernel tree
   as well. bpftool-cgroup man page is added along with it, from Roman.

5) Back then commit e87c6bc3852b ("bpf: permit multiple bpf attachments for
   a single perf event") added support for attaching multiple BPF programs
   to a single perf event. Given they are configured through perf's ioctl()
   interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
   command in this work in order to return an array of one or multiple BPF
   prog ids that are currently attached, from Yonghong.

6) Various minor fixes and cleanups to the bpftool's Makefile as well
   as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
   itself or prior installed documentation related to it, from Quentin.

7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
   required for the test_dev_cgroup test case to run, from Naresh.

8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.

9) Fix libbpf's exit code from the Makefile when libelf was not found in
   the system, also from Jakub.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

trace: reenable preemption if we modify the ip

Things got moved around between the original bpf_override_return patches
and the final version, and now the ftrace kprobe dispatcher assumes if
you modified the ip that you also enabled preemption. Make a comment of
this and enable preemption, this fixes the lockdep splat that happened
when using this feature.

Fixes: 9802d86585db ("bpf: add a bpf_override_function helper")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: set flags in the correct member of netdev_bpf

netdev_bpf.flags is the input member for installing the program.
netdev_bpf.prog_flags is the output member for querying. Set
the correct one on query.

Fixes: 92f0292b35a0 ("net: xdp: report flags program was installed with on query")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

libbpf: fix Makefile exit code if libelf not found

/bin/sh's exit does not recognize -1 as a number, leading to
the following error message:

/bin/sh: 1: exit: Illegal number: -1

Use 1 as the exit code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-to-bpf-function-calls'

Alexei Starovoitov says:

====================
First of all huge thank you to Daniel, John, Jakub, Edward and others who
reviewed multiple iterations of this patch set over the last many months
and to Dave and others who gave critical feedback during netconf/netdev.

The patch is solid enough and we thought through numerous corner cases,
but it's not the end. More followups with code reorg and features to follow.

TLDR: Allow arbitrary function calls from bpf function to another bpf function.

Since the beginning of bpf all bpf programs were represented as a single function
and program authors were forced to use always_inline for all functions
in their C code. That was causing llvm to unnecessary inflate the code size
and forcing developers to move code to header files with little code reuse.

With a bit of additional complexity teach verifier to recognize
arbitrary function calls from one bpf function to another as long as
all of functions are presented to the verifier as a single bpf program.
Extended program layout:
..
r1 = ..    // arg1
r2 = ..    // arg2
call pc+1  // function call pc-relative
exit
.. = r1    // access arg1
.. = r2    // access arg2
..
call pc+20 // second level of function call
...

It allows for better optimized code and finally allows to introduce
the core bpf libraries that can be reused in different projects,
since programs are no longer limited by single elf file.
With function calls bpf can be compiled into multiple .o files.

This patch is the first step. It detects programs that contain
multiple functions and checks that calls between them are valid.
It splits the sequence of bpf instructions (one program) into a set
of bpf functions that call each other. Calls to only known
functions are allowed. Since all functions are presented to
the verifier at once conceptually it is 'static linking'.

Future plans:
- introduce BPF_PROG_TYPE_LIBRARY and allow a set of bpf functions
  to be loaded into the kernel that can be later linked to other
  programs with concrete program types. Aka 'dynamic linking'.

- introduce function pointer type and indirect calls to allow
  bpf functions call other dynamically loaded bpf functions while
  the caller bpf function is already executing. Aka 'runtime linking'.
  This will be more generic and more flexible alternative
  to bpf_tail_calls.

FAQ:
Q: Interpreter and JIT changes mean that new instruction is introduced ?
A: No. The call instruction technically stays the same. Now it can call
   both kernel helpers and other bpf functions.
   Calling convention stays the same as well.
   From uapi point of view the call insn got new 'relocation' BPF_PSEUDO_CALL
   similar to BPF_PSEUDO_MAP_FD 'relocation' of bpf_ldimm64 insn.

Q: What had to change on LLVM side?
A: Trivial LLVM patch to allow calls was applied to upcoming 6.0 release:
   https://reviews.llvm.org/rL318614
   with few bugfixes as well.
   Make sure to build the latest llvm to have bpf_call support.

More details in the patches.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: additional bpf_call tests

Add some additional checks for few more corner cases.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: arm64: add JIT support for multi-function programs

similar to x64 add support for bpf-to-bpf calls.
When program has calls to in-kernel helpers the target call offset
is known at JIT time and arm64 architecture needs 2 passes.
With bpf-to-bpf calls the dynamically allocated function start
is unknown until all functions of the program are JITed.
Therefore (just like x64) arm64 JIT needs one extra pass over
the program to emit correct call offsets.

Implementation detail:
Avoid being too clever in 64-bit immediate moves and
always use 4 instructions (instead of 3-4 depending on the address)
to make sure only one extra pass is needed.
If some future optimization would make it worth while to optimize
'call 64-bit imm' further, the JIT would need to do 4 passes
over the program instead of 3 as in this patch.
For typical bpf program address the mov needs 3 or 4 insns,
so unconditional 4 insns to save extra pass is a worthy trade off
at this state of JIT.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: x64: add JIT support for multi-function programs

Typical JIT does several passes over bpf instructions to
compute total size and relative offsets of jumps and calls.
With multitple bpf functions calling each other all relative calls
will have invalid offsets intially therefore we need to additional
last pass over the program to emit calls with correct offsets.
For example in case of three bpf functions:
main:
  call foo
  call bpf_map_lookup
  exit
foo:
  call bar
  exit
bar:
  exit

We will call bpf_int_jit_compile() indepedently for main(), foo() and bar()
x64 JIT typically does 4-5 passes to converge.
After these initial passes the image for these 3 functions
will be good except call targets, since start addresses of
foo() and bar() are unknown when we were JITing main()
(note that call bpf_map_lookup will be resolved properly
during initial passes).
Once start addresses of 3 functions are known we patch
call_insn->imm to point to right functions and call
bpf_int_jit_compile() again which needs only one pass.
Additional safety checks are done to make sure this
last pass doesn't produce image that is larger or smaller
than previous pass.

When constant blinding is on it's applied to all functions
at the first pass, since doing it once again at the last
pass can change size of the JITed code.

Tested on x64 and arm64 hw with JIT on/off, blinding on/off.
x64 jits bpf-to-bpf calls correctly while arm64 falls back to interpreter.
All other JITs that support normal BPF_CALL will behave the same way
since bpf-to-bpf call is equivalent to bpf-to-kernel call from
JITs point of view.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: fix net.core.bpf_jit_enable race

global bpf_jit_enable variable is tested multiple times in JITs,
blinding and verifier core. The malicious root can try to toggle
it while loading the programs. This race condition was accounted
for and there should be no issues, but it's safer to avoid
this race condition.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: add support for bpf_call to interpreter

though bpf_call is still the same call instruction and
calling convention 'bpf to bpf' and 'bpf to helper' is the same
the interpreter has to oparate on 'struct bpf_insn *'.
To distinguish these two cases add a kernel internal opcode and
mark call insns with it.
This opcode is seen by interpreter only. JITs will never see it.
Also add tiny bit of debug code to aid interpreter debugging.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: add xdp noinline test

add large semi-artificial XDP test with 18 functions to stress test
bpf call verification logic

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: add bpf_call test

strip always_inline from test_l4lb.c and compile it with -fno-inline
to let verifier go through 11 function with various function arguments
and return values

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

libbpf: add support for bpf_call

- recognize relocation emitted by llvm
- since all regular function will be kept in .text section and llvm
  takes care of pc-relative offsets in bpf_call instruction
  simply copy all of .text to relevant program section while adjusting
  bpf_call instructions in program section to point to newly copied
  body of instructions from .text
- do so for all programs in the elf file
- set all programs types to the one passed to bpf_prog_load()

Note for elf files with multiple programs that use different
functions in .text section we need to do 'linker' style logic.
This work is still TBD

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: add tests for stack_zero tracking

adjust two tests, since verifier got smarter
and add new one to test stack_zero logic

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: teach verifier to recognize zero initialized stack

programs with function calls are often passing various
pointers via stack. When all calls are inlined llvm
flattens stack accesses and optimizes away extra branches.
When functions are not inlined it becomes the job of
the verifier to recognize zero initialized stack to avoid
exploring paths that program will not take.
The following program would fail otherwise:

ptr = &buffer_on_stack;
*ptr = 0;
...
func_call(.., ptr, ...) {
  if (..)
    *ptr = bpf_map_lookup();
}
...
if (*ptr != 0) {
  // Access (*ptr)->field is valid.
  // Without stack_zero tracking such (*ptr)->field access
  // will be rejected
}

since stack slots are no longer uniform invalid | spill | misc
add liveness marking to all slots, but do it in 8 byte chunks.
So if nothing was read or written in [fp-16, fp-9] range
it will be marked as LIVE_NONE.
If any byte in that range was read, it will be marked LIVE_READ
and stacksafe() check will perform byte-by-byte verification.
If all bytes in the range were written the slot will be
marked as LIVE_WRITTEN.
This significantly speeds up state equality comparison
and reduces total number of states processed.

                    before   after
bpf_lb-DLB_L3.o       2051    2003
bpf_lb-DLB_L4.o       3287    3164
bpf_lb-DUNKNOWN.o     1080    1080
bpf_lxc-DDROP_ALL.o   24980   12361
bpf_lxc-DUNKNOWN.o    34308   16605
bpf_netdev.o          15404   10962
bpf_overlay.o         7191    6679

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>