git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge branch 'rtnetlink-Add-support-for-rigid-checking-of-data-in-dump-request'

David Ahern says:

====================
rtnetlink: Add support for rigid checking of data in dump request

There are many use cases where a user wants to influence what is
returned in a dump for some rtnetlink command: one is wanting data
for a different namespace than the one the request is received and
another is limiting the amount of data returned in the dump to a
specific set of interest to userspace, reducing the cpu overhead of
both kernel and userspace. Unfortunately, the kernel has historically
not been strict with checking for the proper header or checking the
values passed in the header. This lenient implementation has allowed
iproute2 and other packages to pass any struct or data in the dump
request as long as the family is the first byte. For example, ifinfomsg
struct is used by iproute2 for all generic dump requests - links,
addresses, routes and rules when it is really only valid for link
requests.

There is 1 is example where the kernel deals with the wrong struct: link
dumps after VF support was added. Older iproute2 was sending rtgenmsg as
the header instead of ifinfomsg so a patch was added to try and detect
old userspace vs new:
e5eca6d41f53 ("rtnetlink: fix userspace API breakage for iproute2 < v3.9.0")

The latest example is Christian's patch set wanting to return addresses for
a target namespace. It guesses the header struct is an ifaddrmsg and if it
guesses wrong a netlink warning is generated in the kernel log on every
address dump which is unacceptable.

Another example where the kernel is a bit lenient is route dumps: iproute2
can send either a request with either ifinfomsg or a rtmsg as the header
struct, yet the kernel always treats the header as an rtmsg (see
inet_dump_fib and rtm_flags check). The header inconsistency impacts the
ability to add kernel side filters for route dumps - a necessary feature
for scale setups with 100k+ routes.

How to resolve the problem of not breaking old userspace yet be able to
move forward with new features such as kernel side filtering which are
crucial for efficient operation at high scale?

This patch set addresses the problem by adding a new socket flag,
NETLINK_DUMP_STRICT_CHK, that userspace can use with setsockopt to
request strict checking of headers and attributes on dump requests and
hence unlock the ability to use kernel side filters as they are added.

Kernel side, the dump handlers are updated to verify the message contains
at least the expected header struct:
    RTM_GETLINK:       ifinfomsg
    RTM_GETADDR:       ifaddrmsg
    RTM_GETMULTICAST:  ifaddrmsg
    RTM_GETANYCAST:    ifaddrmsg
    RTM_GETADDRLABEL:  ifaddrlblmsg
    RTM_GETROUTE:      rtmsg
    RTM_GETSTATS:      if_stats_msg
    RTM_GETNEIGH:      ndmsg
    RTM_GETNEIGHTBL:   ndtmsg
    RTM_GETNSID:       rtgenmsg
    RTM_GETRULE:       fib_rule_hdr
    RTM_GETNETCONF:    netconfmsg
    RTM_GETMDB:        br_port_msg

And then every field in the header struct should be 0 with the exception
of the family. There are a few exceptions to this rule where the kernel
already influences the data returned by values in the struct. Next the
message should not contain attributes unless the kernel implements
filtering for it. Any unexpected data causes the dump to fail with EINVAL.
If the new flag is honored by the kernel and the dump contents adjusted
by any data passed in the request, the dump handler can set the
NLM_F_DUMP_FILTERED flag in the netlink message header.

For old userspace on new kernel there is no impact as all checks are
wrapped in a check on the new strict flag. For new userspace on old
kernel, the data in the headers and any appended attributes are
silently ignored though the setsockopt failing is the clue to userspace
the feature is not supported. New userspace on new kernel gets the
requested data dump.

iproute2 patches can be found here:
    https://github.com/dsahern/iproute2 dump-enhancements

Major changes since v1
- inner header is supposed to be 4-bytes aligned. So for dumps that
  should not have attributes appended changed the check to use:
        if (nlmsg_attrlen(nlh, sizeof(hdr)))
  Only impacts patches with headers that are not multiples of 4-bytes
  (rtgenmsg, netconfmsg), but applied the change to all patches not
  calling nlmsg_parse for consistency.

- Added nlmsg_parse_strict and nla_parse_strict for tighter control on
  attribute parsing. There should be no unknown attribute types or extra
  bytes.

- Moved validation to a helper in most cases

Changes since rfc-v2
- dropped the NLM_F_DUMP_FILTERED flag from target nsid dumps per
  Jiri's objections
- changed the opt-in uapi from a netlink message flag to a socket
  flag. setsockopt provides an api for userspace to definitively
  know if the kernel supports strict checking on dumps.
- re-ordered patches to peel off the extack on dumps if needed to
  keep this set size within limits
- misc cleanups in patches based on testing
====================

Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update rtnl_fdb_dump for strict data checking

Update rtnl_fdb_dump for strict data checking. If the flag is set,
the dump request is expected to have an ndmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the NDA_IFINDEX and
NDA_MASTER attributes are supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Move input checking for rtnl_fdb_dump to helper

Move the existing input checking for rtnl_fdb_dump into a helper,
valid_fdb_dump_legacy. This function will retain the current
logic that works around the 2 headers that userspace has been
allowed to send up to this point.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/bridge: Update br_mdb_dump for strict data checking

Update br_mdb_dump for strict data checking. If the flag is set,
the dump request is expected to have a br_port_msg struct as the
header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Update netconf dump handlers for strict data checking

Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and
mpls_netconf_dump_devconf for strict data checking. If the flag is set,
the dump request is expected to have an netconfmsg struct as the header.
The struct only has the family member and no attributes can be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Update ip6addrlbl_dump for strict data checking

Update ip6addrlbl_dump for strict data checking. If the flag is set,
the dump request is expected to have an ifaddrlblmsg struct as the
header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fib_rules: Update fib_nl_dumprule for strict data checking

Update fib_nl_dumprule for strict data checking. If the flag is set,
the dump request is expected to have fib_rule_hdr struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/namespace: Update rtnl_net_dumpid for strict data checking

Update rtnl_net_dumpid for strict data checking. If the flag is set,
the dump request is expected to have an rtgenmsg struct as the header
which has the family as the only element. No data may be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/neighbor: Update neightbl_dump_info for strict data checking

Update neightbl_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndtmsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/neighbor: Update neigh_dump_info for strict data checking

Update neigh_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the NDA_IFINDEX and
NDA_MASTER attributes are supported.

Existing code does not fail the dump if nlmsg_parse fails. That behavior
is kept for non-strict checking.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update fib dumps for strict data checking

Add helper to check netlink message for route dumps. If the strict flag
is set the dump request is expected to have an rtmsg struct as the header.
All elements of the struct are expected to be 0 with the exception of
rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
set.

Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
and ip6mr_rtm_dumproute to call this helper if strict data checking is
enabled.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update ipmr_rtm_dumplink for strict data checking

Update ipmr_rtm_dumplink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update inet6_dump_ifinfo for strict data checking

Update inet6_dump_ifinfo for strict data checking. If the flag is
set, the dump request is expected to have an ifinfomsg struct as
the header. All elements of the struct are expected to be 0 and no
attributes can be appended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update rtnl_stats_dump for strict data checking

Update rtnl_stats_dump for strict data checking. If the flag is set,
the dump request is expected to have an if_stats_msg struct as the header.
All elements of the struct are expected to be 0 except filter_mask which
must be non-0 (legacy behavior). No attributes are supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update rtnl_bridge_getlink for strict data checking

Update rtnl_bridge_getlink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFLA_EXT_MASK
attribute is supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Update rtnl_dump_ifinfo for strict data checking

Update rtnl_dump_ifinfo for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID,
IFLA_EXT_MASK, IFLA_MASTER, and IFLA_LINKINFO attributes are supported.

Existing code does not fail the dump if nlmsg_parse fails. That behavior
is kept for non-strict checking.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Update inet6_dump_addr for strict data checking

Update inet6_dump_addr for strict data checking. If the flag is set, the
dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values suppored by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can add support for other fields
(e.g., honor ifa_index and only return data for the given device index).

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv4: Update inet_dump_ifaddr for strict data checking

Update inet_dump_ifaddr for strict data checking. If the flag is set,
the dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can support for other fields
(e.g., honor ifa_index and only return data for the given device index).

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Add new socket option to enable strict checking on dumps

Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace
can use via setsockopt to request strict checking of headers and
attributes on dump requests.

To get dump features such as kernel side filtering based on data in
the header or attributes appended to the dump request, userspace
must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero
value. Since the netlink sock and its flags are private to the
af_netlink code, the strict checking flag is passed to dump handlers
via a flag in the netlink_callback struct.

For old userspace on new kernel there is no impact as all of the data
checks in later patches are wrapped in a check on the new strict flag.

For new userspace on old kernel, the setsockopt will fail and even if
new userspace sets data in the headers and appended attributes the
kernel will silently ignore it. Moving forward when the setsockopt
succeeds, the new userspace on old kernel means the dump request can
pass an attribute the kernel does not understand. The dump will then
fail as the older kernel does not understand it.

New userspace on new kernel setting the socket option gets the benefit
of the improved data dump.

Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic
NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter
checking on the NEW, DEL, and SET commands.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Refactor address dump to push inet6_fill_args to in6_dump_addrs

Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid
into it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Add strict version of nlmsg_parse and nla_parse

nla_parse is currently lenient on message parsing, allowing type to be 0
or greater than max expected and only logging a message

"netlink: %d bytes leftover after parsing attributes in process `%s'."

if the netlink message has unknown data at the end after parsing. What this
could mean is that the header at the front of the attributes is actually
wrong and the parsing is shifted from what is expected.

Add a new strict version that actually fails with EINVAL if there are any
bytes remaining after the parsing loop completes, if the atttrbitue type
is 0 or greater than max expected.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add extack to nlmsg_parse

Make sure extack is passed to nlmsg_parse where easy to do so.
Most of these are dump handlers and leveraging the extack in
the netlink_callback.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Add extack message to nlmsg_parse for invalid header length

Give a user a reason why EINVAL is returned in nlmsg_parse.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Pass extack to dump handlers

Declare extack in netlink_dump and pass to dump handlers via
netlink_callback. Add any extack message after the dump_done_errno
allowing error messages to be returned. This will be useful when
strict checking is done on dump requests, returning why the dump
fails EINVAL.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: mark expected switch fall-throughs

Replace "--v-- fall through --v--" with a proper "fall through"
annotation. Also, change "bad cid: fall through" to
"fall through - bad cid".

This fix is part of the ongoing efforts to enabling -Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-cls_u32-Various-improvements'

Jamal Hadi Salim says:

====================
net: sched: cls_u32 Various improvements

Various improvements from Al.

Changes from version 1: Add missing commit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: simplify the hell out u32_delete() emptiness check

Now that we have the knode count, we can instantly check if
any hnodes are non-empty. And that kills the check for extra
references to root hnode - those could happen only if there was
a knode to carry such a link.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: keep track of knodes count in tc_u_common

allows to simplify u32_delete() considerably

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: get rid of tp_c

Both hnode ->tp_c and tp_c argument of u32_set_parms()
the latter is redundant, the former - never read...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data

It must be tc_u_common associated with that tp (i.e. tp->data).
Proof:
* both ->ht_up and ->tp_c are assign-once
* ->tp_c of anything inserted into tp_c->hlist is tp_c
* hnodes never get reinserted into the lists or moved
between those, so anything found by u32_lookup_ht(tp->data, ...)
will have ->tp_c equal to tp->data.
* tp->root->tp_c == tp->data.
* ->ht_up of anything inserted into hnode->ht[...] is
equal to hnode.
* knodes never get reinserted into hash chains or moved
between those, so anything returned by u32_lookup_key(ht, ...)
will have ->ht_up equal to ht.
* any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c
point to tp->data

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode

the only thing we used ht for was ht->tp_c and callers can get that
without going through ->tp_c at all; start with lifting that into
the callers, next commits will massage those, eventually removing
->tp_c altogether.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: clean tc_u_common hashtable

* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: get rid of tc_u_common ->rcu

unused

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: get rid of tc_u_knode ->tp

not used anymore

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: get rid of unused argument of u32_destroy_key()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: make sure that divisor is a power of 2

Tested by modifying iproute2 to allow sending a divisor > 255

Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: disallow linking to root hnode

Operation makes no sense. Nothing will actually break if we do so
(depth limit in u32_classify() will prevent infinite loops), but
according to maintainers it's best prohibited outright.

NOTE: doing so guarantees that u32_destroy() will trigger the call
of u32_destroy_hnode(); we might want to make that unconditional.

Test:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \
link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
should fail with
Error: cls_u32: Not linking to root node

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: cls_u32: mark root hnode explicitly

... and produce consistent error on attempt to delete such.
Existing check in u32_delete() is inconsistent - after

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 \
divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 \
divisor 1

both

tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32

and

tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32

will fail (at least with refcounting fixes), but the former will complain
about an attempt to remove a busy table, while the latter will recognize
it as root and yield "Not allowed to delete root node" instead.

The problem with the existing check is that several tcf_proto instances
might share the same tp->data and handle-to-hnode lookup will be the same
for all of them. So comparing an hnode to be deleted with tp->root won't
catch the case when one tp is used to try deleting the root of another.
Solution is trivial - mark the root hnodes explicitly upon allocation and
check for that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-mscc-add-support-for-VSC8584-and-VSC8574-Microsemi-quad-port-PHYs'

Quentin Schulz says:

====================
net: phy: mscc: add support for VSC8584 and VSC8574 Microsemi quad-port PHYs

RESEND: rebased on top of latest net-next and on top of latest version of
"net: phy: mscc: various improvements to Microsemi PHY driver" patch
series.

Both PHYs are 4-port PHY that are 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X
and triple-speed copper SFP capable, can communicate with the MAC via
SGMII, QSGMII or 1000BASE-X, supports downshifting and can set the blinking
pattern of each of its 4 LEDs, supports SyncE as well as HP Auto-MDIX
detection.

VSC8574 supports WOL and VSC8584 supports hardware offloading of MACsec.

This patch series add support for 10/100/1000BASE-T, SGMII/QSGMII link with
the MAC, downshifting, HP Auto-MDIX detection and blinking pattern for
their 4 LEDs.

They have also an internal Intel 8051 microcontroller whose firmware needs
to be patched when the PHY is reset. If the 8051's firmware has the
expected CRC, its patching can be skipped. The microcontroller can be
accessed from any port of the PHY, though the CRC function can only be done
through the PHY that is the base PHY of the package (internal address 0)
due to a limitation of the firmware.

The GPIO register bank is a set of registers that are common to all PHYs in
the package. So any modification in any register of this bank affects all
PHYs of the package.

If the PHYs haven't been reset before booting the Linux kernel and were
configured to use interrupts for e.g. link status updates, it is required
to clear the interrupts mask register of all PHYs before being able to use
interrupts with any PHY. The first PHY of the package that will be init
will take care of clearing all PHYs interrupts mask registers. Thus, we
need to keep track of the init sequence in the package, if it's already
been done or if it's to be done.

Most of the init sequence of a PHY of the package is common to all PHYs in
the package, thus we use the SMI broadcast feature which enables us to
propagate a write in one register of one PHY to all PHYs in the same
package.

We also introduce a new development board called PCB120 which exists in
variants for VSC8584 and VSC8574 (and that's the only difference to the
best of my knowledge).

I suggest patches 1 to 3 go through net tree and patches 4 and 5 go
through MIPS tree. Patches going through net tree and those going through
MIPS tree do not depend on one another.

This patch series depends on this patch series:
(https://lore.kernel.org/lkml/20181008100728.24959-1-quentin.schulz@bootlin.com/)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: add support for VSC8574 PHY

The VSC8574 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX,
1000BASE-X and triple-speed copper SFP capable, can communicate with
the MAC via SGMII, QSGMII or 1000BASE-X, supports WOL, downshifting and
can set the blinking pattern of each of its 4 LEDs, supports SyncE as
well as HP Auto-MDIX detection.

This adds support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC,
WOL, downshifting, HP Auto-MDIX detection and blinking pattern for its 4
LEDs.

The VSC8574 has also an internal Intel 8051 microcontroller whose
firmware needs to be patched when the PHY is reset. If the 8051's
firmware has the expected CRC, its patching can be skipped. The
microcontroller can be accessed from any port of the PHY, though the CRC
function can only be done through the PHY that is the base PHY of the
package (internal address 0) due to a limitation of the firmware.

The GPIO register bank is a set of registers that are common to all PHYs
in the package. So any modification in any register of this bank affects
all PHYs of the package.

If the PHYs haven't been reset before booting the Linux kernel and were
configured to use interrupts for e.g. link status updates, it is
required to clear the interrupts mask register of all PHYs before being
able to use interrupts with any PHY. The first PHY of the package that
will be init will take care of clearing all PHYs interrupts mask
registers. Thus, we need to keep track of the init sequence in the
package, if it's already been done or if it's to be done.

Most of the init sequence of a PHY of the package is common to all PHYs
in the package, thus we use the SMI broadcast feature which enables us
to propagate a write in one register of one PHY to all PHYs in the same
package.

Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: add support for VSC8584 PHY

The VSC8584 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX,
1000BASE-X and triple-speed copper SFP capable, can communicate with the
MAC via SGMII, QSGMII or 1000BASE-X, supports downshifting and can set
the blinking pattern of each of its 4 LEDs, supports hardware offloading
of MACsec and supports SyncE as well as HP Auto-MDIX detection.

This adds support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC,
downshifting, HP Auto-MDIX detection and blinking pattern for its 4
LEDs.

The VSC8584 has also an internal Intel 8051 microcontroller whose
firmware needs to be patched when the PHY is reset. If the 8051's
firmware has the expected CRC, its patching can be skipped. The
microcontroller can be accessed from any port of the PHY, though the CRC
function can only be done through the PHY that is the base PHY of the
package (internal address 0) due to a limitation of the firmware.

The GPIO register bank is a set of registers that are common to all PHYs
in the package. So any modification in any register of this bank affects
all PHYs of the package.

If the PHYs haven't been reset before booting the Linux kernel and were
configured to use interrupts for e.g. link status updates, it is
required to clear the interrupts mask register of all PHYs before being
able to use interrupts with any PHY. The first PHY of the package that
will be init will take care of clearing all PHYs interrupts mask
registers. Thus, we need to keep track of the init sequence in the
package, if it's already been done or if it's to be done.

Most of the init sequence of a PHY of the package is common to all PHYs
in the package, thus we use the SMI broadcast feature which enables us
to propagate a write in one register of one PHY to all PHYs in the same
package.

The revA of the VSC8584 PHY (which is not and will not be publicly
released) should NOT patch the firmware of the microcontroller or it'll
make things worse, the easiest way is just to not support it.

Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: vsc8531: add two additional LED modes for VSC8584

The VSC8584 (and most likely other PHYs in the same generation) has two
additional LED modes that can be picked, so let's add them.

Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-phy-mscc-various-improvements-to-Microsemi-PHY-driver'

Quentin Schulz says:

====================
net: phy: mscc: various improvements to Microsemi PHY driver

The Microsemi PHYs have multiple banks of registers (called pages).
Registers can only be accessed from one page, if we need a register from
another page, we need to switch the page and the registers of all other
pages are not accessible anymore.

Basically, to read register 5 from page 0, 1, 2, etc., you do the same
phy_read(phydev, 5); but you need to set the desired page beforehand.

In order to guarantee that two concurrent functions do not change the
page, we need to do some locking per page. This can be achieved with the
use of phy_select_page and phy_restore_page functions but phy_write/read
calls in-between those two functions shall be replaced by their
lock-free alternative __phy_write/read.

The Microsemi PHYs have several counters so let's make them available as PHY
statistics.

The VSC 8530/31/40/41 also need to update their EEE init sequence in order to
avoid packet losses and improve performance.

This patch series also makes some minor cosmetic changes to the driver.

v3:
  - add reviewed-by,
  - use phy_read/write/modify_paged whenever possible instead of the
  combo phy_select_page, __phy_read/write/modify, phy_restore_page when
  only one __phy_read/write/modify was executed,

v2:
  - add patch to migrate MSCC driver to use phy_restore/select_page,
  - migrate all patches from v1 to use those two functions,
  - put the multiple lines of constants writes in an array and iterate over
  it to write the values,
  - add reviewed-bys,
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: remove unneeded temporary variable

Here, the rc variable is either used only for the condition right after
the assignment or right before being used as the return value of the
function it's being used in.

So let's remove this unneeded temporary variable whenever possible.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: shorten `x != 0` condition to `x`

`if (x != 0)` is basically a more verbose version of `if (x)` so let's
use the latter so it's consistent throughout the whole driver.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: remove unneeded parenthesis

The == operator precedes the || operator, so we can remove the
parenthesis around (a == b) || (c == d).

The condition is rather explicit and short so removing the parenthesis
definitely does not make it harder to read.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: Add EEE init sequence

Microsemi PHYs (VSC 8530/31/40/41) need to update the Energy Efficient
Ethernet initialization sequence.
In order to avoid certain link state errors that could result in link
drops and packet loss, the physical coding sublayer (PCS) must be
updated with settings related to EEE in order to improve performance.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: add ethtool statistics counters

There are a few counters available in the PHY: receive errors, false
carriers, link disconnects, media CRC errors and valids counters.

So let's expose those in the PHY driver.

Use the priv structure as the next PHY to be supported has a few
additional counters.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: migrate to phy_select/restore_page functions

The Microsemi PHYs have multiple banks of registers (called pages).
Registers can only be accessed from one page, if we need a register from
another page, we need to switch the page and the registers of all other
pages are not accessible anymore.

Basically, to read register 5 from page 0, 1, 2, etc., you do the same
phy_read(phydev, 5); but you need to set the desired page beforehand.

In order to guarantee that two concurrent functions do not change the
page, we need to do some locking per page. This can be achieved with the
use of phy_select_page and phy_restore_page functions but phy_write/read
calls in-between those two functions shall be replaced by their
lock-free alternative __phy_write/read.

Let's migrate this driver to those functions.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa2: fix and improve dpaa2-ptp driver

This patch is to fix and improve dpaa2-ptp driver
in some places.

- Fixed the return for some functions.
- Replaced kzalloc with devm_kzalloc.
- Removed dev_set_drvdata(dev, NULL).
- Made ptp_dpaa2_caps const.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa2: remove unused code for dprtc

This patch is to removed unused code for dprtc.
This code will be re-added along with more features
of dpaa2-ptp added.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa2: rename rtc as ptp in dpaa2-ptp driver

In dpaa2-ptp driver, it's odd to use rtc in names of
some functions and structures except these dprtc APIs.
This patch is to use ptp instead of rtc in names.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa2: fix dependency of config FSL_DPAA2_ETH

The NETDEVICES dependency and ETHERNET dependency hadn't
been required since dpaa2-eth was moved out of staging.
Also allowed COMPILE_TEST for dpaa2-eth.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: update files maintained under DPAA2 PTP/ETHERNET

The files maintained under DPAA2 PTP/ETHERNET needs to
be updated since dpaa2 ptp driver had been moved into
drivers/net/ethernet/freescale/dpaa2/.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa2: move DPAA2 PTP driver out of staging/

This patch is to move DPAA2 PTP driver out of staging/
since the dpaa2-eth had been moved out.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vhost: remove bad code line

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: pie: fix coding style issues

Fix 5 warnings and 14 checks issued by checkpatch.pl:

CHECK: Logical continuations should be on the previous line
+ if ((q->vars.qdelay < q->params.target / 2)
+     && (q->vars.prob < MAX_PROB / 5))

WARNING: line over 80 characters
+ q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE]));

CHECK: Blank lines aren't necessary after an open brace '{'
+{
+

CHECK: braces {} should be used on all arms of this statement
+ if (qlen < QUEUE_THRESHOLD)
[...]
+ else {
[...]

CHECK: Unbalanced braces around else statement
+ else {

CHECK: No space is necessary after a cast
+ if (delta > (s32) (MAX_PROB / (100 / 2)) &&

CHECK: Unnecessary parentheses around 'qdelay == 0'
+ if ((qdelay == 0) && (qdelay_old == 0) && update_prob)

CHECK: Unnecessary parentheses around 'qdelay_old == 0'
+ if ((qdelay == 0) && (qdelay_old == 0) && update_prob)

CHECK: Unnecessary parentheses around 'q->vars.prob == 0'
+ if ((q->vars.qdelay < q->params.target / 2) &&
+     (q->vars.qdelay_old < q->params.target / 2) &&
+     (q->vars.prob == 0) &&
+     (q->vars.avg_dq_rate > 0))

CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0'
+ if ((q->vars.qdelay < q->params.target / 2) &&
+     (q->vars.qdelay_old < q->params.target / 2) &&
+     (q->vars.prob == 0) &&
+     (q->vars.avg_dq_rate > 0))

CHECK: Blank lines aren't necessary before a close brace '}'
+
+}

CHECK: Comparison to NULL could be written "!opts"
+ if (opts == NULL)

CHECK: No space is necessary after a cast
+ ((u32) PSCHED_TICKS2NS(q->params.target)) /

WARNING: line over 80 characters
+     nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) ||

CHECK: Blank lines aren't necessary before a close brace '}'
+
+}

CHECK: No space is necessary after a cast
+ .delay = ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) /

WARNING: Missing a blank line after declarations
+ struct sk_buff *skb;
+ skb = qdisc_dequeue_head(sch);

WARNING: Missing a blank line after declarations
+ struct pie_sched_data *q = qdisc_priv(sch);
+ qdisc_reset_queue(sch);

WARNING: Missing a blank line after declarations
+ struct pie_sched_data *q = qdisc_priv(sch);
+ q->params.tupdate = 0;

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove unnecessary unsigned integer comparison and initialize variable

There is no need to compare *val.vu32* with < 0 because
such variable is of type u32 (32 bits, unsigned), making it
impossible to hold a negative value. Fix this by removing
such comparison.

Also, initialize variable *max_val* to -1, just in case
it is not initialized to either BNXT_MSIX_VEC_MAX or
BNXT_MSIX_VEC_MIN_MAX before using it in a comparison
with val.vu32 at line 159:

if (val.vu32 > max_val)

Addresses-Coverity-ID: 1473915 ("Unsigned compared against 0")
Addresses-Coverity-ID: 1473920 ("Uninitialized scalar variable")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2018-10-07' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.20

Second set of patches for 4.20. Heavy refactoring on mt76 continues
and the usual drivers in active development (iwlwifi, qtnfmac, ath10k)
getting new features. And as always, fixes and cleanup all over.

Major changes:

mt76

* more major refactoring to make it easier add new hardware support

* more work on mt76x0e support

* support for getting firmware version via ethtool

* add mt7650 PCI ID

iwlwifi

* HE radiotap cleanup and improvements

* reorder channel optimization for scans

* bump the FW API version

qtnfmac

* fixes for 'iw' output: rates for enabled SGI, 'dump station'

* expose more scan features to host: scan flush and dwell time

* inform cfg80211 when OBSS is not supported by firmware

wlcore

* add support for optional wakeirq

ath10k

* retrieve MAC address from system firmware if provided

* support extended board data download for dual-band QCA9984

* extended per sta tx statistics support via debugfs

* average ack rssi support for data frames

* speed up QCA6174 and QCA9377 firmware download using diag Copy
Engine

* HTT High Latency mode support needed by SDIO and USB support

* get STA power save state via debugfs

ath9k

* add reset functionality for airtime station debugfs file
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Merge tag 'mt76-for-kvalo-2018-10-05' of https://github.com/nbd168/wireless

mt76 patches for 4.20

* unify code between mt76x0, mt76x2
* mt76x0 fixes
* another fix for rx buffer allocation regression on usb
* move mt76x2 source files to mt76x2 folder
* more work on mt76x0e support

Merge tag 'iwlwifi-next-for-kalle-2018-10-06' of git://git./linux/kernel/git/iwlwifi/iwlwifi-next

Third set of iwlwifi patches for 4.20

* Fix for a race condition that caused the FW to crash;
* HE radiotap cleanup and improvements;
* Reorder channel optimization for scans;
* Bumped the FW API version supported after the last API change for
this release;
* Debugging improvements;
* A few bug fixes;
* Some cleanups in preparation for a new implementation;
* Other small improvements, cleanups and fixes.

Merge git://git./linux/kernel/git/davem/net

Dave writes:
  "Networking fixes:

  1) Fix truncation of 32-bit right shift in bpf, from Jann Horn.

  2) Fix memory leak in wireless wext compat, from Stefan Seyfried.

  3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao.

  4) Need to cancel pending work when unbinding in smsc75xx otherwise
     we oops, also from Yu Zhao.

  5) Don't allow enslaving a team device to itself, from Ido Schimmel.

  6) Fix backwards compat with older userspace for rtnetlink FDB dumps.
     From Mauricio Faria.

  7) Add validation of tc policy netlink attributes, from David Ahern.

  8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
  ipv6: take rcu lock in rawv6_send_hdrinc()
  net: sched: Add policy validation for tc attributes
  rtnetlink: fix rtnl_fdb_dump() for ndmsg header
  yam: fix a missing-check bug
  net: bpfilter: Fix type cast and pointer warnings
  net: cxgb3_main: fix a missing-check bug
  bpf: 32-bit RSH verification must truncate input before the ALU op
  net: phy: phylink: fix SFP interface autodetection
  be2net: don't flip hw_features when VXLANs are added/deleted
  net/packet: fix packet drop as of virtio gso
  net: dsa: b53: Keep CPU port as tagged in all VLANs
  openvswitch: load NAT helper
  bnxt_en: get the reduced max_irqs by the ones used by RDMA
  bnxt_en: free hwrm resources, if driver probe fails.
  bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request
  bnxt_en: Fix VNIC reservations on the PF.
  team: Forbid enslaving team device to itself
  net/usb: cancel pending work when unbinding smsc75xx
  mlxsw: spectrum: Delete RIF when VLAN device is removed
  ...

iwlwifi: dbg: make trigger functions type agnostic

As preparation for new trigger type, make iwl_fw_dbg_collect_desc
agnostic to the trigger structure.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: decrement occurrences for all triggers

iwl_fw_dbg_collect can be called by any function that already
has the error string ready. iwl_fw_dbg_collect_trig, on the
other hand, does string formatting. The occurrences decrement
is at iwl_fw_dbg_collect_trig, instead of iwl_fw_dbg_collect,
which causes it to sometimes be skipped. Move it to the right
location.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: use match_string() helper

match_string() returns the index of an array for a matching string,
which can be used intead of open coded variant.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: make iwl_fw_dbg_no_trig_window trigger agnostic

As preparation for new trigger format, make the function
agnostic to the trigger fomat. Instead it gets the relevant
parameters - id and delay.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: show more HE radiotap data for TB PPDUs

For trigger-based PPDUs, most values aren't part of the HE-SIG-A
because they're preconfigured by the trigger frame. However, we
still have this information since we used the trigger frame to
configure the hardware, so we can (and do) read it back out and
can thus show it in radiotap.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: decode HE information for MU (without ext info)

When the info type is MU, we still have the data from the TSF
overload words, so should decode that. When it's MU_EXT_INFO
we additionally have the SIG-B common 0/1/2 fields.

Also document the validity depending on the info type and fix
the name of the regular TB PPDU info type accordingly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: add debugfs to send host command

Add debugfs to send host command in mvm and fmac op modes.
Allows to send host command at runtime via send_hcmd debugfs file.
The command is received as a string that represents hex values.

The struct of the command is as follows:
[cmd_id][flags][length][data]
cmd_id and flags are 8 chars long each.
length is 4 chars long.
data is length * 2 chars long.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: runtime: add send host command op to firmware runtime op struct

Add send host command op to firmware runtime op struct to allow sending
host commands to the op mode from the fw runtime context.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: bump firmware API version for 9000 and 22000 series devices

Bump the firmware API version to 41.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm Support new MCC update response

Change MCC update response API to be compatible with new FW API.
While at it change v2 which is not in use anymore to v3 and cleanup
mcc_update v1 command and response which is obsolete.

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: pcie: check iwl_pcie_txq_build_tfd() return value

If we use the iwl_pcie_txq_build_tfd() return value for BIT(),
we should validate that it's not going to be negative, so do
the check and bail out if we hit an error. We shouldn't, as
we check if it'll fit beforehand, but better be safe.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: add fall through comment

The fall-through to the MVM case is intended as we have to do
*something* to continue, and can't easily clean up. So we'll
just fail in mvm later, if this does happen.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: pcie gen2: check iwl_pcie_gen2_set_tb() return value

If we use the iwl_pcie_gen2_set_tb() return value for BIT(),
we should validate that it's not going to be negative, so do
the check and bail out if we hit an error. We shouldn't, as
we check if it'll fit beforehand, but better be safe.

Fixes: ab6c644539e9 ("iwlwifi: pcie: copy TX functions to new transport")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: nvm: get num of hw addresses from firmware

With NICs that don't read the NVM directly and instead rely on getting
the relevant data from the firmware, the number of reserved MAC
addresses was not added to the API. This caused the driver to assume
there is only one address which results in all interfaces getting the
same address. Update the API to fix this.

While at it, fix-up the comments with firmware api names to actually
match what we have in the firmware.

Fixes: e9e1ba3dbf00 ("iwlwifi: mvm: support getting nvm data from firmware")
Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: add dump collection in case alive flow fails

Trigger dump collection if the alive flow fails, regardless of the
reason.

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: pcie: avoid empty free RB queue

If all free RB queues are empty, the driver will never restock the
free RB queue. That's because the restocking happens in the Rx flow,
and if the free queue is empty there will be no Rx.

Although there's a background worker (a.k.a. allocator) allocating
memory for RBs so that the Rx handler can restock them, the worker may
run only after the free queue has become empty (and then it is too
late for restocking as explained above).

There is a solution for that called 'emergency': If the number of used
RB's reaches half the amount of all RB's, the Rx handler will not wait
for the allocator but immediately allocate memory for the used RB's
and restock the free queue.

But, since the used RB's is per queue, it may happen that the used
RB's are spread between the queues such that the emergency check will
fail for each of the queues
(and still run out of RBs, causing the above symptom).

To fix it, move to emergency mode if the sum of *all* used RBs (for
all Rx queues) reaches half the amount of all RB's

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: set max TX/RX A-MPDU subframes to HE limit

In mac80211, the default remains for HT, so set the limit to
HE for our driver.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: add more information to HE radiotap

For SU/SU-ER/MU PPDUs we have spatial reuse.

For those where it's relevant we also know the pre-FEC
padding factor, PE disambiguity bit, beam change bit
and doppler bit.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: add LDPC-XSYM to HE radiotap data

Add information about the LDCP extra symbol segment to the HE
data when applicable (not for trigger-based PPDUs).

While at it, clean up the code for UL/DL a bit.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: add TXOP to HE radiotap data

We have this data available, so add it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: move HE-MU LTF_NUM parsing to he_phy_data parsing

This code gets shorter if it doesn't have to check all the
conditions, so move it to an appropriate place that has all
of them validated already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: clean up HE radiotap RU allocation parsing

Split the code out into a separate routine, and move that to be
called inside the previously introduced iwl_mvm_decode_he_phy_data()
function.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: pull some he_phy_data decoding into a separate function

Pull some of the decoding of he_phy_data into a separate function so
we don't need to check over and over again if it's valid.

While at it, fix the UL/DL bit reporting to be for all but trigger-
based frames.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: put HE SIG-B symbols/users data correctly

As detected by Luca during code review when I move this in the
next patch, the code here is putting the data into the wrong
field (flags1 instead of flags2). Fix that.

Fixes: e5721e3f770f ("iwlwifi: mvm: add radiotap data for HE")
Reported-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: minor cleanups to HE radiotap code

Remove a stray empty line, unbreak some lines that aren't
really that long, and move on variable setting into the
initializer to avoid initializing it twice.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: remove unnecessary overload variable

This is equivalent to checking he_phy_data != HE_PHY_DATA_INVAL,
which is already done in a number of places, so remove the extra
'overload' variable entirely.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: clear HW_RESTART_REQUESTED when stopping the interface

Fix a bug that happens in the following scenario:
1) suspend without WoWLAN
2) mac80211 calls drv_stop because of the suspend
3) __iwl_mvm_mac_stop deallocates the aux station
4) during drv_stop the firmware crashes
5) iwlmvm:
* sets IWL_MVM_STATUS_HW_RESTART_REQUESTED
* asks mac80211 to kick the restart flow
6) mac80211 puts the restart worker into a freezable
   queue which means that the worker will not run for now
   since the workqueue is already frozen
7) ...
8) resume
9) mac80211 runs ieee80211_reconfig as part of the resume
10) mac80211 detects that a restart flow has been requested
    and that we are now resuming from suspend and cancels
    the restart worker
11) mac80211 calls drv_start()
12) __iwl_mvm_mac_start checks that IWL_MVM_STATUS_HW_RESTART_REQUESTED
    clears it, sets IWL_MVM_STATUS_IN_HW_RESTART and calls
    iwl_mvm_restart_cleanup()
13) iwl_fw_error_dump gets called and accesses the device
    to get debug data
14) iwl_mvm_up adds the aux station
15) iwl_mvm_add_aux_sta() allocates an internal station for
    the aux station
16) iwl_mvm_allocate_int_sta() tests IWL_MVM_STATUS_IN_HW_RESTART
    and doesn't really allocate a station ID for the aux
    station
17) a new queue is added for the aux station

Note that steps from 5 to 9 aren't really part of the
problem but were described for the sake of completeness.

Once the iwl_mvm_mac_stop() is called, the device is not
accessible, meaning that step 12) can't succeed and we'll
see the following:

drivers/net/wireless/intel/iwlwifi/pcie/trans.c:2122 iwl_trans_pcie_grab_nic_access+0xc0/0x1d6 [iwlwifi]()
Timeout waiting for hardware access (CSR_GP_CNTRL 0x080403d8)
Call Trace:
[<ffffffffc03e6ad3>] iwl_trans_pcie_grab_nic_access+0xc0/0x1d6 [iwlwifi]
[<ffffffffc03e6a13>] iwl_trans_pcie_dump_regs+0x3fd/0x3fd [iwlwifi]
[<ffffffffc03dad42>] iwl_fw_error_dump+0x4f5/0xe8b [iwlwifi]
[<ffffffffc04bd43e>] __iwl_mvm_mac_start+0x5a/0x21a [iwlmvm]
[<ffffffffc04bd6d2>] iwl_mvm_mac_start+0xd4/0x103 [iwlmvm]
[<ffffffffc042d378>] drv_start+0xa1/0xc5 [iwl7000_mac80211]
[<ffffffffc045a339>] ieee80211_reconfig+0x145/0xf50 [mac80211]
[<ffffffffc044788b>] ieee80211_resume+0x62/0x66 [mac80211]
[<ffffffffc0366c5b>] wiphy_resume+0xa9/0xc6 [cfg80211]

The station id of the aux station is set to 0xff in step 3
and because we don't really allocate a new station id for
the auxliary station (as explained in 16), we end up sending
a command to the firmware asking to connect the queue
to station id 0xff. This makes the firmware crash with the
following information:

0x00002093 | ADVANCED_SYSASSERT
0x000002F0 | trm_hw_status0
0x00000000 | trm_hw_status1
0x00000B38 | branchlink2
0x0001978C | interruptlink1
0x00000000 | interruptlink2
0xFF080501 | data1
0xDEADBEEF | data2
0xDEADBEEF | data3
Firmware error during reconfiguration - reprobe!
FW error in SYNC CMD SCD_QUEUE_CFG

Fix this by clearing IWL_MVM_STATUS_HW_RESTART_REQUESTED
in iwl_mvm_mac_stop(). We won't be able to collect debug
data anyway and when we will brought up again, we will
have a clean state from the firmware perspective.
Since we won't have IWL_MVM_STATUS_IN_HW_RESTART set in
step 12) we won't get to the 2093 ASSERT either.

Fixes: bf8b286f86fc ("iwlwifi: mvm: defer setting IWL_MVM_STATUS_IN_HW_RESTART")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: group trigger condition to helper function

The triplet of get trigger, is trigger enabled and is trigger stopped
repeats itself. Group them in a function to avoid code duplication.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: dump memory in a helper function

The code that dumps various memory types repeats itself. Move it to a
function to avoid duplication.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: allow channel reorder optimization during scan

Allow the FW to reorder HB channels and first scan HB channels with
assumed APs, in order to reduce the scan duration.

Currently enable it for all scan requests types.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: split iwl_fw_error_dump to two functions

Split iwl_fw_error_dump to two parts. The first part will dump the
actual data, and second will do the file allocations, trans calls and
actual file operations. This is done in order to enable reuse of the
code for the new debug ini infrastructure.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: refactor dump code to improve readability

Add a macro to replace all the conditions checking for valid dump
length. In addition, move the fifo len calculation to a helper
function.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: dbg: move debug data to a struct

The debug variables are bloating the iwl_fw struct. And the fields
are out of order, missing docs and some are redundant.

Clean this up. This serves as preparation for unionizing it for the
new ini infra.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: check for n_profiles validity in EWRD ACPI

When reading the profiles from the EWRD table in ACPI, we loop over
the data and set it into our internal table. We use the number of
profiles specified in ACPI without checking its validity, so if the
ACPI table is corrupted and the number is larger than our array size,
we will try to make an out-of-bounds access.

Fix this by making sure the value specified in the ACPI table is
valid.

Fixes: 6996490501ed ("iwlwifi: mvm: add support for EWRD (Dynamic SAR) ACPI table")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

Merge branch 'akpm'

* akpm:
  mm: madvise(MADV_DODUMP): allow hugetlbfs pages
  ocfs2: fix locking for res->tracking and dlm->tracking_list
  mm/vmscan.c: fix int overflow in callers of do_shrink_slab()
  mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
  mm/vmstat.c: fix outdated vmstat_text
  proc: restrict kernel stack dumps to root
  mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes
  mm/migrate.c: split only transparent huge pages when allocation fails
  ipc/shm.c: use ERR_CAST() for shm_lock() error return
  mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
  mm, thp: fix mlocking THP page with migration enabled
  ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
  hugetlb: take PMD sharing into account when flushing tlb/caches
  mm: migration: fix migration of huge PMD shared pages

mm: madvise(MADV_DODUMP): allow hugetlbfs pages

Reproducer, assuming 2M of hugetlbfs available:

Hugetlbfs mounted, size=2M and option user=testuser

  # mount | grep ^hugetlbfs
  hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
  # sysctl vm.nr_hugepages=1
  vm.nr_hugepages = 1
  # grep Huge /proc/meminfo
  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  HugePages_Total:       1
  HugePages_Free:        1
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB
  Hugetlb:            2048 kB

Code:

  #include <sys/mman.h>
  #include <stddef.h>
  #define SIZE 2*1024*1024
  int main()
  {
    void *ptr;
    ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);
    madvise(ptr, SIZE, MADV_DONTDUMP);
    madvise(ptr, SIZE, MADV_DODUMP);
  }

Compile and strace:

  mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
  madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
  madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)

hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].

The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a
consequence of the large useage of VM_DONTEXPAND in device drivers.

A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.

A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the same
memory.  We correct this omission by allowing madvice(MADV_DODUMP) on
hugetlbfs pages.

[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")

Link: http://lkml.kernel.org/r/20180930054629.29150-1-daniel@linux.ibm.com
Link: https://lists.launchpad.net/maria-discuss/msg05245.html
Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <kpenza@gmail.com>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: fix locking for res->tracking and dlm->tracking_list

In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock. This can cause list
corruptions and can end up in kernel panic.

Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.

Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>