git.openwrt.org Git - openwrt/staging/blogic.git/log

livepatch: Module coming and going callbacks can proceed with all listed patches

Livepatches can no longer get enabled and disabled repeatedly.
The list klp_patches contains only enabled patches and eventually
the patch in transition.

The module coming and going callbacks do no longer need to check
for these state. They have to proceed with all listed patches.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>

livepatch: Proper error handling in the shadow variables selftest

Add proper error handling when allocating or getting shadow variables
in the selftest. It prevents an invalid pointer access in some situations.
It shows the good programming practice in the others.

The error codes are just the best guess and specific for this particular
test. In general, klp_shadow_alloc() returns NULL also when the given
shadow variable has already been allocated. In addition, both
klp_shadow_alloc() and klp_shadow_get_or_alloc() might fail from
other reasons when the constructor fails.

Note, that the error code is not really important even in the real life.
The use of shadow variables should be transparent for the original
livepatched code.

Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>

livepatch: return -ENOMEM on ptr_id() allocation failure

Fixes the following smatch warning:

lib/livepatch/test_klp_shadow_vars.c:47 ptr_id() warn: returning -1 instead of -ENOMEM is sloppy

Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>

livepatch: Introduce klp_for_each_patch macro

There are already macros to iterate over struct klp_func and klp_object.

Add also klp_for_each_patch(). But make it internal because also
klp_patches list is internal.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>

livepatch: core: Return EOPNOTSUPP instead of ENOSYS

As a result of an unsupported operation is better to use EOPNOTSUPP
as error code.
ENOSYS is only used for 'invalid syscall nr' and nothing else.

Signed-off-by: Alice Ferrazzi <alice.ferrazzi@miraclelinux.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>

selftests/livepatch: add DYNAMIC_DEBUG config dependency

The livepatch selftest scripts turn on dynamic_debug of livepatch
kernel source to determine expected behavior. TEST_LIVEPATCH should
therefore include DYNAMIC_DEBUG in its list of dependencies.

Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>

selftests/livepatch: introduce tests

Add a few livepatch modules and simple target modules that the included
regression suite can run tests against:

  - basic livepatching (multiple patches, atomic replace)
  - pre/post (un)patch callbacks
  - shadow variable API

Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Alice Ferrazzi <alice.ferrazzi@gmail.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Remove ordering (stacking) of the livepatches

The atomic replace and cumulative patches were introduced as a more secure
way to handle dependent patches. They simplify the logic:

  + Any new cumulative patch is supposed to take over shadow variables
    and changes made by callbacks from previous livepatches.

  + All replaced patches are discarded and the modules can be unloaded.
    As a result, there is only one scenario when a cumulative livepatch
    gets disabled.

The different handling of "normal" and cumulative patches might cause
confusion. It would make sense to keep only one mode. On the other hand,
it would be rude to enforce using the cumulative livepatches even for
trivial and independent (hot) fixes.

However, the stack of patches is not really necessary any longer.
The patch ordering was never clearly visible via the sysfs interface.
Also the "normal" patches need a lot of caution anyway.

Note that the list of enabled patches is still necessary but the ordering
is not longer enforced.

Otherwise, the code is ready to disable livepatches in an random order.
Namely, klp_check_stack_func() always looks for the function from
the livepatch that is being disabled. klp_func structures are just
removed from the related func_stack. Finally, the ftrace handlers
is removed only when the func_stack becomes empty.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Atomic replace and cumulative patches documentation

User documentation for the atomic replace feature. It makes it easier
to maintain livepatches using so-called cumulative patches.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Remove Nop structures when unused

Replaced patches are removed from the stack when the transition is
finished. It means that Nop structures will never be needed again
and can be removed. Why should we care?

  + Nop structures give the impression that the function is patched
    even though the ftrace handler has no effect.

  + Ftrace handlers do not come for free. They cause slowdown that might
    be visible in some workloads. The ftrace-related slowdown might
    actually be the reason why the function is no longer patched in
    the new cumulative patch. One would expect that cumulative patch
    would help solve these problems as well.

  + Cumulative patches are supposed to replace any earlier version of
    the patch. The amount of NOPs depends on which version was replaced.
    This multiplies the amount of scenarios that might happen.

    One might say that NOPs are innocent. But there are even optimized
    NOP instructions for different processors, for example, see
    arch/x86/kernel/alternative.c. And klp_ftrace_handler() is much
    more complicated.

  + It sounds natural to clean up a mess that is no longer needed.
    It could only be worse if we do not do it.

This patch allows to unpatch and free the dynamic structures independently
when the transition finishes.

The free part is a bit tricky because kobject free callbacks are called
asynchronously. We could not wait for them easily. Fortunately, we do
not have to. Any further access can be avoided by removing them from
the dynamic lists.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Add atomic replace

Sometimes we would like to revert a particular fix. Currently, this
is not easy because we want to keep all other fixes active and we
could revert only the last applied patch.

One solution would be to apply new patch that implemented all
the reverted functions like in the original code. It would work
as expected but there will be unnecessary redirections. In addition,
it would also require knowing which functions need to be reverted at
build time.

Another problem is when there are many patches that touch the same
functions. There might be dependencies between patches that are
not enforced on the kernel side. Also it might be pretty hard to
actually prepare the patch and ensure compatibility with the other
patches.

Atomic replace && cumulative patches:

A better solution would be to create cumulative patch and say that
it replaces all older ones.

This patch adds a new "replace" flag to struct klp_patch. When it is
enabled, a set of 'nop' klp_func will be dynamically created for all
functions that are already being patched but that will no longer be
modified by the new patch. They are used as a new target during
the patch transition.

The idea is to handle Nops' structures like the static ones. When
the dynamic structures are allocated, we initialize all values that
are normally statically defined.

The only exception is "new_func" in struct klp_func. It has to point
to the original function and the address is known only when the object
(module) is loaded. Note that we really need to set it. The address is
used, for example, in klp_check_stack_func().

Nevertheless we still need to distinguish the dynamically allocated
structures in some operations. For this, we add "nop" flag into
struct klp_func and "dynamic" flag into struct klp_object. They
need special handling in the following situations:

  + The structures are added into the lists of objects and functions
    immediately. In fact, the lists were created for this purpose.

  + The address of the original function is known only when the patched
    object (module) is loaded. Therefore it is copied later in
    klp_init_object_loaded().

  + The ftrace handler must not set PC to func->new_func. It would cause
    infinite loop because the address points back to the beginning of
    the original function.

  + The various free() functions must free the structure itself.

Note that other ways to detect the dynamic structures are not considered
safe. For example, even the statically defined struct klp_object might
include empty funcs array. It might be there just to run some callbacks.

Also note that the safe iterator must be used in the free() functions.
Otherwise already freed structures might get accessed.

Special callbacks handling:

The callbacks from the replaced patches are _not_ called by intention.
It would be pretty hard to define a reasonable semantic and implement it.

It might even be counter-productive. The new patch is cumulative. It is
supposed to include most of the changes from older patches. In most cases,
it will not want to call pre_unpatch() post_unpatch() callbacks from
the replaced patches. It would disable/break things for no good reasons.
Also it should be easier to handle various scenarios in a single script
in the new patch than think about interactions caused by running many
scripts from older patches. Not to say that the old scripts even would
not expect to be called in this situation.

Removing replaced patches:

One nice effect of the cumulative patches is that the code from the
older patches is no longer used. Therefore the replaced patches can
be removed. It has several advantages:

  + Nops' structs will no longer be necessary and might be removed.
    This would save memory, restore performance (no ftrace handler),
    allow clear view on what is really patched.

  + Disabling the patch will cause using the original code everywhere.
    Therefore the livepatch callbacks could handle only one scenario.
    Note that the complication is already complex enough when the patch
    gets enabled. It is currently solved by calling callbacks only from
    the new cumulative patch.

  + The state is clean in both the sysfs interface and lsmod. The modules
    with the replaced livepatches might even get removed from the system.

Some people actually expected this behavior from the beginning. After all
a cumulative patch is supposed to "completely" replace an existing one.
It is like when a new version of an application replaces an older one.

This patch does the first step. It removes the replaced patches from
the list of patches. It is safe. The consistency model ensures that
they are no longer used. By other words, each process works only with
the structures from klp_transition_patch.

The removal is done by a special function. It combines actions done by
__disable_patch() and klp_complete_transition(). But it is a fast
track without all the transaction-related stuff.

Signed-off-by: Jason Baron <jbaron@akamai.com>
[pmladek@suse.com: Split, reuse existing code, simplified]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Use lists to manage patches, objects and functions

Currently klp_patch contains a pointer to a statically allocated array of
struct klp_object and struct klp_objects contains a pointer to a statically
allocated array of klp_func. In order to allow for the dynamic allocation
of objects and functions, link klp_patch, klp_object, and klp_func together
via linked lists. This allows us to more easily allocate new objects and
functions, while having the iterator be a simple linked list walk.

The static structures are added to the lists early. It allows to add
the dynamically allocated objects before klp_init_object() and
klp_init_func() calls. Therefore it reduces the further changes
to the code.

This patch does not change the existing behavior.

Signed-off-by: Jason Baron <jbaron@akamai.com>
[pmladek@suse.com: Initialize lists before init calls]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Simplify API by removing registration step

The possibility to re-enable a registered patch was useful for immediate
patches where the livepatch module had to stay until the system reboot.
The improved consistency model allows to achieve the same result by
unloading and loading the livepatch module again.

Also we are going to add a feature called atomic replace. It will allow
to create a patch that would replace all already registered patches.
The aim is to handle dependent patches more securely. It will obsolete
the stack of patches that helped to handle the dependencies so far.
Then it might be unclear when a cumulative patch re-enabling is safe.

It would be complicated to support the many modes. Instead we could
actually make the API and code easier to understand.

Therefore, remove the two step public API. All the checks and init calls
are moved from klp_register_patch() to klp_enabled_patch(). Also the patch
is automatically freed, including the sysfs interface when the transition
to the disabled state is completed.

As a result, there is never a disabled patch on the top of the stack.
Therefore we do not need to check the stack in __klp_enable_patch().
And we could simplify the check in __klp_disable_patch().

Also the API and logic is much easier. It is enough to call
klp_enable_patch() in module_init() call. The patch can be disabled
by writing '0' into /sys/kernel/livepatch/<patch>/enabled. Then the module
can be removed once the transition finishes and sysfs interface is freed.

The only problem is how to free the structures and kobjects safely.
The operation is triggered from the sysfs interface. We could not put
the related kobject from there because it would cause lock inversion
between klp_mutex and kernfs locks, see kn->count lockdep map.

Therefore, offload the free task to a workqueue. It is perfectly fine:

  + The patch can no longer be used in the livepatch operations.

  + The module could not be removed until the free operation finishes
    and module_put() is called.

  + The operation is asynchronous already when the first
    klp_try_complete_transition() fails and another call
    is queued with a delay.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Don't block the removal of patches loaded after a forced transition

module_put() is currently never called in klp_complete_transition() when
klp_force is set. As a result, we might keep the reference count even when
klp_enable_patch() fails and klp_cancel_transition() is called.

This might give the impression that a module might get blocked in some
strange init state. Fortunately, it is not the case. The reference count
is ignored when mod->init fails and erroneous modules are always removed.

Anyway, this might be confusing. Instead, this patch moves
the global klp_forced flag into struct klp_patch. As a result,
we block only modules that might still be in use after a forced
transition. Newly loaded livepatches might be eventually completely
removed later.

It is not a big deal. But the code is at least consistent with
the reality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Consolidate klp_free functions

The code for freeing livepatch structures is a bit scattered and tricky:

  + direct calls to klp_free_*_limited() and kobject_put() are
    used to release partially initialized objects

  + klp_free_patch() removes the patch from the public list
    and releases all objects except for patch->kobj

  + object_put(&patch->kobj) and the related wait_for_completion()
    are called directly outside klp_mutex; this code is duplicated;

Now, we are going to remove the registration stage to simplify the API
and the code. This would require handling more situations in
klp_enable_patch() error paths.

More importantly, we are going to add a feature called atomic replace.
It will need to dynamically create func and object structures. We will
want to reuse the existing init() and free() functions. This would
create even more error path scenarios.

This patch implements more straightforward free functions:

  + checks kobj_added flag instead of @limit[*]

  + initializes patch->list early so that the check for empty list
    always works

  + The action(s) that has to be done outside klp_mutex are done
    in separate klp_free_patch_finish() function. It waits only
    when patch->kobj was really released via the _start() part.

The patch does not change the existing behavior.

[*] We need our own flag to track that the kobject was successfully
    added to the hierarchy.  Note that kobj.state_initialized only
    indicates that kobject has been initialized, not whether is has
    been added (and needs to be removed on cleanup).

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Shuffle klp_enable_patch()/klp_disable_patch() code

We are going to simplify the API and code by removing the registration
step. This would require calling init/free functions from enable/disable
ones.

This patch just moves the code to prevent more forward declarations.

This patch does not change the code except for two forward declarations.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: Change unsigned long old_addr -> void *old_func in struct klp_func

The address of the to be patched function and new function is stored
in struct klp_func as:

void *new_func;
unsigned long old_addr;

The different naming scheme and type are derived from the way
the addresses are set. @old_addr is assigned at runtime using
kallsyms-based search. @new_func is statically initialized,
for example:

static struct klp_func funcs[] = {
{
.old_name = "cmdline_proc_show",
.new_func = livepatch_cmdline_proc_show,
}, { }
};

This patch changes unsigned long old_addr -> void *old_func. It removes
some confusion when these address are later used in the code. It is
motivated by a followup patch that adds special NOP struct klp_func
where we want to assign func->new_func = func->old_addr respectively
func->new_func = func->old_func.

This patch does not modify the existing behavior.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Alice Ferrazzi <alice.ferrazzi@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching

Pull livepatch update from Jiri Kosina:
"Return value checking fixup in livepatching samples, from Nicholas Mc
Guire"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: check kzalloc return values

Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux

Pull thermal management updates from Zhang Rui:

- Add locking for cooling device sysfs attribute in case the cooling
   device state is changed by userspace and thermal framework
   simultaneously. (Thara Gopinath)

- Fix a problem that passive cooling is reset improperly after system
   suspend/resume. (Wei Wang)

- Cleanup the driver/thermal/ directory by moving intel and qcom
   platform specific drivers to platform specific sub-directories. (Amit
   Kucheria)

- Some trivial cleanups. (Lukasz Luba, Wolfram Sang)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal/intel: fixup for Kconfig string parsing tightening up
  drivers: thermal: Move QCOM_SPMI_TEMP_ALARM into the qcom subdir
  drivers: thermal: Move various drivers for intel platforms into a subdir
  thermal: Fix locking in cooling device sysfs update cur_state
  Thermal: do not clear passive state during system sleep
  thermal: zx2967_thermal: simplify getting .driver_data
  thermal: st: st_thermal: simplify getting .driver_data
  thermal: spear_thermal: simplify getting .driver_data
  thermal: rockchip_thermal: simplify getting .driver_data
  thermal: int340x_thermal: int3400_thermal: simplify getting .driver_data
  thermal: remove unused function parameter

Merge branch 'linus' of git://git./linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal SoC updates from Eduardo Valentin:

- Tegra DT binding documentation for Tegra194

- Armada now supports ap806 and cp110

- RCAR thermal now supports R8A774C0 and R8A77990

- Fixes on thermal_hwmon, IMX, generic-ADC, ST, RCAR, Broadcom,
   Uniphier, QCOM, Tegra, PowerClamp, and Armada thermal drivers.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: (22 commits)
  thermal: generic-adc: Fix adc to temp interpolation
  thermal: rcar_thermal: add R8A77990 support
  dt-bindings: thermal: rcar-thermal: add R8A77990 support
  thermal: rcar_thermal: add R8A774C0 support
  dt-bindings: thermal: rcar-thermal: add R8A774C0 support
  dt-bindings: cp110: document the thermal interrupt capabilities
  dt-bindings: ap806: document the thermal interrupt capabilities
  MAINTAINERS: thermal: add entry for Marvell MVEBU thermal driver
  thermal: armada: add overheat interrupt support
  thermal: st: fix Makefile typo
  thermal: uniphier: Convert to SPDX identifier
  thermal/intel_powerclamp: Change to use DEFINE_SHOW_ATTRIBUTE macro
  thermal: tegra: soctherm: Change to use DEFINE_SHOW_ATTRIBUTE macro
  dt-bindings: thermal: tegra-bpmp: Add Tegra194 support
  thermal: imx: save one condition block for normal case of nvmem initialization
  thermal: imx: fix for dependency on cpu-freq
  thermal: tsens: qcom: do not create duplicate regmap debugfs entries
  thermal: armada: Use PTR_ERR_OR_ZERO in armada_thermal_probe_legacy()
  dt-bindings: thermal: rcar-gen3-thermal: All variants use 3 interrupts
  thermal: broadcom: use devm_thermal_zone_of_sensor_register
  ...

Merge tag 'trace-v4.21-1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull ftrace sh build fix from Steven Rostedt:
"It appears that the zero-day bot did find a bug in my sh build.

  And that I didn't have the bad code in my config file when I cross
  compiled it, although there are a few other errors in sh that makes it
  not build for me, I missed that I added one more"

* tag 'trace-v4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  sh: ftrace: Fix missing parenthesis in WARN_ON()

Merge tag '4.21-smb3-small-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fixes from Steve French:
"Three fixes, one for stable, one adds the (most secure) SMB3.1.1
  dialect to default list requested"

* tag '4.21-smb3-small-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: add smb3.1.1 to default dialect list
  cifs: fix confusing warning message on reconnect
  smb3: fix large reads on encrypted connections

Merge tag 'iomap-4.21-merge-3' of git://git./fs/xfs/xfs-linux

Pull iomap maintainer update from Darrick Wong:
"Christoph Hellwig and I have decided to take responsibility for the fs
iomap code rather than let it languish further"

* tag 'iomap-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: take responsibility for the filesystem iomap code

Merge tag 'xfs-4.21-merge-3' of git://git./fs/xfs/xfs-linux

Pull xfs fixlets from Darrick Wong:
"Remove a couple of unnecessary local variables"

* tag 'xfs-4.21-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: xfs_fsops: drop useless LIST_HEAD
xfs: xfs_buf: drop useless LIST_HEAD

Merge tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"A fairly quiet round: a couple of messenger performance improvements
  from myself and a few cap handling fixes from Zheng"

* tag 'ceph-for-4.21-rc1' of git://github.com/ceph/ceph-client:
  ceph: don't encode inode pathes into reconnect message
  ceph: update wanted caps after resuming stale session
  ceph: skip updating 'wanted' caps if caps are already issued
  ceph: don't request excl caps when mount is readonly
  ceph: don't update importing cap's mseq when handing cap export
  libceph: switch more to bool in ceph_tcp_sendmsg()
  libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage()
  libceph: use sock_no_sendpage() as a fallback in ceph_tcp_sendpage()
  libceph: drop last_piece logic from write_partial_message_data()
  ceph: remove redundant assignment
  ceph: cleanup splice_dentry()

lib/genalloc.c: include vmalloc.h

Fixes build break on most ARM/ARM64 defconfigs:

  lib/genalloc.c: In function 'gen_pool_add_virt':
  lib/genalloc.c:190:10: error: implicit declaration of function 'vzalloc_node'; did you mean 'kzalloc_node'?
  lib/genalloc.c:190:8: warning: assignment to 'struct gen_pool_chunk *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
  lib/genalloc.c: In function 'gen_pool_destroy':
  lib/genalloc.c:254:3: error: implicit declaration of function 'vfree'; did you mean 'kfree'?

Fixes: 6862d2fc8185 ('lib/genalloc.c: use vzalloc_node() to allocate the bitmap')
Cc: Huang Shijie <sjhuang@iluvatar.ai>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Skidanov <alexey.skidanov@intel.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'mount.part1' of git://git./linux/kernel/git/viro/vfs

Pull vfs mount API prep from Al Viro:
"Mount API prereqs.

  Mostly that's LSM mount options cleanups. There are several minor
  fixes in there, but nothing earth-shattering (leaks on failure exits,
  mostly)"

* 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
  mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
  smack: rewrite smack_sb_eat_lsm_opts()
  smack: get rid of match_token()
  smack: take the guts of smack_parse_opts_str() into a new helper
  LSM: new method: ->sb_add_mnt_opt()
  selinux: rewrite selinux_sb_eat_lsm_opts()
  selinux: regularize Opt_... names a bit
  selinux: switch away from match_token()
  selinux: new helper - selinux_add_opt()
  LSM: bury struct security_mnt_opts
  smack: switch to private smack_mnt_opts
  selinux: switch to private struct selinux_mnt_opts
  LSM: hide struct security_mnt_opts from any generic code
  selinux: kill selinux_sb_get_mnt_opts()
  LSM: turn sb_eat_lsm_opts() into a method
  nfs_remount(): don't leak, don't ignore LSM options quietly
  btrfs: sanitize security_mnt_opts use
  selinux; don't open-code a loop in sb_finish_set_opts()
  LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
  new helper: security_sb_eat_lsm_opts()
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull trivial vfs updates from Al Viro:
"A few cleanups + Neil's namespace_unlock() optimization"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  exec: make prepare_bprm_creds static
  genheaders: %-<width>s had been there since v6; %-*s - since v7
  VFS: use synchronize_rcu_expedited() in namespace_unlock()
  iov_iter: reduce code duplication

Merge tag 'mips_fixes_4.21_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
"A few early MIPS fixes for 4.21:

   - The Broadcom BCM63xx platform sees a fix for resetting the BCM6368
     ethernet switch, and the removal of a platform device we've never
     had a driver for.

   - The Alchemy platform sees a few fixes for bitrot that occurred
     within the past few cycles.

   - We now enable vectored interrupt support for the MediaTek MT7620
     SoC, which makes sense since they're supported by the SoC but in
     this case also works around a bug relating to the location of
     exception vectors when using a recent version of U-Boot.

   - The atomic64_fetch_*_relaxed() family of functions see a fix for a
     regression in MIPS64 kernels since v4.19.

   - Cavium Octeon III CN7xxx systems will now disable their RGMII
     interfaces rather than attempt to enable them & warn about the lack
     of support for doing so, as they did since initial CN7xxx ethernet
     support was added in v4.7.

   - The Microsemi/Microchip MSCC SoCs gain a MAINTAINERS entry.

   - .mailmap now provides consistency for Dengcheng Zhu's name &
     current email address"

* tag 'mips_fixes_4.21_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: OCTEON: mark RGMII interface disabled on OCTEON III
  MIPS: Fix a R10000_LLSC_WAR logic in atomic.h
  MIPS: BCM63XX: drop unused and broken DSP platform device
  mailmap: Update name spelling and email for Dengcheng Zhu
  MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8
  MAINTAINERS: Add a maintainer for MSCC MIPS SoCs
  MIPS: Alchemy: update dma masks for devboard devices
  MIPS: Alchemy: update cpu-feature-overrides
  MIPS: Alchemy: drop DB1000 IrDA support bits
  MIPS: alchemy: cpu_all_mask is forbidden for clock event devices
  MIPS: BCM63XX: fix switch core reset on BCM6368

Merge tag 'powerpc-4.21-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"A fix for the recent access_ok() change, which broke the build. We
  recently added a use of type in order to squash a warning elsewhere
  about type being unused.

  A handful of other minor build fixes, and one defconfig update.

  Thanks to: Christian Lamparter, Christophe Leroy, Diana Craciun,
  Mathieu Malaterre"

* tag 'powerpc-4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Drop use of 'type' from access_ok()
  KVM: PPC: Book3S HV: radix: Fix uninitialized var build error
  powerpc/configs: Add PPC4xx_OCM to ppc40x_defconfig
  powerpc/4xx/ocm: Fix phys_addr_t printf warnings
  powerpc/4xx/ocm: Fix compilation error due to PAGE_KERNEL usage
  powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'

Merge branch 'parisc-4.21-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fix from Helge Deller:
"Fix boot issues with a series of parisc servers since kernel 4.20.

  Remapping kernel text with set_kernel_text_rw() missed to remap from
  lowest up until the highest huge-page aligned kernel text addresss"

* 'parisc-4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Remap hugepage-aligned pages in set_kernel_text_rw()

Merge tag 'for-4.21' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux

Pull h8300 fix from Yoshinori Sato:
"Build problem fix"

* tag 'for-4.21' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux:
h8300: pci: Remove local declaration of pcibios_penalize_isa_irq

Merge tag 'armsoc-late' of git://git./linux/kernel/git/arm/arm-soc

Pull more ARM SoC updates from Olof Johansson:
"A few updates that we merged late but are low risk for regressions for
  other platforms (and a few other straggling patches):

   - I mis-tagged the 'drivers' branch, and missed 3 patches. Merged in
     here. They're for a driver for the PL353 SRAM controller and a
     build fix for the qualcomm scm driver.

   - A new platform, RDA Micro RDA8810PL (Cortex-A5 w/ integrated
     Vivante GPU, 256MB RAM, Wifi). This includes some acked
     platform-specific drivers (serial, etc). This also include DTs for
     two boards with this SoC, OrangePi 2G and OrangePi i86.

   - i.MX8 is another new platform (NXP, 4x Cortex-A53 + Cortex-M4, 4K
     video playback offload). This is the first i.MX 64-bit SoC.

   - Some minor updates to Samsung boards (adding a few peripherals in
     DTs).

   - Small rework for SMP bootup on STi platforms.

   - A couple of TEE driver fixes.

   - A couple of new config options (bcm2835 thermal, Uniphier MDMAC)
     enabled in defconfigs"

* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (27 commits)
  ARM: multi_v7_defconfig: enable CONFIG_UNIPHIER_MDMAC
  arm64: defconfig: Re-enable bcm2835-thermal driver
  MAINTAINERS: Add entry for RDA Micro SoC architecture
  tty: serial: Add RDA8810PL UART driver
  ARM: dts: rda8810pl: Add interrupt support for UART
  dt-bindings: serial: Document RDA Micro UART
  ARM: dts: rda8810pl: Add timer support
  ARM: dts: Add devicetree for OrangePi i96 board
  ARM: dts: Add devicetree for OrangePi 2G IoT board
  ARM: dts: Add devicetree for RDA8810PL SoC
  ARM: Prepare RDA8810PL SoC
  dt-bindings: arm: Document RDA8810PL and reference boards
  dt-bindings: Add RDA Micro vendor prefix
  ARM: sti: remove pen_release and boot_lock
  arm64: dts: exynos: Add Bluetooth chip to TM2(e) boards
  arm64: dts: imx8mq-evk: enable watchdog
  arm64: dts: imx8mq: add watchdog devices
  MAINTAINERS: add i.MX8 DT path to i.MX architecture
  arm64: add support for i.MX8M EVK board
  arm64: add basic DTS for i.MX8MQ
  ...

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"I'm safely chained back up to my desk, so please pull these arm64
  fixes for -rc1 that address some issues that cropped up during the
  merge window:

   - Prevent KASLR from mapping the top page of the virtual address
     space

   - Fix device-tree probing of SDEI driver

   - Fix incorrect register offset definition in Hisilicon DDRC PMU
     driver

   - Fix compilation issue with older binutils not liking unsigned
     immediates

   - Fix uapi headers so that libc can provide its own sigcontext
     definition

   - Fix handling of private compat syscalls

   - Hook up compat io_pgetevents() syscall for 32-bit tasks

   - Cleanup to arm64 Makefile (including now to avoid silly conflicts)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: compat: Hook up io_pgetevents() for 32-bit tasks
  arm64: compat: Don't pull syscall number from regs in arm_compat_syscall
  arm64: compat: Avoid sending SIGILL for unallocated syscall numbers
  arm64/sve: Disentangle <uapi/asm/ptrace.h> from <uapi/asm/sigcontext.h>
  arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
  drivers/perf: hisi: Fixup one DDRC PMU register offset
  arm64: replace arm64-obj-* in Makefile with obj-*
  arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region
  firmware: arm_sdei: Fix DT platform device creation
  firmware: arm_sdei: fix wrong of_node_put() in init function
  arm64: entry: remove unused register aliases
  arm64: smp: Fix compilation error

Merge tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:
"Included in this update:

   - Florian Fainelli noticed that userspace segfaults caused by the
     lack of kernel-userspace helpers was hard to diagnose; we now issue
     a warning when userspace tries to use the helpers but the kernel
     has them disabled.

   - Ben Dooks wants compatibility for the old ATAG serial number with
     DT systems.

   - Some cleanup of assembly by Nicolas Pitre.

   - User accessors optimisation from Vincent Whitchurch.

   - More robust kdump on SMP systems from Yufen Wang.

   - Sebastian Andrzej Siewior noticed problems with the SMP "boot_lock"
     on RT kernels, and so we convert the Versatile series of platforms
     to use a raw spinlock instead, consolidating the Versatile
     implementation. We entirely remove the boot_lock on OMAP systems,
     where it's unnecessary. Further patches for other systems will be
     submitted for the following merge window.

   - Start switching old StrongARM-11x0 systems to use gpiolib rather
     than their private GPIO implementation - mostly PCMCIA bits.

   - ARM Kconfig cleanups.

   - Cleanup a mostly harmless mistake in the recent Spectre patch in
     4.20 (which had the effect that data that can be placed into the
     init sections was incorrectly always placed in the rodata section)"

* tag 'for-4.21' of git://git.armlinux.org.uk/~rmk/linux-arm: (25 commits)
  ARM: omap2: remove unnecessary boot_lock
  ARM: versatile: rename and comment SMP implementation
  ARM: versatile: convert boot_lock to raw
  ARM: vexpress/realview: consolidate immitation CPU hotplug
  ARM: fix the cockup in the previous patch
  ARM: sa1100/cerf: switch to using gpio_led_register_device()
  ARM: sa1100/assabet: switch to using gpio leds
  ARM: sa1100/assabet: add gpio keys support for right-hand two buttons
  ARM: sa1111: remove legacy GPIO interfaces
  pcmcia: sa1100*: remove redundant bvd1/bvd2 setting
  ARM: pxa/lubbock: switch PCMCIA to MAX1600 library
  ARM: pxa/mainstone: switch PCMCIA to MAX1600 library and gpiod APIs
  ARM: sa1100/neponset: switch PCMCIA to MAX1600 library and gpiod APIs
  ARM: sa1100/jornada720: switch PCMCIA to gpiod APIs
  pcmcia: add MAX1600 library
  ARM: sa1100: explicitly register sa11x0-pcmcia devices
  ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+
  ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS
  ARM: 8811/1: always list both ldrd/strd registers explicitly
  ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
  ...

Merge tag 'csky-for-linus-4.21' of git://github.com/c-sky/csky-linux

Pull arch/csky updates from Guo Ren:
"Here are three main features (cpu_hotplug, basic ftrace, basic perf)
  and some bugfixes:

  Features:
   - Add CPU-hotplug support for SMP
   - Add ftrace with function trace and function graph trace
   - Add Perf support
   - Add EM_CSKY_OLD 39
   - optimize kernel panic print.
   - remove syscall_exit_work

  Bugfixes:
   - fix abiv2 mmap(... O_SYNC) failure
   - fix gdb coredump error
   - remove vdsp implement for kernel
   - fix qemu failure to bootup sometimes
   - fix ftrace call-graph panic
   - fix device tree node reference leak
   - remove meaningless header-y
   - fix save hi,lo,dspcr regs in switch_stack
   - remove unused members in processor.h"

* tag 'csky-for-linus-4.21' of git://github.com/c-sky/csky-linux:
  csky: Add perf support for C-SKY
  csky: Add EM_CSKY_OLD 39
  clocksource/drivers/c-sky: fixup ftrace call-graph panic
  csky: ftrace call graph supported.
  csky: basic ftrace supported
  csky: remove unused members in processor.h
  csky: optimize kernel panic print.
  csky: stacktrace supported.
  csky: CPU-hotplug supported for SMP
  clocksource/drivers/c-sky: fixup qemu fail to bootup sometimes.
  csky: fixup save hi,lo,dspcr regs in switch_stack.
  csky: remove syscall_exit_work
  csky: fixup remove vdsp implement for kernel.
  csky: bugfix gdb coredump error.
  csky: fixup abiv2 mmap(... O_SYNC) failed.
  csky: define syscall_get_arch()
  elf-em.h: add EM_CSKY
  csky: remove meaningless header-y
  csky: Don't leak device tree node reference

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- procfs updates

- various misc bits

- lib/ updates

- epoll updates

- autofs

- fatfs

- a few more MM bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
  mm/page_io.c: fix polled swap page in
  checkpatch: add Co-developed-by to signature tags
  docs: fix Co-Developed-by docs
  drivers/base/platform.c: kmemleak ignore a known leak
  fs: don't open code lru_to_page()
  fs/: remove caller signal_pending branch predictions
  mm/: remove caller signal_pending branch predictions
  arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
  kernel/sched/: remove caller signal_pending branch predictions
  kernel/locking/mutex.c: remove caller signal_pending branch predictions
  mm: select HAVE_MOVE_PMD on x86 for faster mremap
  mm: speed up mremap by 20x on large regions
  mm: treewide: remove unused address argument from pte_alloc functions
  initramfs: cleanup incomplete rootfs
  scripts/gdb: fix lx-version string output
  kernel/kcov.c: mark write_comp_data() as notrace
  kernel/sysctl: add panic_print into sysctl
  panic: add options to print system info when panic happens
  bfs: extra sanity checking and static inode bitmap
  exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
  ...

ia64: fix compile without swiotlb

Some non-generic ia64 configs don't build swiotlb, and thus should not
pull in the generic non-coherent DMA infrastructure.

Fixes: 68c608345c ("swiotlb: remove dma_mark_clean")
Reported-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: re-introduce non-generic memcpy_{to,from}io

This has been broken forever, and nobody ever really noticed because
it's purely a performance issue.

Long long ago, in commit 6175ddf06b61 ("x86: Clean up mem*io functions")
Brian Gerst simplified the memory copies to and from iomem, since on
x86, the instructions to access iomem are exactly the same as the
regular instructions.

That is technically true, and things worked, and nobody said anything.
Besides, back then the regular memcpy was pretty simple and worked fine.

Nobody noticed except for David Laight, that is.  David has a testing a
TLP monitor he was writing for an FPGA, and has been occasionally
complaining about how memcpy_toio() writes things one byte at a time.

Which is completely unacceptable from a performance standpoint, even if
it happens to technically work.

The reason it's writing one byte at a time is because while it's
technically true that accesses to iomem are the same as accesses to
regular memory on x86, the _granularity_ (and ordering) of accesses
matter to iomem in ways that they don't matter to regular cached memory.

In particular, when ERMS is set, we default to using "rep movsb" for
larger memory copies.  That is indeed perfectly fine for real memory,
since the whole point is that the CPU is going to do cacheline
optimizations and executes the memory copy efficiently for cached
memory.

With iomem? Not so much.  With iomem, "rep movsb" will indeed work, but
it will copy things one byte at a time. Slowly and ponderously.

Now, originally, back in 2010 when commit 6175ddf06b61 was done, we
didn't use ERMS, and this was much less noticeable.

Our normal memcpy() was simpler in other ways too.

Because in fact, it's not just about using the string instructions.  Our
memcpy() these days does things like "read and write overlapping values"
to handle the last bytes of the copy.  Again, for normal memory,
overlapping accesses isn't an issue.  For iomem? It can be.

So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and
memset_io() functions.  It doesn't particularly optimize them, but it
tries to at least not be horrid, or do overlapping accesses.  In fact,
this uses the existing __inline_memcpy() function that we still had
lying around that uses our very traditional "rep movsl" loop followed by
movsw/movsb for the final bytes.

Somebody may decide to try to improve on it, but if we've gone almost a
decade with only one person really ever noticing and complaining, maybe
it's not worth worrying about further, once it's not _completely_ broken?

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Use __put_user_goto in __put_user_size() and unsafe_put_user()

This actually enables the __put_user_goto() functionality in
unsafe_put_user().

For an example of the effect of this, this is the code generated for the

        unsafe_put_user(signo, &infop->si_signo, Efault);

in the waitid() system call:

movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]

It's just one single store instruction, along with generating an
exception table entry pointing to the Efault label case in case that
instruction faults.

Before, we would generate this:

xorl    %edx, %edx
movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
        testl   %edx, %edx
        jne     .L309

with the exception table generated for that 'mov' instruction causing us
to jump to a stub that set %edx to -EFAULT and then jumped back to the
'testl' instruction.

So not only do we now get rid of the extra code in the normal sequence,
we also avoid unnecessarily keeping that extra error register live
across it all.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86 uaccess: Introduce __put_user_goto

This is finally the actual reason for the odd error handling in the
"unsafe_get/put_user()" functions, introduced over three years ago.

Using a "jump to error label" interface is somewhat odd, but very
convenient as a programming interface, and more importantly, it fits
very well with simply making the target be the exception handler address
directly from the inline asm.

The reason it took over three years to actually do this? We need "asm
goto" support for it, which only became the default on x86 last year.
It's now been a year that we've forced asm goto support (see commit
e501ce957a78 "x86: Force asm-goto"), and so let's just do it here too.

[ Side note: this commit was originally done back in 2016. The above
  commentary about timing is obviously about it only now getting merged
  into my real upstream tree     - Linus ]

Sadly, gcc still only supports "asm goto" with asms that do not have any
outputs, so we are limited to only the put_user case for this.  Maybe in
several more years we can do the get_user case too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parisc: Remap hugepage-aligned pages in set_kernel_text_rw()

The alternative coding patch for parisc in kernel 4.20 broke booting
machines with PA8500-PA8700 CPUs. The problem is, that for such machines
the parisc kernel automatically utilizes huge pages to access kernel
text code, but the set_kernel_text_rw() function, which is used shortly
before applying any alternative patches, didn't used the correctly
hugepage-aligned addresses to remap the kernel text read-writeable.

Fixes: 3847dab77421 ("parisc: Add alternative coding infrastructure")
Cc: <stable@vger.kernel.org> [4.20]
Signed-off-by: Helge Deller <deller@gmx.de>

Merge branch 'next/drivers' into next/late

Merge in a few missing patches from the pull request (my copy of the
branch was behind the staged version in linux-next).

* next/drivers:
  memory: pl353: Add driver for arm pl353 static memory controller
  dt-bindings: memory: Add pl353 smc controller devicetree binding information
  firmware: qcom: scm: fix compilation error when disabled

Signed-off-by: Olof Johansson <olof@lixom.net>

ARM: multi_v7_defconfig: enable CONFIG_UNIPHIER_MDMAC

Enable the UniPhier MIO DMAC driver. This is used as the DMA engine
for accelerating the SD/eMMC controller drivers.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

mm/page_io.c: fix polled swap page in

swap_readpage() wants to do polling to bring in pages if asked to, but
it doesn't mark the bio as being polled. Additionally, the looping
around the blk_poll() check isn't correct - if we get a zero return, we
should call io_schedule(), we can't just assume that the bio has
completed. The regular bio->bi_private check should be used for that.

Link: http://lkml.kernel.org/r/e15243a8-2cdf-c32c-ecee-f289377c8ef9@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

checkpatch: add Co-developed-by to signature tags

As per Documentation/process/submitting-patches, Co-developed-by is a
valid signature.

This commit removes the warning.

Link: http://lkml.kernel.org/r/1544808928-20002-3-git-send-email-jorge.ramirez-ortiz@linaro.org
Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Himanshu Jha <himanshujha199640@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

docs: fix Co-Developed-by docs

The accepted terminology will be Co-developed-by therefore lose the
capital letter from now on.

Link: http://lkml.kernel.org/r/1544808928-20002-2-git-send-email-jorge.ramirez-ortiz@linaro.org
Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Acked-by: Himanshu Jha <himanshujha199640@gmail.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/base/platform.c: kmemleak ignore a known leak

unreferenced object 0xffff808ec6dc5a80 (size 128):
  comm "swapper/0", pid 1, jiffies 4294938063 (age 2560.530s)
  hex dump (first 32 bytes):
    ff ff ff ff 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b  ........kkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  backtrace:
    [<00000000476dcf8c>] kmem_cache_alloc_trace+0x430/0x500
    [<000000004f708d37>] platform_device_register_full+0xbc/0x1e8
    [<000000006c2a7ec7>] acpi_create_platform_device+0x370/0x450
    [<00000000ef135642>] acpi_default_enumeration+0x34/0x78
    [<000000003bd9a052>] acpi_bus_attach+0x2dc/0x3e0
    [<000000003cf4f7f2>] acpi_bus_attach+0x108/0x3e0
    [<000000003cf4f7f2>] acpi_bus_attach+0x108/0x3e0
    [<000000002968643e>] acpi_bus_scan+0xb0/0x110
    [<0000000010dd0bd7>] acpi_scan_init+0x1a8/0x410
    [<00000000965b3c5a>] acpi_init+0x408/0x49c
    [<00000000ed4b9fe2>] do_one_initcall+0x178/0x7f4
    [<00000000a5ac5a74>] kernel_init_freeable+0x9d4/0xa9c
    [<0000000070ea6c15>] kernel_init+0x18/0x138
    [<00000000fb8fff06>] ret_from_fork+0x10/0x1c
    [<0000000041273a0d>] 0xffffffffffffffff

Then, faddr2line pointed out this line,

/*
* This memory isn't freed when the device is put,
* I don't have a nice idea for that though.  Conceptually
* dma_mask in struct device should not be a pointer.
* See http://thread.gmane.org/gmane.linux.kernel.pci/9081
*/
pdev->dev.dma_mask =
kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);

Since this leak has existed for more than 8 years and it does not
reference other parts of the memory, let kmemleak ignore it, so users
don't need to waste time reporting this in the future.

Link: http://lkml.kernel.org/r/20181206160751.36211-1-cai@gmx.us
Signed-off-by: Qian Cai <cai@gmx.us>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs: don't open code lru_to_page()

Multiple filesystems open code lru_to_page(). Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.

No functional changes.

Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Acked-by: "Yan, Zheng" <zyan@redhat.com> [ceph]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/: remove caller signal_pending branch predictions

This is already done for us internally by the signal machinery.

[akpm@linux-foundation.org: fix fs/buffer.c]
Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/: remove caller signal_pending branch predictions

This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

arch/arc/mm/fault.c: remove caller signal_pending_branch predictions

This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/sched/: remove caller signal_pending branch predictions

This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/locking/mutex.c: remove caller signal_pending branch predictions

This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: select HAVE_MOVE_PMD on x86 for faster mremap

Moving page-tables at the PMD-level on x86 is known to be safe. Enable
this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: speed up mremap by 20x on large regions

Android needs to mremap large regions of memory during memory management
related operations.  The mremap system call can be really slow if THP is
not enabled.  The bottleneck is move_page_tables, which is copying each
pte at a time, and can be really slow across a large map.  Turning on
THP may not be a viable option, and is not for us.  This patch speeds up
the performance for non-THP system by copying at the PMD level when
possible.

The speedup is an order of magnitude on x86 (~20x).  On a 1GB mremap,
the mremap completion times drops from 3.4-3.6 milliseconds to 144-160
microseconds.

Before:
Total mremap time for 1GB data: 3521942 nanoseconds.
Total mremap time for 1GB data: 3449229 nanoseconds.
Total mremap time for 1GB data: 3488230 nanoseconds.

After:
Total mremap time for 1GB data: 150279 nanoseconds.
Total mremap time for 1GB data: 144665 nanoseconds.
Total mremap time for 1GB data: 158708 nanoseconds.

If THP is enabled the optimization is mostly skipped except in certain
situations.

[joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning]
Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com
Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: treewide: remove unused address argument from pte_alloc functions

Patch series "Add support for fast mremap".

This series speeds up the mremap(2) syscall by copying page tables at
the PMD level even for non-THP systems.  There is concern that the extra
'address' argument that mremap passes to pte_alloc may do something
subtle architecture related in the future that may make the scheme not
work.  Also we find that there is no point in passing the 'address' to
pte_alloc since its unused.  This patch therefore removes this argument
tree-wide resulting in a nice negative diff as well.  Also ensuring
along the way that the enabled architectures do not do anything funky
with the 'address' argument that goes unnoticed by the optimization.

Build and boot tested on x86-64.  Build tested on arm64.  The config
enablement patch for arm64 will be posted in the future after more
testing.

The changes were obtained by applying the following Coccinelle script.
(thanks Julia for answering all Coccinelle questions!).
Following fix ups were done manually:
* Removal of address argument from  pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.

// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.

virtual patch

@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@

fn(...
- , T2 E2
)
{ ... }

@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)

@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)

@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

fn(...
-,  E2
)

@pte_alloc_macro depends on patch exists@
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@

(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)

Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

initramfs: cleanup incomplete rootfs

Unpacking an external initrd may fail e.g. not enough memory. This
leads to an incomplete rootfs because some files might be extracted
already. Fixed by cleaning the rootfs so the kernel is not using an
incomplete rootfs.

Link: http://lkml.kernel.org/r/20181030151805.5519-1-david.engraf@sysgo.com
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scripts/gdb: fix lx-version string output

A bug is present in GDB which causes early string termination when
parsing variables.  This has been reported [0], but we should ensure
that we can support at least basic printing of the core kernel strings.

For current gdb version (has been tested with 7.3 and 8.1), 'lx-version'
only prints one character.

  (gdb) lx-version
  L(gdb)

This can be fixed by casting 'linux_banner' as (char *).

  (gdb) lx-version
  Linux version 4.19.0-rc1+ (changbin@acer) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #21 SMP Sat Sep 1 21:43:30 CST 2018

[0] https://sourceware.org/bugzilla/show_bug.cgi?id=20077

[kbingham@kernel.org: add detail to commit message]
Link: http://lkml.kernel.org/r/20181111162035.8356-1-kieran.bingham@ideasonboard.com
Fixes: 2d061d999424 ("scripts/gdb: add version command")
Signed-off-by: Du Changbin <changbin.du@gmail.com>
Signed-off-by: Kieran Bingham <kbingham@kernel.org>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kcov.c: mark write_comp_data() as notrace

Since __sanitizer_cov_trace_const_cmp4 is marked as notrace, the
function called from __sanitizer_cov_trace_const_cmp4 shouldn't be
traceable either.  ftrace_graph_caller() gets called every time func
write_comp_data() gets called if it isn't marked 'notrace'.  This is the
backtrace from gdb:

#0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
#1  0xffffff8010201920 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
#2  0xffffff8010439714 in write_comp_data (type=5, arg1=0, arg2=0, ip=18446743524224276596) at ../kernel/kcov.c:116
#3  0xffffff8010439894 in __sanitizer_cov_trace_const_cmp4 (arg1=<optimized out>, arg2=<optimized out>) at ../kernel/kcov.c:188
#4  0xffffff8010201874 in prepare_ftrace_return (self_addr=18446743524226602768, parent=0xffffff801014b918, frame_pointer=18446743524223531344) at ./include/generated/atomic-instrumented.h:27
#5  0xffffff801020194c in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182

Rework so that write_comp_data() that are called from
__sanitizer_cov_trace_*_cmp*() are marked as 'notrace'.

Commit 903e8ff86753 ("kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace")
missed to mark write_comp_data() as 'notrace'. When that patch was
created gcc-7 was used. In lib/Kconfig.debug
config KCOV_ENABLE_COMPARISONS
depends on $(cc-option,-fsanitize-coverage=trace-cmp)

That code path isn't hit with gcc-7. However, it were that with gcc-8.

Link: http://lkml.kernel.org/r/20181206143011.23719-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/sysctl: add panic_print into sysctl

So that we can also runtime chose to print out the needed system info
for panic, other than setting the kernel cmdline.

Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

panic: add options to print system info when panic happens

Kernel panic issues are always painful to debug, partially because it's
not easy to get enough information of the context when panic happens.

And we have ramoops and kdump for that, while this commit tries to
provide a easier way to show the system info by adding a cmdline
parameter, referring some idea from sysrq handler.

Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bfs: extra sanity checking and static inode bitmap

Strengthen validation of BFS superblock against corruption.  Make
in-core inode bitmap static part of superblock info structure.  Print a
warning when mounting a BFS filesystem created with "-N 512" option as
only 510 files can be created in the root directory.  Make the kernel
messages more uniform.  Update the 'prefix' passed to bfs_dump_imap() to
match the current naming of operations.  White space and comments
cleanup.

Link: http://lkml.kernel.org/r/CAK+_RLkFZMduoQF36wZFd3zLi-6ZutWKsydjeHFNdtRvZZEb4w@mail.gmail.com
Signed-off-by: Tigran Aivazian <aivazian.tigran@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

exec: separate MM_ANONPAGES and RLIMIT_STACK accounting

get_arg_page() checks bprm->rlim_stack.rlim_cur and re-calculates the
"extra" size for argv/envp pointers every time, this is a bit ugly and
even not strictly correct: acct_arg_size() must not account this size.

Remove all the rlimit code in get_arg_page(). Instead, add bprm->argmin
calculated once at the start of __do_execve_file() and change
copy_strings to check bprm->p >= bprm->argmin.

The patch adds the new helper, prepare_arg_pages() which initializes
bprm->argc/envc and bprm->argmin.

[oleg@redhat.com: fix !CONFIG_MMU version of get_arg_page()]
Link: http://lkml.kernel.org/r/20181126122307.GA1660@redhat.com
[akpm@linux-foundation.org: use max_t]
Link: http://lkml.kernel.org/r/20181112160910.GA28440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

exec: load_script: don't blindly truncate shebang string

load_script() simply truncates bprm->buf and this is very wrong if the
length of shebang string exceeds BINPRM_BUF_SIZE-2. This can silently
truncate i_arg or (worse) we can execute the wrong binary if buf[2:126]
happens to be the valid executable path.

Change load_script() to return ENOEXEC if it can't find '\n' or zero in
bprm->buf. Note that '\0' can come from either
prepare_binprm()->memset() or from kernel_read(), we do not care.

Link: http://lkml.kernel.org/r/20181112160931.GA28463@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fork: fix some -Wmissing-prototypes warnings

We get a warning when building kernel with W=1:

kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

Add the missing declaration in head file to fix this.

Also, remove arch_release_thread_stack() completely because no arch
seems to implement it since bb9d81264 (arch: remove tile port).

Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fat: new inline functions to determine the FAT variant (32, 16 or 12)

This patch introduces 3 new inline functions - is_fat12, is_fat16 and
is_fat32, and replaces every occurrence in the code in which the FS
variant (whether this is FAT12, FAT16 or FAT32) was previously checked
using msdos_sb_info->fat_bits.

Link: http://lkml.kernel.org/r/1544990640-11604-4-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fat: move MAX_FAT to fat.h and change it to inline function

MAX_FAT is useless in msdos_fs.h, since it uses the MSDOS_SB function
that is defined in fat.h.  So really, this macro can be only called from
code that already includes fat.h.

Hence, this patch moves it to fat.h, right after MSDOS_SB is defined.  I
also changed it to an inline function in order to save the double call
to MSDOS_SB.  This was suggested by joe@perches.com in the previous
version.

This patch is required for the next in the series, in which the variant
(whether this is FAT12, FAT16 or FAT32) checks are replaced with new
macros.

Link: http://lkml.kernel.org/r/1544990640-11604-3-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fat: remove FAT_FIRST_ENT macro

The comment edited in this patch was the only reference to the
FAT_FIRST_ENT macro, which is not used anymore. Moreover, the commented
line of code does not compile with the current code.

Since the FAT_FIRST_ENT macro checks the FAT variant in a way that the
patch series changes, I removed it, and instead wrote a clear
explanation of what was checked.

I verified that the changed comment is correct according to Microsoft
FAT spec, search for "BPB_Media" in the following references:

1. Microsoft FAT specification 2005
(http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf).
Search for 'volume label'.
2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification
(https://staff.washington.edu/dittrich/misc/fatgen103.pdf).
Search for 'volume label'.

Link: http://lkml.kernel.org/r/1544990640-11604-2-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/uapi/linux/msdos_fs.h: use MSDOS_NAME for volume label size

The FAT file system volume label file stored in the root directory
should match the volume label field in the FAT boot sector. As
consequence, the max length of these fields ought to be the same. This
patch replaces the magic '11' usef in the struct fat_boot_sector with
MSDOS_NAME, which is used in struct msdos_dir_entry.

Please check the following references:
1. Microsoft FAT specification 2005
(http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf).
Search for 'volume label'.
2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification
(https://staff.washington.edu/dittrich/misc/fatgen103.pdf).
Search for 'volume label'.
3. User space code that creates FAT filesystem
sometimes uses MSDOS_NAME for the label, sometimes not.
Search for 'if (memcmp(label, NO_NAME, MSDOS_NAME))'.
I consider to make the same patch there as well.
https://github.com/dosfstools/dosfstools/blob/master/src/mkfs.fat.c

Link: http://lkml.kernel.org/r/1543096879-82837-1-git-send-email-carmeli.tamir@gmail.com
Signed-off-by: Carmeli Tamir <carmeli.tamir@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hfsplus: return file attributes on statx

The immutable, append-only and no-dump attributes can only be retrieved
with an ioctl; implement the ->getattr() method to return them on statx.
Do not return the inode birthtime yet, because the issue of how best to
handle the post-2038 timestamps is still under discussion.

This patch is needed to pass xfstests generic/424.

Link: http://lkml.kernel.org/r/20181014163558.sxorxlzjqccq2lpw@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

autofs: add strictexpire mount option

Commit 092a53452bb7 ("autofs: take more care to not update last_used on
path walk") helped to (partially) resolve a problem where automounts
were not expiring due to aggressive accesses from user space.

This patch was later reverted because, for very large environments, it
meant more mount requests from clients and when there are a lot of
clients this caused a fairly significant increase in server load.

But there is a need for both types of expire check, depending on use
case, so add a mount option to allow for strict update of last use of
autofs dentrys (which just means not updating the last use on path walk
access).

Link: http://lkml.kernel.org/r/154296973880.9889.14085372741514507967.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

autofs: change catatonic setting to a bit flag

Change the superblock info. catatonic setting to be part of a flags bit
field.

Link: http://lkml.kernel.org/r/154296973142.9889.17275721668508589639.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

autofs: simplify parse_options() function call

The parse_options() function uses a long list of parameters, most of
which are present in the super block info structure already.

The mount parameters set in parse_options() options don't require
cleanup so using the super block info struct directly is simpler.

Link: http://lkml.kernel.org/r/154296972423.9889.9368859245676473329.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

autofs: improve ioctl sbi checks

Al Viro made some suggestions to improve the implementation of commit
0633da48f0 ("fix autofs_sbi() does not check super block type").

The check is unnecessary in all cases except for ioctl usage so placing
the check in the super block accessor function adds a small overhead to
the common case where it isn't needed.

So it's sufficient to do this in the ioctl code only.

Also the check in the ioctl code is needlessly complex.

[akpm@linux-foundation.org: declare autofs_fs_type in .h, not .c]
Link: http://lkml.kernel.org/r/154296970987.9889.1597442413573683096.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

init/main.c: make "initcall_level_names[]" const char *

Initcall names should not be changed.

Link: http://lkml.kernel.org/r/20181124091829.GD10969@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: deal with wait_queue only once

There is no reason why we rearm the waitiqueue upon every fetch_events
retry (for when events are found yet send_events() fails). If nothing
else, this saves four lock operations per retry, and furthermore reduces
the scope of the lock even further.

[akpm@linux-foundation.org: restore code to original position, fix and reflow comment]
Link: http://lkml.kernel.org/r/20181114182532.27981-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: rename check_events label to send_events

It is currently called check_events because it, well, did exactly that.
However, since the lockless ep_events_available() call, the label no
longer checks, but just sends the events. Rename as such.

Link: http://lkml.kernel.org/r/20181114182532.27981-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: avoid barrier after an epoll_wait(2) timeout

Upon timeout, we can just exit out of the loop, without the cost of the
changing the task's state with an smp_store_mb call. Just exit out of
the loop and be done - setting the task state afterwards will be, of
course, redundant.

[dave@stgolabs.net: forgotten fixlets]
Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5
Link: http://lkml.kernel.org/r/20181108051006.18751-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: reduce the scope of wq lock in epoll_wait()

This patch aims at reducing ep wq.lock hold times in epoll_wait(2).  For
the blocking case, there is no need to constantly take and drop the
spinlock, which is only needed to manipulate the waitqueue.

The call to ep_events_available() is now lockless, and only exposed to
benign races.  Here, if false positive (returns available events and
does not see another thread deleting an epi from the list) we call into
send_events and then the list's state is correctly seen.  Otoh, if a
false negative and we don't see a list_add_tail(), for example, from irq
callback, then it is rechecked again before blocking, which will see the
correct state.

In order for more accuracy to see concurrent list_del_init(), use the
list_empty_careful() variant -- of course, this won't be safe against
insertions from wakeup.

For the overflow list we obviously need to prevent load/store tearing as
we don't want to see partial values while the ready list is disabled.

[dave@stgolabs.net: forgotten fixlets]
Link: http://lkml.kernel.org/r/20181109155258.jxcr4t2pnz6zqct3@linux-r8p5
Link: http://lkml.kernel.org/r/20181108051006.18751-6-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Suggested-by: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: robustify ep->mtx held checks

Insted of just commenting how important it is, lets make it more robust
and add a lockdep_assert_held() call.

Link: http://lkml.kernel.org/r/20181108051006.18751-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: drop ovflist branch prediction

The ep->ovflist is a secondary ready-list to temporarily store events
that might occur when doing sproc without holding the ep->wq.lock. This
accounts for every time we check for ready events and also send events
back to userspace; both callbacks, particularly the latter because of
copy_to_user, can account for a non-trivial time.

As such, the unlikely() check to see if the pointer is being used, seems
both misleading and sub-optimal. In fact, we go to an awful lot of
trouble to sync both lists, and populating the ovflist is far from an
uncommon scenario.

For example, profiling a concurrent epoll_wait(2) benchmark, with
CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for a two threads a 33%
incorrect rate was seen; and when incrementally increasing the number of
epoll instances (which is used, for example for multiple queuing load
balancing models), up to a 90% incorrect rate was seen.

Similarly, by deleting the prediction, 3% throughput boost was seen
across incremental threads.

Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: simplify ep_send_events_proc() ready-list loop

The current logic is a bit convoluted. Lets simplify this with a
standard list_for_each_entry_safe() loop instead and just break out
after maxevents is reached.

While at it, remove an unnecessary indentation level in the loop when
there are in fact ready events.

Link: http://lkml.kernel.org/r/20181108051006.18751-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/epoll: remove max_nests argument from ep_call_nested()

Patch series "epoll: some miscellaneous optimizations".

The following are some incremental optimizations on some of the epoll
core. Each patch has the details, but together, the series is seen to
shave off measurable cycles on a number of systems and workloads.

For example, on a 40-core IB, a pipetest as well as parallel
epoll_wait() benchmark show around a 20-30% increase in raw operations
per second when the box is fully occupied (incremental thread counts),
and up to 15% performance improvement with lower counts.

Passes ltp epoll related testcases.

This patch(of 6):

All callers pass the EP_MAX_NESTS constant already, so lets simplify
this a tad and get rid of the redundant parameter for nested eventpolls.

Link: http://lkml.kernel.org/r/20181108051006.18751-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

checkpatch: warn on const char foo[] = "bar"; declarations

These declarations should generally be static const to avoid poor
compilation and runtime performance where compilers tend to initialize
the const declaration for every call instead of using .rodata for the
string.

Miscellanea:

- Convert spaces to tabs for indentation in 2 adjacent checks

Link: http://lkml.kernel.org/r/10ea5f4b087dc911e41e187a4a2b5e79c7529aa3.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/firmware/memmap.c: modify memblock_alloc to memblock_alloc_nopanic

memblock_alloc() never returns NULL because panic never returns.

Link: http://lkml.kernel.org/r/1545640882-42009-1-git-send-email-huang.zijiang@zte.com.cn
Signed-off-by: huang.zijiang <huang.zijiang@zte.com.cn>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/genalloc.c: use vzalloc_node() to allocate the bitmap

Some devices may have big memory on chip, such as over 1G. In some
cases, the nbytes maybe bigger then 4M which is the bounday of the
memory buddy system (4K default).

So use vzalloc_node() to allocate the bitmap. Also use vfree to free
it.

Link: http://lkml.kernel.org/r/20181225015701.6289-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Skidanov <alexey.skidanov@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/find_bit_benchmark.c: align test_find_next_and_bit with others

Contrary to other tests, test_find_next_and_bit() test uses tab
formatting in output and get_cycles() instead of ktime_get().
get_cycles() is not supported by some arches, so ktime_get() fits better
in generic code.

Fix it and minor style issues, so the output looks like this:

Start testing find_bit() with random-filled bitmap
find_next_bit:                 7142816 ns, 163282 iterations
find_next_zero_bit:            8545712 ns, 164399 iterations
find_last_bit:                 6332032 ns, 163282 iterations
find_first_bit:               20509424 ns,  16606 iterations
find_next_and_bit:             4060016 ns,  73424 iterations

Start testing find_bit() with sparse bitmap
find_next_bit:                   55984 ns,    656 iterations
find_next_zero_bit:           19197536 ns, 327025 iterations
find_last_bit:                   65088 ns,    656 iterations
find_first_bit:                5923712 ns,    656 iterations
find_next_and_bit:               29088 ns,      1 iterations

Link: http://lkml.kernel.org/r/20181123174803.10916-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Norov, Yuri" <Yuri.Norov@cavium.com>
Cc: Clement Courbet <courbet@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk

gen_pool_alloc_algo() uses different allocation functions implementing
different allocation algorithms.  With gen_pool_first_fit_align()
allocation function, the returned address should be aligned on the
requested boundary.

If chunk start address isn't aligned on the requested boundary, the
returned address isn't aligned too.  The only way to get properly
aligned address is to initialize the pool with chunks aligned on the
requested boundary.  If want to have an ability to allocate buffers
aligned on different boundaries (for example, 4K, 1MB, ...), the chunk
start address should be aligned on the max possible alignment.

This happens because gen_pool_first_fit_align() looks for properly
aligned memory block without taking into account the chunk start address
alignment.

To fix this, we provide chunk start address to
gen_pool_first_fit_align() and change its implementation such that it
starts looking for properly aligned block with appropriate offset
(exactly as is done in CMA).

Link: https://lkml.kernel.org/lkml/a170cf65-6884-3592-1de9-4c235888cc8a@intel.com
Link: http://lkml.kernel.org/r/1541690953-4623-1-git-send-email-alexey.skidanov@intel.com
Signed-off-by: Alexey Skidanov <alexey.skidanov@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fls: change parameter to unsigned int

When testing in userspace, UBSAN pointed out that shifting into the sign
bit is undefined behaviour. It doesn't really make sense to ask for the
highest set bit of a negative value, so just turn the argument type into
an unsigned int.

Some architectures (eg ppc) already had it declared as an unsigned int,
so I don't expect too many problems.

Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/printk.h: drop silly "static inline asmlinkage" from dump_stack()

Empty function will be inlined so asmlinkage doesn't do anything.

Link: http://lkml.kernel.org/r/20181124093530.GE10969@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joey Pabalinas <joeypabalinas@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/dma-buf/udmabuf.c: convert to use vm_fault_t

Use new return type vm_fault_t for fault handler.

Link: http://lkml.kernel.org/r/20181106173628.GA12989@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/hung_task.c: break RCU locks based on jiffies

check_hung_uninterruptible_tasks() is currently calling rcu_lock_break()
for every 1024 threads. But check_hung_task() is very slow if printk()
was called, and is very fast otherwise.

If many threads within some 1024 threads called printk(), the RCU grace
period might be extended enough to trigger RCU stall warnings.
Therefore, calling rcu_lock_break() for every some fixed jiffies will be
safer.

Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/hung_task.c: force console verbose before panic

Based on commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks
before panic"), we could get the call stack of hung task.

However, if the console loglevel is not high, we still can not see the
useful panic information in practice, and in most cases users don't set
console loglevel to high level.

This patch is to force console verbose before system panic, so that the
real useful information can be seen in the console, instead of being
like the following, which doesn't have hung task information.

  INFO: task init:1 blocked for more than 120 seconds.
        Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Kernel panic - not syncing: hung_task: blocked tasks
  CPU: 2 PID: 479 Comm: khungtaskd Tainted: G     U  W         4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
  Call Trace:
   dump_stack+0x4f/0x65
   panic+0xde/0x231
   watchdog+0x290/0x410
   kthread+0x12c/0x150
   ret_from_fork+0x35/0x40
  reboot: panic mode set: p,w
  Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

build_bug.h: remove most of dummy BUILD_BUG_ON stubs for Sparse

The introduction of these dummy BUILD_BUG_ON stubs dates back to commmit
903c0c7cdc21 ("sparse: define dummy BUILD_BUG_ON definition for
sparse").

At that time, BUILD_BUG_ON() was implemented with the negative array
trick *and* the link-time trick, like this:

  extern int __build_bug_on_failed;
  #define BUILD_BUG_ON(condition)                                \
          do {                                                   \
                  ((void)sizeof(char[1 - 2*!!(condition)]));     \
                  if (condition) __build_bug_on_failed = 1;      \
          } while(0)

Sparse is more strict about the negative array trick than GCC because
Sparse requires the array length to be really constant.

Here is the simple test code for the macro above:

  static const int x = 0;
  BUILD_BUG_ON(x);

GCC is absolutely fine with it (-Wvla was enabled only very recently),
but Sparse warns like this:

  error: bad constant expression
  error: cannot size expression

(If you are using a newer version of Sparse, you will see a different
warning message, "warning: Variable length array is used".)

Anyway, Sparse was producing many false positives, and noisier than it
should be at that time.

With the previous commit, the leftover negative array trick is gone.
Sparse is fine with the current BUILD_BUG_ON(), which is implemented by
using the 'error' attribute.

I am keeping the stub for BUILD_BUG_ON_ZERO().  Otherwise, Sparse would
complain about the following code, which GCC is fine with:

  static const int x = 0;
  int y = BUILD_BUG_ON_ZERO(x);

Link: http://lkml.kernel.org/r/1542856462-18836-3-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

build_bug.h: remove negative-array fallback for BUILD_BUG_ON()

The kernel can only be compiled with an optimization option (-O2, -Os,
or the currently proposed -Og). Hence, __OPTIMIZE__ is always defined
in the kernel source.

The fallback for the -O0 case is just hypothetical and pointless.
Moreover, commit 0bb95f80a38f ("Makefile: Globally enable VLA warning")
enabled -Wvla warning. The use of variable length arrays is banned.

Link: http://lkml.kernel.org/r/1542856462-18836-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation/process/coding-style.rst: don't use "extern" with function prototypes

`extern' with function prototypes makes lines longer and creates more
characters on the screen.

Do not bug people with checkpatch.pl warnings for now as fallout can be
devastating.

Link: http://lkml.kernel.org/r/20181101134153.GA29267@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc/sysctl: fix return error for proc_doulongvec_minmax()

If the number of input parameters is less than the total parameters, an
EINVAL error will be returned.

For example, we use proc_doulongvec_minmax to pass up to two parameters
with kern_table:

{
.procname       = "monitor_signals",
.data           = &monitor_sigs,
.maxlen         = 2*sizeof(unsigned long),
.mode           = 0644,
.proc_handler   = proc_doulongvec_minmax,
},

Reproduce:

When passing two parameters, it's work normal.  But passing only one
parameter, an error "Invalid argument"(EINVAL) is returned.

  [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  1       2
  [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
  -bash: echo: write error: Invalid argument
  [root@cl150 ~]# echo $?
  1
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  3       2
  [root@cl150 ~]#

The following is the result after apply this patch.  No error is
returned when the number of input parameters is less than the total
parameters.

  [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  1       2
  [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
  [root@cl150 ~]# echo $?
  0
  [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
  3       2
  [root@cl150 ~]#

There are three processing functions dealing with digital parameters,
__do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.

This patch deals with __do_proc_doulongvec_minmax, just as
__do_proc_dointvec does, adding a check for parameters 'left'.  In
__do_proc_douintvec, its code implementation explicitly does not support
multiple inputs.

static int __do_proc_douintvec(...){
         ...
         /*
          * Arrays are not supported, keep this simple. *Do not* add
          * support for them.
          */
         if (vleft != 1) {
                 *lenp = 0;
                 return -EINVAL;
         }
         ...
}

So, just __do_proc_doulongvec_minmax has the problem.  And most use of
proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
parameter.

Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/proc/base.c: slightly faster /proc/*/limits

Header of /proc/*/limits is a fixed string, so print it directly without
formatting specifiers.

Link: http://lkml.kernel.org/r/20181203164242.GB6904@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/proc/inode.c: delete unnecessary variable in proc_alloc_inode()

Link: http://lkml.kernel.org/r/20181203164015.GA6904@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>