Linus Torvalds [Fri, 3 Apr 2020 21:44:48 +0000 (14:44 -0700)]
Merge tag 'for-5.7/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix excessive bio splitting that caused performance regressions
- Fix logic bug in DM integrity discard support's integrity tag testing
- Fix DM integrity warning on ppc64le due to missing cast
* tag 'for-5.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix logic bug in integrity tag testing
Revert "dm: always call blk_queue_split() in dm_process_bio()"
dm integrity: fix ppc64le warning
Linus Torvalds [Fri, 3 Apr 2020 21:25:02 +0000 (14:25 -0700)]
Merge tag 'pci-v5.7-changes' of git://git./linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)
- Add more 32 GT/s link speed decoding and improve the implementation
(Yicong Yang)
Resource management:
- Add support for sizing programmable host bridge apertures and fix a
related alpha Nautilus regression (Ivan Kokshaysky)
Interrupts:
- Add boot interrupt quirk mechanism for Xeon chipsets and document
boot interrupts (Sean V Kelley)
PCIe native device hotplug:
- When possible, disable in-band presence detect and use PDS
(Alexandru Gagniuc)
- Add DMI table for devices that don't use in-band presence detection
but don't advertise that correctly (Stuart Hayes)
- Fix hang when powering slots up/down via sysfs (Lukas Wunner)
- Fix an MSI interrupt race (Stuart Hayes)
Virtualization:
- Add ACS quirks for Zhaoxin devices (Raymond Pang)
Error handling:
- Add Error Disconnect Recover (EDR) support so firmware can report
devices disconnected via DPC and we can try to recover (Kuppuswamy
Sathyanarayanan)
Peer-to-peer DMA:
- Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
Maier)
ASPM:
- Reduce severity of common clock config message (Chris Packham)
- Clear the correct bits when enabling L1 substates, so we don't go
to the wrong state (Yicong Yang)
Endpoint framework:
- Replace EPF linkup ops with notifier call chain and improve locking
(Kishon Vijay Abraham I)
- Fix concurrent memory allocation in OB address region (Kishon Vijay
Abraham I)
- Move PF function number assignment to EPC core to support multiple
function creation methods (Kishon Vijay Abraham I)
- Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)
- Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
Offset (Kishon Vijay Abraham I)
- Add support for testing DMA transfers (Kishon Vijay Abraham I)
- Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)
- Add support for tests to clear IRQ (Kishon Vijay Abraham I)
- Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)
Amlogic Meson PCIe controller driver:
- Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
Pommarel)
- Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
Pommarel)
Cadence PCIe controller driver:
- Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
Abraham I)
Intel VMD host bridge driver:
- Add two VMD Device IDs that require bus restriction mode (Sushma
Kalakota)
Mobiveil PCIe controller driver:
- Refactor and modularize mobiveil driver (Hou Zhiqiang)
- Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)
Microsoft Hyper-V host bridge driver:
- Add support for Hyper-V PCI protocol version 1.3 and
PCI_BUS_RELATIONS2 (Long Li)
- Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
Feng)
- Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)
NVIDIA Tegra PCIe controller driver:
- Use pci_parse_request_of_pci_ranges() (Rob Herring)
- Add support for endpoint mode and related DT updates (Vidya Sagar)
- Reduce -EPROBE_DEFER error message log level (Thierry Reding)
Qualcomm PCIe controller driver:
- Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)
Synopsys DesignWare PCIe controller driver:
- Refactor core initialization code for endpoint mode (Vidya Sagar)
- Fix endpoint MSI-X to use correct table address (Kishon Vijay
Abraham I)
TI DRA7xx PCIe controller driver:
- Fix MSI IRQ handling (Vignesh Raghavendra)
TI Keystone PCIe controller driver:
- Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)
Miscellaneous:
- Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
Feng)
- Use ioremap(), not phys_to_virt(), for platform ROM to fix video
ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"
* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
PCI: tegra: Print -EPROBE_DEFER error message at debug level
misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
tools: PCI: Add 'e' to clear IRQ
misc: pci_endpoint_test: Add ioctl to clear IRQ
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
misc: pci_endpoint_test: Add support to get DMA option from userspace
tools: PCI: Add 'd' command line option to support DMA
misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
PCI: endpoint: functions/pci-epf-test: Print throughput information
PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
PCI: pciehp: Fix MSI interrupt race
PCI: pciehp: Fix indefinite wait on sysfs requests
PCI: endpoint: Fix clearing start entry in configfs
PCI: tegra: Add support for PCIe endpoint mode in Tegra194
PCI: sysfs: Revert "rescan" file renames
...
Linus Torvalds [Fri, 3 Apr 2020 20:22:40 +0000 (13:22 -0700)]
Merge tag 'char-misc-5.7-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char/misc/other driver patches for 5.7-rc1.
Lots of things in here, and it's later than expected due to some
reverts to resolve some reported issues. All is now clean with no
reported problems in linux-next.
Included in here is:
- interconnect updates
- mei driver updates
- uio updates
- nvmem driver updates
- soundwire updates
- binderfs updates
- coresight updates
- habanalabs updates
- mhi new bus type and core
- extcon driver updates
- some Kconfig cleanups
- other small misc driver cleanups and updates
As mentioned, all have been in linux-next for a while, and with the
last two reverts, all is calm and good"
* tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
Revert "driver core: platform: Initialize dma_parms for platform devices"
Revert "amba: Initialize dma_parms for amba devices"
amba: Initialize dma_parms for amba devices
driver core: platform: Initialize dma_parms for platform devices
bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
bus: mhi: core: Add support for reading MHI info from device
misc: rtsx: set correct pcr_ops for rts522A
speakup: misc: Use dynamic minor numbers for speakup devices
mei: me: add cedar fork device ids
coresight: do not use the BIT() macro in the UAPI header
Documentation: provide IBM contacts for embargoed hardware
nvmem: core: remove nvmem_sysfs_get_groups()
nvmem: core: use is_bin_visible for permissions
nvmem: core: use device_register and device_unregister
nvmem: core: add root_only member to nvmem device struct
extcon: axp288: Add wakeup support
extcon: Mark extcon_get_edev_name() function as exported symbol
extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
...
Linus Torvalds [Fri, 3 Apr 2020 20:12:26 +0000 (13:12 -0700)]
Merge tag 'spdx-5.7-rc1' of git://git./linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
Linus Torvalds [Fri, 3 Apr 2020 19:51:46 +0000 (12:51 -0700)]
Merge tag 'for-linus-5.7-rc1-tag' of git://git./linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a cleanup patch removing an unused function
- a small fix for the xen pciback driver
- a series for making the unwinder hyppay with the Xen PV guest idle
task stacks
* tag 'for-linus-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Make the secondary CPU idle tasks reliable
x86/xen: Make the boot CPU idle task reliable
xen-pciback: fix INTERRUPT_TYPE_* defines
xen/xenbus: remove unused xenbus_map_ring()
Linus Torvalds [Fri, 3 Apr 2020 19:27:36 +0000 (12:27 -0700)]
Merge branch 'for-5.7' of git://git./linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
"Nothing too interesting. Just two trivial patches"
* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Mark up unlocked access to wq->first_flusher
workqueue: Make workqueue_init*() return void
Linus Torvalds [Fri, 3 Apr 2020 18:30:20 +0000 (11:30 -0700)]
Merge branch 'for-5.7' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Christian extended clone3 so that processes can be spawned into
cgroups directly.
This is not only neat in terms of semantics but also avoids grabbing
the global cgroup_threadgroup_rwsem for migration.
- Daniel added !root xattr support to cgroupfs.
Userland already uses xattrs on cgroupfs for bookkeeping. This will
allow delegated cgroups to support such usages.
- Prateek tried to make cpuset hotplug handling synchronous but that
led to possible deadlock scenarios. Reverted.
- Other minor changes including release_agent_path handling cleanup.
* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: Document the cpuset_v2_mode mount option
Revert "cpuset: Make cpuset hotplug synchronous"
cgroupfs: Support user xattrs
kernfs: Add option to enable user xattrs
kernfs: Add removed_size out param for simple_xattr_set
kernfs: kvmalloc xattr value instead of kmalloc
cgroup: Restructure release_agent_path handling
selftests/cgroup: add tests for cloning into cgroups
clone3: allow spawning processes into cgroups
cgroup: add cgroup_may_write() helper
cgroup: refactor fork helpers
cgroup: add cgroup_get_from_file() helper
cgroup: unify attach permission checking
cpuset: Make cpuset hotplug synchronous
cgroup.c: Use built-in RCU list checking
kselftest/cgroup: add cgroup destruction test
cgroup: Clean up css_set task traversal
Linus Torvalds [Fri, 3 Apr 2020 18:26:32 +0000 (11:26 -0700)]
Merge tag 'kgdb-5.7-rc1' of git://git./linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Pretty quiet this cycle. Just a couple of small fixes from myself both
of which were reviewed by Doug Anderson to keep me honest (thanks)"
* tag 'kgdb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Censor attempts to set PROMPT without ENABLE_MEM_READ
kdb: Eliminate strncpy() warnings by replacing with strscpy()
Linus Torvalds [Fri, 3 Apr 2020 18:20:16 +0000 (11:20 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
- replace setup_irq() with request_irq() for ebsa110, footbridge, rpc
- fix clang assembly error in kexec code
- remove .fixup section in boot stub
- decompressor / EFI cache flushing updates
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8966/1: rpc: replace setup_irq() by request_irq()
ARM: 8965/2: footbridge: replace setup_irq() by request_irq()
ARM: 8964/1: ebsa110: replace setup_irq() by request_irq()
ARM: 8962/1: kexec: drop invalid assembly argument
ARM: decompressor: switch to by-VA cache maintenance for v7 cores
ARM: decompressor: prepare cache_clean_flush for doing by-VA maintenance
ARM: decompressor: factor out routine to obtain the inflated image size
ARM: 8959/1: Remove unused .fixup section in boot stub
ARM: allow unwinder to unwind recursive functions
Nathan Chancellor [Fri, 3 Apr 2020 01:31:35 +0000 (18:31 -0700)]
remoteproc/omap: Fix set_load call in omap_rproc_request_timer
When building arm allyesconfig:
drivers/remoteproc/omap_remoteproc.c:174:44: error: too many arguments
to function call, expected 2, have 3
timer->timer_ops->set_load(timer->odt, 0, 0);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
1 error generated.
This is due to commit
02e6d546e3bd ("clocksource/drivers/timer-ti-dm:
Enable autoreload in set_pwm") in the clockevents tree interacting with
commit
e28edc571925 ("remoteproc/omap: Request a timer(s) for remoteproc
usage") from the rpmsg tree.
This should have been fixed during the merge of the remoteproc tree
since it happened after the clockevents tree merge; however, it does not
look like my email was noticed by either maintainer and I did not pay
attention when the pull was sent since I was on CC.
Fixes: c6570114316f ("Merge tag 'rproc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc")
Link: https://lore.kernel.org/lkml/20200327185055.GA22438@ubuntu-m2-xlarge-x86/
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Fri, 3 Apr 2020 17:05:50 +0000 (13:05 -0400)]
dm integrity: fix logic bug in integrity tag testing
If all the bytes are equal to DISCARD_FILLER, we want to accept the
buffer. If any of the bytes are different, we must do thorough
tag-by-tag checking.
The condition was inverted.
Fixes: 84597a44a9d8 ("dm integrity: add optional discard support")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Waiman Long [Mon, 30 Mar 2020 14:06:15 +0000 (10:06 -0400)]
docs: cgroup-v1: Document the cpuset_v2_mode mount option
The cpuset in cgroup v1 accepts a special "cpuset_v2_mode" mount
option that make cpuset.cpus and cpuset.mems behave more like those in
cgroup v2. Document it to make other people more aware of this feature
that can be useful in some circumstances.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Mike Snitzer [Thu, 2 Apr 2020 23:36:26 +0000 (19:36 -0400)]
Revert "dm: always call blk_queue_split() in dm_process_bio()"
This reverts commit
effd58c95f277744f75d6e08819ac859dbcbd351.
blk_queue_split() is causing excessive IO splitting -- because
blk_max_size_offset() depends on 'chunk_sectors' limit being set and
if it isn't (as is the case for DM targets!) it falls back to
splitting on a 'max_sectors' boundary regardless of offset.
"Fix" this by reverting back to _not_ using blk_queue_split() in
dm_process_bio() for normal IO (reads and writes). Long-term fix is
still TBD but it should focus on training blk_max_size_offset() to
call into a DM provided hook (to call DM's max_io_len()).
Test results from simple misaligned IO test on 4-way dm-striped device
with chunksize of 128K and stripesize of 512K:
xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev
before this revert:
253,0 21 1 0.
000000000 2206 Q R 224 + 4072 [xfs_io]
253,0 21 2 0.
000008267 2206 X R 224 / 480 [xfs_io]
253,0 21 3 0.
000010530 2206 X R 224 / 256 [xfs_io]
253,0 21 4 0.
000027022 2206 X R 480 / 736 [xfs_io]
253,0 21 5 0.
000028751 2206 X R 480 / 512 [xfs_io]
253,0 21 6 0.
000033323 2206 X R 736 / 992 [xfs_io]
253,0 21 7 0.
000035130 2206 X R 736 / 768 [xfs_io]
253,0 21 8 0.
000039146 2206 X R 992 / 1248 [xfs_io]
253,0 21 9 0.
000040734 2206 X R 992 / 1024 [xfs_io]
253,0 21 10 0.
000044694 2206 X R 1248 / 1504 [xfs_io]
253,0 21 11 0.
000046422 2206 X R 1248 / 1280 [xfs_io]
253,0 21 12 0.
000050376 2206 X R 1504 / 1760 [xfs_io]
253,0 21 13 0.
000051974 2206 X R 1504 / 1536 [xfs_io]
253,0 21 14 0.
000055881 2206 X R 1760 / 2016 [xfs_io]
253,0 21 15 0.
000057462 2206 X R 1760 / 1792 [xfs_io]
253,0 21 16 0.
000060999 2206 X R 2016 / 2272 [xfs_io]
253,0 21 17 0.
000062489 2206 X R 2016 / 2048 [xfs_io]
253,0 21 18 0.
000066133 2206 X R 2272 / 2528 [xfs_io]
253,0 21 19 0.
000067507 2206 X R 2272 / 2304 [xfs_io]
253,0 21 20 0.
000071136 2206 X R 2528 / 2784 [xfs_io]
253,0 21 21 0.
000072764 2206 X R 2528 / 2560 [xfs_io]
253,0 21 22 0.
000076185 2206 X R 2784 / 3040 [xfs_io]
253,0 21 23 0.
000077486 2206 X R 2784 / 2816 [xfs_io]
253,0 21 24 0.
000080885 2206 X R 3040 / 3296 [xfs_io]
253,0 21 25 0.
000082316 2206 X R 3040 / 3072 [xfs_io]
253,0 21 26 0.
000085788 2206 X R 3296 / 3552 [xfs_io]
253,0 21 27 0.
000087096 2206 X R 3296 / 3328 [xfs_io]
253,0 21 28 0.
000093469 2206 X R 3552 / 3808 [xfs_io]
253,0 21 29 0.
000095186 2206 X R 3552 / 3584 [xfs_io]
253,0 21 30 0.
000099228 2206 X R 3808 / 4064 [xfs_io]
253,0 21 31 0.
000101062 2206 X R 3808 / 3840 [xfs_io]
253,0 21 32 0.
000104956 2206 X R 4064 / 4096 [xfs_io]
253,0 21 33 0.
001138823 0 C R 4096 + 200 [0]
after this revert:
253,0 18 1 0.
000000000 4430 Q R 224 + 3896 [xfs_io]
253,0 18 2 0.
000018359 4430 X R 224 / 256 [xfs_io]
253,0 18 3 0.
000028898 4430 X R 256 / 512 [xfs_io]
253,0 18 4 0.
000033535 4430 X R 512 / 768 [xfs_io]
253,0 18 5 0.
000065684 4430 X R 768 / 1024 [xfs_io]
253,0 18 6 0.
000091695 4430 X R 1024 / 1280 [xfs_io]
253,0 18 7 0.
000098494 4430 X R 1280 / 1536 [xfs_io]
253,0 18 8 0.
000114069 4430 X R 1536 / 1792 [xfs_io]
253,0 18 9 0.
000129483 4430 X R 1792 / 2048 [xfs_io]
253,0 18 10 0.
000136759 4430 X R 2048 / 2304 [xfs_io]
253,0 18 11 0.
000152412 4430 X R 2304 / 2560 [xfs_io]
253,0 18 12 0.
000160758 4430 X R 2560 / 2816 [xfs_io]
253,0 18 13 0.
000183385 4430 X R 2816 / 3072 [xfs_io]
253,0 18 14 0.
000190797 4430 X R 3072 / 3328 [xfs_io]
253,0 18 15 0.
000197667 4430 X R 3328 / 3584 [xfs_io]
253,0 18 16 0.
000218751 4430 X R 3584 / 3840 [xfs_io]
253,0 18 17 0.
000226005 4430 X R 3840 / 4096 [xfs_io]
253,0 18 18 0.
000250404 4430 Q R 4120 + 176 [xfs_io]
253,0 18 19 0.
000847708 0 C R 4096 + 24 [0]
253,0 18 20 0.
000855783 0 C R 4120 + 176 [0]
Fixes: effd58c95f27774 ("dm: always call blk_queue_split() in dm_process_bio()")
Cc: stable@vger.kernel.org
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Tested-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tejun Heo [Fri, 3 Apr 2020 15:32:13 +0000 (11:32 -0400)]
Revert "cpuset: Make cpuset hotplug synchronous"
This reverts commit
a49e4629b5ed ("cpuset: Make cpuset hotplug synchronous") as
it may deadlock with cpu hotplug path.
Link: http://lkml.kernel.org/r/F0388D99-84D7-453B-9B6B-EEFF0E7BE4CC@lca.pw
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Prateek Sood <prsood@codeaurora.org>
Mike Snitzer [Fri, 3 Apr 2020 01:11:24 +0000 (21:11 -0400)]
dm integrity: fix ppc64le warning
Otherwise:
In file included from drivers/md/dm-integrity.c:13:
drivers/md/dm-integrity.c: In function 'dm_integrity_status':
drivers/md/dm-integrity.c:3061:10: error: format '%llu' expects
argument of type 'long long unsigned int', but argument 4 has type
'long int' [-Werror=format=]
DMEMIT("%llu %llu",
^~~~~~~~~~~
atomic64_read(&ic->number_of_mismatches),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/device-mapper.h:550:46: note: in definition of macro 'DMEMIT'
0 : scnprintf(result + sz, maxlen - sz, x))
^
cc1: all warnings being treated as errors
Fixes: 7649194a1636ab5 ("dm integrity: remove sector type casts")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Linus Torvalds [Fri, 3 Apr 2020 00:32:52 +0000 (17:32 -0700)]
Merge tag 'devicetree-for-5.7' of git://git./linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
- Unit test for overlays with GPIO hogs
- Improve dma-ranges parsing to handle dma-ranges with multiple entries
- Update dtc to upstream version
v1.6.0-2-g87a656ae5ff9
- Improve overlay error reporting
- Device link support for power-domains and hwlocks bindings
- Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
ITE Tech
- Add deprecated Marvell vendor prefix 'mrvl'
- A bunch of binding conversions to DT schema continues. Of note, the
common serial and USB connector bindings are converted.
- Add more Arm CPU compatibles
- Drop Mark Rutland as DT maintainer :(
* tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (106 commits)
MAINTAINERS: drop an old reference to stm32 pwm timers doc
MAINTAINERS: dt: update etnaviv file reference
dt-bindings: usb: dwc2: fix bindings for amlogic, meson-gxbb-usb
dt-bindings: uniphier-system-bus: fix warning in the example
dt-bindings: display: meson-vpu: fix indentation of reg-names' "items"
dt-bindings: iio: Fix adi, ltc2983 uint64-matrix schema constraints
dt-bindings: power: Fix example for power-domain
dt-bindings: arm: Add some constraints for PSCI nodes
of: some unittest overlays not untracked
of: gpio unittest kfree() wrong object
dt-bindings: phy: convert phy-rockchip-inno-usb2 bindings to yaml
dt-bindings: serial: sh-sci: Convert to json-schema
dt-bindings: serial: Document serialN aliases
dt-bindings: thermal: tsens: Set 'additionalProperties: false'
dt-bindings: thermal: tsens: Fix nvmem-cell-names schema
dt-bindings: vendor-prefixes: Add Beacon vendor prefix
dt-bindings: vendor-prefixes: Add Topwise
of: of_private.h: Replace zero-length array with flexible-array member
docs: dt: fix a broken reference to input.yaml
docs: dt: fix references to ap806-system-controller.txt
...
Linus Torvalds [Fri, 3 Apr 2020 00:03:53 +0000 (17:03 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series has a huge amount of churn because it pulls in Mauro's doc
update changing all our txt files to rst ones.
Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
some other minor updates.
The major core change is Hannes moving functions out of the aacraid
driver and into the core"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
scsi: ufs: Do not rely on prefetched data
scsi: dc395x: remove dc395x_bios_param
scsi: libiscsi: Fix error count for active session
scsi: hpsa: correct race condition in offload enabled
scsi: message: fusion: Replace zero-length array with flexible-array member
scsi: qedi: Add PCI shutdown handler support
scsi: qedi: Add MFW error recovery process
scsi: ufs: Enable block layer runtime PM for well-known logical units
scsi: ufs-qcom: Override devfreq parameters
scsi: ufshcd: Let vendor override devfreq parameters
scsi: ufshcd: Update the set frequency to devfreq
scsi: ufs: Resume ufs host before accessing ufs device
scsi: ufs-mediatek: customize the delay for enabling host
scsi: ufs: make HCE polling more compact to improve initialization latency
scsi: ufs: allow custom delay prior to host enabling
scsi: ufs-mediatek: use common delay function
scsi: ufs: introduce common and flexible delay function
scsi: ufs: use an enum for host capabilities
scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
...
Linus Torvalds [Thu, 2 Apr 2020 23:45:46 +0000 (16:45 -0700)]
Merge tag 'mtd/for-5.7' of git://git./linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"MTD core changes:
- Fix issue where write_cached_data() fails but write() still returns
success
- maps: sa1100-flash: Replace zero-length array with flexible-array
member
- phram: Fix a double free issue in error path
- Convert fallthrough comments into statements
- MAINTAINERS: Add the IRC channel to the MTD related subsystems
Raw NAND core changes:
- Add support for manufacturer specific suspend/resume operation
- Add support for manufacturer specific lock/unlock operation
- Replace zero-length array with flexible-array member
- Fix a typo ("manufecturer")
- Ensure nand_soft_waitrdy wait period is enough
Raw NAND controller driver changes:
- Brcmnand:
* Add support for flash-edu for dma transfers (+ bindings)
- Cadence:
* Reinit completion before executing a new command
* Change bad block marker size
* Fix the calculation of the avaialble OOB size
* Get meta data size from registers
- Qualcom:
* Use dma_request_chan() instead dma_request_slave_channel()
* Release resources on failure within qcom_nandc_alloc()
- Allwinner:
* Use dma_request_chan() instead dma_request_slave_channel()
- Marvell:
* Use dma_request_chan() instead dma_request_slave_channel()
* Release DMA channel on error
- Freescale:
* Use dma_request_chan() instead dma_request_slave_channel()
- Macronix:
* Add support for Macronix NAND randomizer (+ bindings)
- Ams-delta:
* Rename structures and functions to gpio_nand*
* Make the driver custom I/O ready
* Drop useless local variable
* Support custom driver initialisation
* Add module device tables
* Handle more GPIO pins as optional
* Make read pulses optional
* Don't hardcode read/write pulse widths
* Push inversion handling to gpiolib
* Enable OF partition info support
* Drop board specific partition info
* Use struct gpio_nand_platdata
* Write protect device during probe
- Ingenic:
* Use devm_platform_ioremap_resource()
* Add dependency on MIPS || COMPILE_TEST
- Denali:
* Deassert write protect pin
- ST:
* Use dma_request_chan() instead dma_request_slave_channel()
Raw NAND chip driver changes:
- Toshiba:
* Support reading the number of bitflips for BENAND (Built-in ECC NAND)
- Macronix:
* Add support for deep power down mode
* Add support for block protection
SPI-NAND core changes:
- Do not erase the block before writing a bad block marker
- Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
- Stop using spinand->oobbuf for buffering bad block markers
- Rework detect procedure for different READ_ID operation
SPI-NAND driver changes:
- Toshiba:
* Support for new Kioxia Serial NAND
* Rename function name to change suffix and prefix (8Gbit)
* Add comment about Kioxia ID
- Micron:
* Add new Micron SPI NAND devices with multiple dies
* Add M70A series Micron SPI NAND devices
* identify SPI NAND device with Continuous Read mode
* Add new Micron SPI NAND devices
* Describe the SPI NAND device MT29F2G01ABAGD
* Generalize the OOB layout structure and function names
SPI NOR core changes:
- Move all the manufacturer specific quirks/code out of the core, to
make the core logic more readable and thus ease maintenance.
- Move the SFDP logic out of the core, it provides a better
separation between the SFDP parsing and core logic.
- Trim what is exposed in spi-nor.h. The SPI NOR controllers drivers
must not be able to use structures that are meant just for the SPI
NOR core.
- Use the spi-mem direct mapping API to let advanced controllers
optimize the read/write operations when they support direct
mapping.
- Add generic formula for the Status Register block protection
handling. It fixes some long standing locking limitations and eases
the addition of the 4bit block protection support.
- Add block protection support for flashes with 4 block protection
bits in the Status Register.
SPI NOR controller drivers changes:
- The mtk-quadspi driver is replaced by the new spi-mem spi-mtk-nor
driver.
- Merge tag 'mtk-mtd-spi-move' into spi-nor/next to avoid conflicts.
HyperBus changes:
- Print error msg when compatible is wrong or missing
- Move mapping of direct access window from core to individual
drivers"
* tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (103 commits)
mtd: Convert fallthrough comments into statements
mtd: rawnand: toshiba: Support reading the number of bitflips for BENAND (Built-in ECC NAND)
MAINTAINERS: Add the IRC channel to the MTD related subsystems
mtd: Fix issue where write_cached_data() fails but write() still returns success
mtd: maps: sa1100-flash: Replace zero-length array with flexible-array member
mtd: phram: fix a double free issue in error path
mtd: spinand: toshiba: Support for new Kioxia Serial NAND
mtd: spinand: toshiba: Rename function name to change suffix and prefix (8Gbit)
mtd: rawnand: macronix: Add support for deep power down mode
mtd: rawnand: Add support for manufacturer specific suspend/resume operation
mtd: spi-nor: Enable locking for n25q512ax3/n25q512a
mtd: spi-nor: Add SR 4bit block protection support
mtd: spi-nor: Add generic formula for SR block protection handling
mtd: spi-nor: Set all BP bits to one when lock_len == mtd->size
mtd: spi-nor: controllers: aspeed-smc: Replace zero-length array with flexible-array member
mtd: spi-nor: Clear WEL bit when erase or program errors occur
MAINTAINERS: update entry after SPI NOR controller move
mtd: spi-nor: Trim what is exposed in spi-nor.h
mtd: spi-nor: Drop the MFR definitions
mtd: spi-nor: Get rid of the now empty spi_nor_ids[] table
...
Linus Torvalds [Thu, 2 Apr 2020 23:04:42 +0000 (16:04 -0700)]
Merge tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"Core:
- Some code cleanup and optimization in core by Andy
- Debugfs support for displaying dmaengine channels by Peter
Drivers:
- New driver for uniphier-xdmac controller
- Updates to stm32 dma, mdma and dmamux drivers and PM support
- More updates to idxd drivers
- Bunch of changes in tegra-apb driver and cleaning up of pm
functions
- Bunch of spelling fixes and Replace zero-length array patches
- Shutdown hook for fsl-dpaa2-qdma driver
- Support for interleaved transfers for ti-edma and virtualization
support for k3-dma driver
- Support for reset and updates in xilinx_dma driver
- Improvements and locking updates in at_hdma driver"
* tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (89 commits)
dt-bindings: dma: renesas,usb-dmac: add r8a77961 support
dmaengine: uniphier-xdmac: Remove redandant error log for platform_get_irq
dmaengine: tegra-apb: Improve DMA synchronization
dmaengine: tegra-apb: Don't save/restore IRQ flags in interrupt handler
dmaengine: tegra-apb: mark PM functions as __maybe_unused
dmaengine: fix spelling mistake "exceds" -> "exceeds"
dmaengine: sprd: Set request pending flag when DMA controller is active
dmaengine: ppc4xx: Use scnprintf() for avoiding potential buffer overflow
dmaengine: idxd: remove global token limit check
dmaengine: idxd: reflect shadow copy of traffic class programming
dmaengine: idxd: Merge definition of dsa_batch_desc into dsa_hw_desc
dmaengine: Create debug directories for DMA devices
dmaengine: ti: k3-udma: Implement custom dbg_summary_show for debugfs
dmaengine: Add basic debugfs support
dmaengine: fsl-dpaa2-qdma: remove set but not used variable 'dpaa2_qdma'
dmaengine: ti: edma: fix null dereference because of a typo in pointer name
dmaengine: fsl-dpaa2-qdma: Adding shutdown hook
dmaengine: uniphier-xdmac: Add UniPhier external DMA controller driver
dt-bindings: dmaengine: Add UniPhier external DMA controller bindings
dmaengine: ti: k3-udma: Implement support for atype (for virtualization)
...
Linus Torvalds [Thu, 2 Apr 2020 22:54:13 +0000 (15:54 -0700)]
Merge branch 'i2c/for-5.7' of git://git./linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"I2C has:
- using defines for bus speeds to avoid mistakes in hardcoded values;
lots of small driver updates because of that. Thanks, Andy!
- API change: i2c_setup_smbus_alert() was renamed to
i2c_new_smbus_alert_device() and returns ERRPTR now. All in-tree
users have been converted
- in the core, a rare race condition when deleting the cdev has been
fixed. Thanks, Kevin!
- lots of driver updates. Thanks, everyone!
I also want to mention: The amount of review and testing tags given
was quite high this time. Thank you to these people, too. I hope we
can keep it like this!"
* 'i2c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (34 commits)
i2c: rcar: clean up after refactoring i2c_timings
macintosh: convert to i2c_new_scanned_device
i2c: drivers: Use generic definitions for bus frequencies
i2c: algo: Use generic definitions for bus frequencies
i2c: stm32f7: switch to I²C generic property parsing
i2c: rcar: Consolidate timings calls in rcar_i2c_clock_calculate()
i2c: core: Allow override timing properties with 0
i2c: core: Provide generic definitions for bus frequencies
i2c: mxs: Use dma_request_chan() instead dma_request_slave_channel()
i2c: imx: remove duplicate print after platform_get_irq()
i2c: designware: Fix spelling typos in the comments
i2c: designware: Discard i2c_dw_read_comp_param() function
i2c: designware: Detect the FIFO size in the common code
i2c: dev: Fix the race between the release of i2c_dev and cdev
i2c: qcom-geni: Drop of_platform.h include
i2c: qcom-geni: Grow a dev pointer to simplify code
i2c: qcom-geni: Let firmware specify irq trigger flags
i2c: stm32f7: do not backup read-only PECR register
i2c: smbus: remove outdated references to irq level triggers
i2c: convert SMBus alert setup function to return an ERRPTR
...
Linus Torvalds [Thu, 2 Apr 2020 22:50:04 +0000 (15:50 -0700)]
Merge tag 'sound-5.7-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This became again a busy development cycle. There are few ALSA core
updates (merely API cleanups and sparse fixes), with the majority of
other changes are found in ASoC scene.
Here are some highlights:
ALSA core:
- More helper macros for sparse warning fixes (e.g. bitwise types)
- Slight optimization of PCM OSS locks
- Make common handling for PCM / compress buffers (for SOF)
ASoC:
- Lots of code refactoring and modernization for (still ongoing)
componentization works
- Conversion of SND_SOC_ALL_CODECS to use imply
- Continued refactoring and fixing of the Intel SOF/SST support,
including the initial (but still incomplete) SoundWire support
- SoundWire and more advanced clocking support for Realtek RT5682
- Support for amlogic GX, Meson 8, Meson 8B and T9015 DAC, Broadcom
DSL/PON, Ingenic JZ4760 and JZ4770, Realtek RL6231, and TI TAS2563
and TLV320ADCX140
HD-audio:
- Optimizations in HDMI jack handling
- A few new quirks and fixups for Realtek codecs
USB-audio:
- Delayed registration support
- New quirks for Motu, Kingston, Presonus"
* tag 'sound-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (415 commits)
ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor
Revert "ALSA: uapi: Drop asound.h inclusion from asoc.h"
ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
ALSA: hda/realtek - a fake key event is triggered by running shutup
ALSA: hda: default enable CA0132 DSP support
ASoC: amd: acp3x-pcm-dma: clean up two indentation issues
ASoC: tlv320adcx140: Remove undocumented property
ASoC: Intel: sof_sdw: Add Volteer support with RT5682 SNDW helper function
ASoC: Intel: common: add match table for TGL RT5682 SoundWire driver
ASoC: Intel: boards: add sof_sdw machine driver
ASoC: Intel: soc-acpi: update topology and driver name for SoundWire platforms
ASoC: rt5682: move DAI clock registry to I2S mode
ASoC: pxa: magician: convert to use i2c_new_client_device()
ASoC: SOF: Intel: hda-ctrl: add reset cycle before parsing capabilities
Asoc: SOF: Intel: hda: check SoundWire wakeen interrupt in irq thread
ASoC: SOF: Intel: hda: add WAKEEN interrupt support for SoundWire
ASoC: SOF: Intel: hda: add parameter to control SoundWire clock stop quirks
ASoC: SOF: Intel: hda: merge IPC, stream and SoundWire interrupt handlers
...
Linus Torvalds [Thu, 2 Apr 2020 22:47:18 +0000 (15:47 -0700)]
Merge tag 'pinctrl-v5.7-1' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"This is the bulk of pin control changes for the v5.7 kernel cycle.
There are no core changes this time, only driver developments:
- New driver for the Dialog Semiconductor DA9062 Power Management
Integrated Circuit (PMIC).
- Renesas SH-PFC has improved consistency, with group and register
checks in the configuration checker.
- New subdriver for the Qualcomm IPQ6018.
- Add the RGMII pin control functionality to Qualcomm IPQ8064.
- Performance and code quality cleanups in the Mediatek driver.
- Improve the Broadcom BCM2835 support to cover all the GPIOs that
exist in it.
- The Allwinner/Sunxi driver properly masks non-wakeup IRQs on
suspend.
- Add some missing groups and functions to the Ingenic driver.
- Convert some of the Freescale device tree bindings to use the new
and all improved JSON YAML markup.
- Refactorings and support for the SFIO/GPIO in the Tegra194 SoC
driver.
- Support high impedance mode in the Spreadtrum/Unisoc driver"
* tag 'pinctrl-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (64 commits)
pinctrl: qcom: fix compilation error
pinctrl: qcom: use scm_call to route GPIO irq to Apps
pinctrl: sprd: Add pin high impedance mode support
pinctrl: sprd: Use the correct pin output configuration
pinctrl: tegra: Add SFIO/GPIO programming on Tegra194
pinctrl: tegra: Renumber the GG.0 and GG.1 pins
pinctrl: tegra: Do not add default pin range on Tegra194
pinctrl: tegra: Pass struct tegra_pmx for pin range check
pinctrl: tegra: Fix "Scmitt" -> "Schmitt" typo
pinctrl: tegra: Fix whitespace issues for improved readability
pinctrl: mediatek: Use scnprintf() for avoiding potential buffer overflow
pinctrl: freescale: drop the dependency on ARM64 for i.MX8M
Revert "pinctrl: mvebu: armada-37xx: use use platform api"
dt-bindings: pinctrl: at91: Fix a typo ("descibe")
pinctrl: meson: add tsin pinctrl for meson gxbb/gxl/gxm
pinctrl: sprd: Fix the kconfig warning
pinctrl: ingenic: add hdmi-ddc pin control group
pinctrl: sirf/atlas7: Replace zero-length array with flexible-array member
pinctrl: sprd: Allow the SPRD pinctrl driver building into a module
pinctrl: Export some needed symbols at module load time
...
Linus Torvalds [Thu, 2 Apr 2020 22:45:45 +0000 (15:45 -0700)]
Merge tag 'hwlock-v5.7' of git://git./linux/kernel/git/andersson/remoteproc
Pull hwspinlock updates from Bjorn Andersson:
"This marks all hwspinlock driver COMPILE_TESTable and replaces the
zero-length array in hwspinlock_device with a flexible-array member"
* tag 'hwlock-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
hwspinlock: hwspinlock_internal.h: Replace zero-length array with flexible-array member
hwspinlock: Allow drivers to be built with COMPILE_TEST
Linus Torvalds [Thu, 2 Apr 2020 22:42:13 +0000 (15:42 -0700)]
Merge tag 'rproc-v5.7' of git://git./linux/kernel/git/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
- a range of improvements to the OMAP remoeteproc driver; among other
things adding devicetree, suspend/resume and watchdog support, and
adds support the remoteprocs in the DRA7xx SoC
- support for 64-bit firmware, extends the ELF loader to support this
and fixes for a number of race conditions in the recovery handling
- a generic mechanism to allow remoteproc drivers to sync state with
remote processors during a panic, and uses this to prepare Qualcomm
remote processors for post mortem analysis
- fixes to cleanly recover from crashes in the modem firmware on
production Qualcomm devices
* tag 'rproc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: (37 commits)
remoteproc/omap: Switch to SPDX license identifiers
remoteproc/omap: Add watchdog functionality for remote processors
remoteproc/omap: Report device exceptions and trigger recovery
remoteproc/omap: Add support for runtime auto-suspend/resume
remoteproc/omap: Add support for system suspend/resume
remoteproc/omap: Request a timer(s) for remoteproc usage
remoteproc/omap: Check for undefined mailbox messages
remoteproc/omap: Remove the platform_data header
remoteproc/omap: Add support for DRA7xx remote processors
remoteproc/omap: Initialize and assign reserved memory node
remoteproc/omap: Add the rproc ops .da_to_va() implementation
remoteproc/omap: Add support to parse internal memories from DT
remoteproc/omap: Add a sanity check for DSP boot address alignment
remoteproc/omap: Add device tree support
dt-bindings: remoteproc: Add OMAP remoteproc bindings
remoteproc: qcom: Introduce panic handler for PAS and ADSP
remoteproc: qcom: q6v5: Add common panic handler
remoteproc: Introduce "panic" callback in ops
remoteproc: Traverse rproc_list under RCU read lock
remoteproc: Fix NULL pointer dereference in rproc_virtio_notify
...
Linus Torvalds [Thu, 2 Apr 2020 22:21:48 +0000 (15:21 -0700)]
Merge branch 'for-5.7' of git://git./linux/kernel/git/dennis/percpu
Pull percpu updates from Dennis Zhou:
"This is just a few documentation fixes for percpu refcount and bitmap
helpers that went in v5.6, and moving my emails to all be at korg"
* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: update copyright emails to dennis@kernel.org
include/bitmap.h: add new functions to documentation
include/bitmap.h: add missing parameter in docs
percpu_ref: Fix comment regarding percpu_ref_init flags
Linus Torvalds [Thu, 2 Apr 2020 22:13:15 +0000 (15:13 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- GICv4.1 support
- 32bit host removal
PPC:
- secure (encrypted) using under the Protected Execution Framework
ultravisor
s390:
- allow disabling GISA (hardware interrupt injection) and protected
VMs/ultravisor support.
x86:
- New dirty bitmap flag that sets all bits in the bitmap when dirty
page logging is enabled; this is faster because it doesn't require
bulk modification of the page tables.
- Initial work on making nested SVM event injection more similar to
VMX, and less buggy.
- Various cleanups to MMU code (though the big ones and related
optimizations were delayed to 5.8). Instead of using cr3 in
function names which occasionally means eptp, KVM too has
standardized on "pgd".
- A large refactoring of CPUID features, which now use an array that
parallels the core x86_features.
- Some removal of pointer chasing from kvm_x86_ops, which will also
be switched to static calls as soon as they are available.
- New Tigerlake CPUID features.
- More bugfixes, optimizations and cleanups.
Generic:
- selftests: cleanups, new MMU notifier stress test, steal-time test
- CSV output for kvm_stat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
x86/kvm: fix a missing-prototypes "vmread_error"
KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
KVM: VMX: Add a trampoline to fix VMREAD error handling
KVM: SVM: Annotate svm_x86_ops as __initdata
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: Pass kvm_init()'s opaque param to additional arch funcs
s390/gmap: return proper error code on ksm unsharing
KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
KVM: Fix out of range accesses to memslots
KVM: X86: Micro-optimize IPI fastpath delay
KVM: X86: Delay read msr data iff writes ICR MSR
KVM: PPC: Book3S HV: Add a capability for enabling secure guests
KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
...
Linus Torvalds [Thu, 2 Apr 2020 21:52:12 +0000 (14:52 -0700)]
Merge tag 'x86-urgent-2020-04-02' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"A single fix addressing Sparse warnings. <asm/bitops.h> is changed
non-trivially to avoid the warnings, but generated code is not
supposed to be affected"
* tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix bitops.h warning with a moved cast
Linus Torvalds [Thu, 2 Apr 2020 21:49:46 +0000 (14:49 -0700)]
Merge branch 'next-integrity' of git://git./linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Just a couple of updates for linux-5.7:
- A new Kconfig option to enable IMA architecture specific runtime
policy rules needed for secure and/or trusted boot, as requested.
- Some message cleanup (eg. pr_fmt, additional error messages)"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add a new CONFIG for loading arch-specific policies
integrity: Remove duplicate pr_fmt definitions
IMA: Add log statements for failure conditions
IMA: Update KBUILD_MODNAME for IMA files to ima
Linus Torvalds [Thu, 2 Apr 2020 20:55:34 +0000 (13:55 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
"A large amount of MM, plenty more to come.
Subsystems affected by this patch series:
- tools
- kthread
- kbuild
- scripts
- ocfs2
- vfs
- mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
hugetlbfs, hugetlb"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
selftests/vm: fix map_hugetlb length used for testing read and write
mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
mm/hugetlb.c: clean code by removing unnecessary initialization
hugetlb_cgroup: add hugetlb_cgroup reservation docs
hugetlb_cgroup: add hugetlb_cgroup reservation tests
hugetlb: support file_region coalescing again
hugetlb_cgroup: support noreserve mappings
hugetlb_cgroup: add accounting for shared mappings
hugetlb: disable region_add file_region coalescing
hugetlb_cgroup: add reservation accounting for private mappings
mm/hugetlb_cgroup: fix hugetlb_cgroup migration
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
hugetlb_cgroup: add hugetlb_cgroup reservation counter
hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
mm/memblock.c: remove redundant assignment to variable max_addr
mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
...
Linus Torvalds [Thu, 2 Apr 2020 20:02:07 +0000 (13:02 -0700)]
Merge tag 'xfs-5.7-merge-8' of git://git./fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"There's a lot going on this cycle with cleanups in the log code, the
btree code, and the xattr code.
We're tightening of metadata validation and online fsck checking, and
introducing a common btree rebuilding library so that we can refactor
xfs_repair and introduce online repair in a future cycle.
We also fixed a few visible bugs -- most notably there's one in
getdents that we introduced in 5.6; and a fix for hangs when disabling
quotas.
This series has been running fstests & other QA in the background for
over a week and looks good so far.
I anticipate sending a second pull request next week. That batch will
change how xfs interacts with memory reclaim; how the log batches and
throttles log items; how hard writes near ENOSPC will try to squeeze
more space out of the filesystem; and hopefully fix the last of the
umount hangs after a catastrophic failure. That should ease a lot of
problems when running at the limits, but for now I'm leaving that in
for-next for another week to make sure we got all the subtleties
right.
Summary:
- Fix a hard to trigger race between iclog error checking and log
shutdown.
- Strengthen the AGF verifier.
- Ratelimit some of the more spammy error messages.
- Remove the icdinode uid/gid members and just use the ones in the
vfs inode.
- Hold ILOCK across insert/collapse range.
- Clean up the extended attribute interfaces.
- Clean up the attr flags mess.
- Restore PF_MEMALLOC after exiting xfsaild thread to avoid
triggering warnings in the process accounting code.
- Remove the flexibly-sized array from struct xfs_agfl to eliminate
compiler warnings about unaligned pointers and packed structures.
- Various macro and typedef removals.
- Stale metadata buffers if we decide they're corrupt outside of a
verifier.
- Check directory data/block/free block owners.
- Fix a UAF when aborting inactivation of a corrupt xattr fork.
- Teach online scrub to report failed directory and attr name lookups
as a metadata corruption instead of a runtime error.
- Avoid potential buffer overflows in sysfs files by using scnprintf.
- Fix a regression in getdents lookups due to a mistake in pointer
arithmetic.
- Refactor btree cursor private data structures to use anonymous
unions.
- Cleanups in the log unmounting code.
- Fix a potential mishandling of ENOMEM errors on multi-block
directory buffer lookups.
- Fix an incorrect test in the block allocation code.
- Cleanups and name prefix shortening in the scrub code.
- Introduce btree bulk loading code for online repair and scrub.
- Fix a quotaoff log item leak (and hang) when the fs goes down
midway through a quotaoff operation.
- Remove di_version from the incore inode.
- Refactor some of the log shutdown checking code.
- Record the forcing of the log unmount records in the log force
counters.
- Fix a longstanding bug where quotacheck would purge the
administrator's default quota grace interval and warning limits.
- Reduce memory usage when scrubbing directory and xattr trees.
- Don't let fsfreeze race with GETFSMAP or online scrub.
- Handle bio_add_page failures more gracefully in xlog_write_iclog"
* tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (108 commits)
xfs: prohibit fs freezing when using empty transactions
xfs: shutdown on failure to add page to log bio
xfs: directory bestfree check should release buffers
xfs: drop all altpath buffers at the end of the sibling check
xfs: preserve default grace interval during quotacheck
xfs: remove xlog_state_want_sync
xfs: move the ioerror check out of xlog_state_clean_iclog
xfs: refactor xlog_state_clean_iclog
xfs: remove the aborted parameter to xlog_state_done_syncing
xfs: simplify log shutdown checking in xfs_log_release_iclog
xfs: simplify the xfs_log_release_iclog calling convention
xfs: factor out a xlog_wait_on_iclog helper
xfs: merge xlog_cil_push into xlog_cil_push_work
xfs: remove the di_version field from struct icdinode
xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize
xfs: simplify di_flags2 inheritance in xfs_ialloc
xfs: only check the superblock version for dinode size calculation
xfs: add a new xfs_sb_version_has_v3inode helper
xfs: fix unmount hang and memory leak on shutdown during quotaoff
xfs: factor out quotaoff intent AIL removal and memory free
...
Linus Torvalds [Thu, 2 Apr 2020 19:59:36 +0000 (12:59 -0700)]
Merge tag 'vfs-5.7-merge-1' of git://git./fs/xfs/xfs-linux
Pull hibernation fix from Darrick Wong:
"Fix a regression where we broke the userspace hibernation driver by
disallowing writes to the swap device"
* tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
hibernate: Allow uswsusp to write to swap
Linus Torvalds [Thu, 2 Apr 2020 19:57:18 +0000 (12:57 -0700)]
Merge tag 'iomap-5.7-merge-2' of git://git./fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"We're fixing tracepoints and comments in this cycle, so there
shouldn't be any surprises here.
I anticipate sending a second pull request next week with a single bug
fix for readahead, but it's still undergoing QA.
Summary:
- Fix a broken tracepoint
- Fix a broken comment"
* tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fix comments in iomap_dio_rw
iomap: Remove pgoff from tracepoints
Linus Torvalds [Thu, 2 Apr 2020 19:30:08 +0000 (12:30 -0700)]
Merge branch 'work.dotdot1' of git://git./linux/kernel/git/viro/vfs
Pull vfs pathwalk sanitizing from Al Viro:
"Massive pathwalk rewrite and cleanups.
Several iterations have been posted; hopefully this thing is getting
readable and understandable now. Pretty much all parts of pathname
resolutions are affected...
The branch is identical to what has sat in -next, except for commit
message in "lift all calls of step_into() out of follow_dotdot/
follow_dotdot_rcu", crediting Qian Cai for reporting the bug; only
commit message changed there."
* 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (69 commits)
lookup_open(): don't bother with fallbacks to lookup+create
atomic_open(): no need to pass struct open_flags anymore
open_last_lookups(): move complete_walk() into do_open()
open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()
open_last_lookups(): don't abuse complete_walk() when all we want is unlazy
open_last_lookups(): consolidate fsnotify_create() calls
take post-lookup part of do_last() out of loop
link_path_walk(): sample parent's i_uid and i_mode for the last component
__nd_alloc_stack(): make it return bool
reserve_stack(): switch to __nd_alloc_stack()
pick_link(): take reserving space on stack into a new helper
pick_link(): more straightforward handling of allocation failures
fold path_to_nameidata() into its only remaining caller
pick_link(): pass it struct path already with normal refcounting rules
fs/namei.c: kill follow_mount()
non-RCU analogue of the previous commit
helper for mount rootwards traversal
follow_dotdot(): be lazy about changing nd->path
follow_dotdot_rcu(): be lazy about changing nd->path
follow_dotdot{,_rcu}(): massage loops
...
Bjorn Helgaas [Thu, 2 Apr 2020 19:27:09 +0000 (14:27 -0500)]
Merge branch 'remotes/lorenzo/pci/vmd'
- Add new VMD device IDs (Sushma Kalakota)
* remotes/lorenzo/pci/vmd:
PCI: vmd: Add two VMD Device IDs
Bjorn Helgaas [Thu, 2 Apr 2020 19:27:05 +0000 (14:27 -0500)]
Merge branch 'remotes/lorenzo/pci/tegra'
- Convert tegra to use shared DT "ranges" parsing (Rob Herring)
* remotes/lorenzo/pci/tegra:
PCI: tegra: Use pci_parse_request_of_pci_ranges()
Bjorn Helgaas [Thu, 2 Apr 2020 19:27:02 +0000 (14:27 -0500)]
Merge branch 'remotes/lorenzo/pci/qcom'
- Apply Qualcomm class fixup only to PCIe host bridges, not to all
PCI_VENDOR_ID_QCOM devices (Bjorn Andersson)
* remotes/lorenzo/pci/qcom:
PCI: qcom: Fix the fixup of PCI_VENDOR_ID_QCOM
Bjorn Helgaas [Thu, 2 Apr 2020 19:27:01 +0000 (14:27 -0500)]
Merge branch 'remotes/lorenzo/pci/mobiveil'
- Restructure mobiveil driver to support either Root Complex mode or
Endpoint mode (Hou Zhiqiang)
- Collect host initialization into one place (Hou Zhiqiang)
- Collect interrupt-related code into one place (Hou Zhiqiang)
- Split mobiveil into separate files under
drivers/pci/controller/mobiveil for easier reuse (Hou Zhiqiang)
- Add callbacks for interrupt initialization and linkup checking (Hou
Zhiqiang)
- Add 8- and 16-bit CSR accessors (Hou Zhiqiang)
- Initialize host driver only if Header Type is "bridge" (Hou Zhiqiang)
- Add DT bindings for NXP Layerscape SoCs PCIe Gen4 controller (Hou
Zhiqiang)
- Add PCIe Gen4 RC driver for Layerscape SoCs (Hou Zhiqiang)
- Add pcie-mobiveil __iomem annotations (Hou Zhiqiang)
- Add PCI_MSI_IRQ_DOMAIN Kconfig dependency (Hou Zhiqiang)
* remotes/lorenzo/pci/mobiveil:
PCI: mobiveil: Fix unmet dependency warning for PCIE_MOBIVEIL_PLAT
PCI: mobiveil: Fix sparse different address space warnings
PCI: mobiveil: Add PCIe Gen4 RC driver for Layerscape SoCs
dt-bindings: PCI: Add NXP Layerscape SoCs PCIe Gen4 controller
PCI: mobiveil: Add Header Type field check
PCI: mobiveil: Add 8-bit and 16-bit CSR register accessors
PCI: mobiveil: Allow mobiveil_host_init() to be used to re-init host
PCI: mobiveil: Add callback function for link up check
PCI: mobiveil: Add callback function for interrupt initialization
PCI: mobiveil: Modularize the Mobiveil PCIe Host Bridge IP driver
PCI: mobiveil: Collect the interrupt related operations into a function
PCI: mobiveil: Move the host initialization into a function
PCI: mobiveil: Introduce a new structure mobiveil_root_port
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:59 +0000 (14:26 -0500)]
Merge branch 'remotes/lorenzo/pci/hv'
- Fix memory leak in hv probe path (Dexuan Cui)
- Add support for Hyper-V protocol 1.3 (Long Li)
- Replace zero-length array with flexible-array member (Gustavo A. R.
Silva)
- Move hypercall definitions to <asm/hyperv-tlfs.h> (Boqun Feng)
- Move retarget definitions to <asm/hyperv-tlfs.h> and make them packed
(Boqun Feng)
- Add struct hv_msi_entry and hv_set_msi_entry_from_desc() to prepare for
future virtual PCI on non-x86 (Boqun Feng)
* remotes/lorenzo/pci/hv:
PCI: hv: Introduce hv_msi_entry
PCI: hv: Move retarget related structures into tlfs header
PCI: hv: Move hypercall related definitions into tlfs header
PCI: hv: Replace zero-length array with flexible-array member
PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2
PCI: hv: Decouple the func definition in hv_dr_state from VSP message
PCI: hv: Add missing kfree(hbus) in hv_pci_probe()'s error handling path
PCI: hv: Remove unnecessary type casting from kzalloc
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:57 +0000 (14:26 -0500)]
Merge branch 'remotes/lorenzo/pci/endpoint'
- Use notification chain instead of EPF linkup ops for EPC events (Kishon
Vijay Abraham I)
- Protect concurrent allocation in endpoint outbound address region
(Kishon Vijay Abraham I)
- Protect concurrent access to pci_epf_ops (Kishon Vijay Abraham I)
- Assign function number for each PF in endpoint core (Kishon Vijay
Abraham I)
- Refactor endpoint mode core initialization (Vidya Sagar)
- Add API to notify when core initialization completes (Vidya Sagar)
- Add test framework support to defer core initialization (Vidya Sagar)
- Update Tegra SoC ABI header to support uninitialization of UPHY PLL
when in endpoint mode without reference clock (Vidya Sagar)
- Add DT and driver support for Tegra194 PCIe endpoint nodes (Vidya
Sagar)
- Add endpoint test support for DMA data transfer (Kishon Vijay
Abraham I)
- Print throughput information in endpoint test (Kishon Vijay Abraham I)
- Use streaming DMA APIs for endpoint test buffer allocation (Kishon
Vijay Abraham I)
- Add endpoint test command line option for DMA (Kishon Vijay Abraham I)
- When stopping a controller via configfs, clear endpoint "start" entry
to prevent WARN_ON (Kunihiko Hayashi)
- Update endpoint ->set_msix() to pay attention to MSI-X BAR Indicator
and offset when finding MSI-X tables (Kishon Vijay Abraham I)
- MSI-X tables are in local memory, not in the PCI address space. Update
pcie-designware-ep to account for this (Kishon Vijay Abraham I)
- Allow AM654 PCIe Endpoint to raise MSI-X interrupts (Kishon Vijay
Abraham I)
- Avoid using module parameter to determine irqtype for endpoint test
(Kishon Vijay Abraham I)
- Add ioctl to clear IRQ for endpoint test (Kishon Vijay Abraham I)
- Add endpoint test 'e' option to clear IRQ (Kishon Vijay Abraham I)
- Bump limit on number of endpoint test devices from 10 to 10,000 (Kishon
Vijay Abraham I)
- Use full pci-endpoint-test name in request_irq() for easier profiling
(Kishon Vijay Abraham I)
- Reduce log level of -EPROBE_DEFER error messages to debug (Thierry
Reding)
* remotes/lorenzo/pci/endpoint:
misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
PCI: tegra: Print -EPROBE_DEFER error message at debug level
misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
tools: PCI: Add 'e' to clear IRQ
misc: pci_endpoint_test: Add ioctl to clear IRQ
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
misc: pci_endpoint_test: Add support to get DMA option from userspace
tools: PCI: Add 'd' command line option to support DMA
misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
PCI: endpoint: functions/pci-epf-test: Print throughput information
PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
PCI: endpoint: Fix clearing start entry in configfs
PCI: tegra: Add support for PCIe endpoint mode in Tegra194
dt-bindings: PCI: tegra: Add DT support for PCIe EP nodes in Tegra194
soc/tegra: bpmp: Update ABI header
PCI: pci-epf-test: Add support to defer core initialization
PCI: dwc: Add API to notify core initialization completion
PCI: endpoint: Add notification for core init completion
PCI: dwc: Refactor core initialization code for EP mode
PCI: endpoint: Add core init notifying feature
PCI: endpoint: Assign function number for each PF in EPC core
PCI: endpoint: Protect concurrent access to pci_epf_ops with mutex
PCI: endpoint: Fix for concurrent memory allocation in OB address region
PCI: endpoint: Replace spinlock with mutex
PCI: endpoint: Use notification chain mechanism to notify EPC events to EPF
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:53 +0000 (14:26 -0500)]
Merge branch 'remotes/lorenzo/pci/dwc'
- Fix dra7xx issue with missing an MSI if new events pended during IRQ
handler (Vignesh Raghavendra)
* remotes/lorenzo/pci/dwc:
PCI: dwc: pci-dra7xx: Fix MSI IRQ handling
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:50 +0000 (14:26 -0500)]
Merge branch 'remotes/lorenzo/pci/dt'
- Add common schema for PCI endpoint controllers (Kishon Vijay Abraham I)
- Add host and endpoint schemas for Cadence PCIe core (Kishon Vijay
Abraham I)
- Convert Cadence platform bindings to DT schema (Kishon Vijay Abraham I)
* remotes/lorenzo/pci/dt:
dt-bindings: PCI: Convert PCIe Host/Endpoint in Cadence platform to DT schema
dt-bindings: PCI: cadence: Add PCIe RC/EP DT schema for Cadence PCIe
dt-bindings: PCI: Add PCI Endpoint Controller Schema
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:47 +0000 (14:26 -0500)]
Merge branch 'remotes/lorenzo/pci/amlogic'
- Add Amlogic AXG MIPI/PCIe PHY driver and related DT bindings (Remi
Pommarel)
- Use shared PHY driver for Amlogic AXG and G12A platforms (Remi
Pommarel)
* remotes/lorenzo/pci/amlogic:
PCI: amlogic: Use AXG PCIE
phy: amlogic: Add Amlogic AXG PCIE PHY Driver
phy: amlogic: Add Amlogic AXG MIPI/PCIE analog PHY Driver
dt-bindings: PCI: meson: Update PCIE bindings documentation
dt-bindings: Add AXG shared MIPI/PCIE analog PHY bindings
dt-bindings: Add AXG PCIE PHY bindings
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:45 +0000 (14:26 -0500)]
Merge branch 'pci/virtualization'
- Add ACS quirks for Zhaoxin Root Ports, Downstream Ports, and
multi-function devices (Raymond Pang)
* pci/virtualization:
PCI: Add ACS quirk for Zhaoxin Root/Downstream Ports
PCI: Add ACS quirk for Zhaoxin multi-function devices
PCI: Add Zhaoxin Vendor ID
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:43 +0000 (14:26 -0500)]
Merge branch 'pci/resource'
- Use ioremap(), not phys_to_virt() for platform ROM, to fix video ROM
mapping with CONFIG_HIGHMEM (Mikel Rychliski)
- Add support for root bus sizing so we don't have to assume host bridge
windows are known a priori (Ivan Kokshaysky)
- Fix alpha Nautilus PCI setup, which has been broken since we started
enforcing window limits in resource allocation (Ivan Kokshaysky)
* pci/resource:
alpha: Fix nautilus PCI setup
PCI: Add support for root bus sizing
PCI: Use ioremap(), not phys_to_virt() for platform ROM
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:41 +0000 (14:26 -0500)]
Merge branch 'pci/p2pdma'
- Add Intel Sky Lake-E Root Ports B, C, D to P2PDMA whitelist (Andrew
Maier)
* pci/p2pdma:
PCI/P2PDMA: Add Intel Sky Lake-E Root Ports B, C, D to the whitelist
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:38 +0000 (14:26 -0500)]
Merge branch 'pci/misc'
- Move _HPX type array from stack to static data (Colin Ian King)
- Avoid an ASMedia XHCI USB PME# defect; apparently it doesn't assert
PME# when USB3.0 devices are hotplugged in D0 (Kai-Heng Feng)
- Revert sysfs "rescan" file renames that broke an application (Kelsey
Skunberg)
* pci/misc:
PCI: sysfs: Revert "rescan" file renames
PCI: Avoid ASMedia XHCI USB PME# from D0 defect
PCI/ACPI: Move pcie_to_hpx3_type[] from stack to static data
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:36 +0000 (14:26 -0500)]
Merge branch 'pci/interrupts'
- Extend boot interrupt quirk to cover several Xeon chipsets (Sean V
Kelley)
- Add documentation about boot interrupts (Sean V Kelley)
* pci/interrupts:
Documentation: PCI: Add background on Boot Interrupts
PCI: Add boot interrupt quirk mechanism for Xeon chipsets
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:35 +0000 (14:26 -0500)]
Merge branch 'pci/hotplug'
- Disable in-band presence detection when possible (Alexandru Gagniuc)
- Poll for presence detect if in-band presence detection is disabled
(Alexandru Gagniuc)
- Add DMI table of systems that don't support in-band presence detection
(Stuart Hayes)
- Fix indefinite pciehp wait caused by race in handling sysfs requests
(Lukas Wunner)
- Fix pciehp MSI interrupt race that caused us to miss interrupts (Stuart
Hayes)
* pci/hotplug:
PCI: pciehp: Fix MSI interrupt race
PCI: pciehp: Fix indefinite wait on sysfs requests
PCI: pciehp: Add DMI table for in-band presence detection disabled
PCI: pciehp: Wait for PDS if in-band presence is disabled
PCI: pciehp: Disable in-band presence detect when possible
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:32 +0000 (14:26 -0500)]
Merge branch 'pci/enumeration'
- Add PCIe 32 GT/s speed decoding for sysfs "max_link_speed" and dmesg
notes about available bandwidth (Yicong Yang)
- Simplify and unify PCI bus/link speed reporting (Yicong Yang)
* pci/enumeration:
PCI: Add PCIE_LNKCAP2_SLS2SPEED() macro
PCI: Use pci_speed_string() for all PCI/PCI-X/PCIe strings
PCI: Add pci_speed_string()
PCI: Add 32 GT/s decoding in some macros
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:30 +0000 (14:26 -0500)]
Merge branch 'pci/edr'
- Update error status after reset_link() so we don't report "recovery
failed" when it in fact succeeded (Kuppuswamy Sathyanarayanan)
- Move DPC data into struct pci_dev instead of allocating a separate
struct dpc_dev (Bjorn Helgaas)
- Remove AER/DPC service dependency to simplify error recovery
(Kuppuswamy Sathyanarayanan)
- Return error recovery status for future use by EDR, which needs to tell
firmware whether recovery was successful (Kuppuswamy Sathyanarayanan)
- Cache DPC capability info in core since it's needed by EDR as well as
DPC driver (Kuppuswamy Sathyanarayanan)
- Add pci_aer_raw_clear_status() to allow EDR recovery path to clear AER
status even when OS doesn't own the AER capability (Kuppuswamy
Sathyanarayanan)
- Add Error Disconnect Recover (EDR) support, so firmware can use ACPI
notification to tell the OS that devices have been disconnected, e.g.,
via DPC, and that OS should attempt recovery (Kuppuswamy
Sathyanarayanan)
- Rename AER error status clearing interfaces to be more consistent
(Kuppuswamy Sathyanarayanan)
* pci/edr:
PCI/AER: Rationalize error status register clearing
PCI/DPC: Add Error Disconnect Recover (EDR) support
PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR
PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status
PCI/DPC: Cache DPC capabilities in pci_init_capabilities()
PCI/ERR: Return status of pcie_do_recovery()
PCI/ERR: Remove service dependency in pcie_do_recovery()
PCI/DPC: Move DPC data into struct pci_dev
PCI/ERR: Update error status after reset_link()
PCI/ERR: Combine pci_channel_io_frozen cases
Bjorn Helgaas [Thu, 2 Apr 2020 19:26:27 +0000 (14:26 -0500)]
Merge branch 'pci/aspm'
- Clear the correct bits when enabling ASPM L1 substates (Yicong Yang)
- Reduce severity of ASPM common clock config message (Chris Packham)
* pci/aspm:
PCI/ASPM: Reduce severity of common clock config message
PCI/ASPM: Clear the correct bits when enabling L1 substates
Qian Cai [Thu, 2 Apr 2020 15:39:55 +0000 (11:39 -0400)]
x86/kvm: fix a missing-prototypes "vmread_error"
The commit
842f4be95899 ("KVM: VMX: Add a trampoline to fix VMREAD error
handling") removed the declaration of vmread_error() causes a W=1 build
failure with KVM_WERROR=y. Fix it by adding it back.
arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes]
asmlinkage void vmread_error(unsigned long field, bool fault)
^~~~~~~~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Message-Id: <
20200402153955.1695-1-cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Russell King [Thu, 2 Apr 2020 18:42:29 +0000 (19:42 +0100)]
Merge branches 'misc' and 'devel-stable' into for-linus
Linus Torvalds [Thu, 2 Apr 2020 18:22:17 +0000 (11:22 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull exec/proc updates from Eric Biederman:
"This contains two significant pieces of work: the work to sort out
proc_flush_task, and the work to solve a deadlock between strace and
exec.
Fixing proc_flush_task so that it no longer requires a persistent
mount makes improvements to proc possible. The removal of the
persistent mount solves an old regression that that caused the hidepid
mount option to only work on remount not on mount. The regression was
found and reported by the Android folks. This further allows Alexey
Gladkov's work making proc mount options specific to an individual
mount of proc to move forward.
The work on exec starts solving a long standing issue with exec that
it takes mutexes of blocking userspace applications, which makes exec
extremely deadlock prone. For the moment this adds a second mutex with
a narrower scope that handles all of the easy cases. Which makes the
tricky cases easy to spot. With a little luck the code to solve those
deadlocks will be ready by next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
signal: Extend exec_id to 64bits
pidfd: Use new infrastructure to fix deadlocks in execve
perf: Use new infrastructure to fix deadlocks in execve
proc: io_accounting: Use new infrastructure to fix deadlocks in execve
proc: Use new infrastructure to fix deadlocks in execve
kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
kernel: doc: remove outdated comment cred.c
mm: docs: Fix a comment in process_vm_rw_core
selftests/ptrace: add test cases for dead-locks
exec: Fix a deadlock in strace
exec: Add exec_update_mutex to replace cred_guard_mutex
exec: Move exec_mmap right after de_thread in flush_old_exec
exec: Move cleanup of posix timers on exec out of de_thread
exec: Factor unshare_sighand out of de_thread and call it separately
exec: Only compute current once in flush_old_exec
pid: Improve the comment about waiting in zap_pid_ns_processes
proc: Remove the now unnecessary internal mount of proc
uml: Create a private mount of proc for mconsole
uml: Don't consult current to find the proc_mnt in mconsole_proc
proc: Use a list of inodes to flush from proc
...
Lad Prabhakar [Sat, 21 Mar 2020 11:21:39 +0000 (11:21 +0000)]
misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
PCI_ENDPOINT_TEST_STATUS is already defined in pci_endpoint_test.c along
with the status bits, drop the duplicate definition.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Thierry Reding [Thu, 19 Mar 2020 13:12:30 +0000 (14:12 +0100)]
PCI: tegra: Print -EPROBE_DEFER error message at debug level
Probe deferral is an expected error condition that will usually be
recovered from. Print such error messages at debug level to make them
available for diagnostic purposes when building with debugging enabled
and hide them otherwise to not spam the kernel log with them.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Tested-by: Vidya Sagar <vidyas@nvidia.com>
Kishon Vijay Abraham I [Tue, 17 Mar 2020 10:01:58 +0000 (15:31 +0530)]
misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
Use full pci-endpoint-test name in request_irq(), so that it's easy to
profile the device that actually raised the interrupt.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kishon Vijay Abraham I [Tue, 17 Mar 2020 10:01:57 +0000 (15:31 +0530)]
misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
Adding more than 10 pci-endpoint-test devices results in
"kobject_add_internal failed for pci-endpoint-test.1 with -EEXIST, don't
try to register things with the same name in the same directory". This
is because commit
2c156ac71c6b ("misc: Add host side PCI driver for PCI
test function device") limited the length of the "name" to 20 characters.
Change the length of the name to 24 in order to support upto 10000
pci-endpoint-test devices.
Fixes: 2c156ac71c6b ("misc: Add host side PCI driver for PCI test function device")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org # v4.14+
Kishon Vijay Abraham I [Tue, 17 Mar 2020 10:01:56 +0000 (15:31 +0530)]
tools: PCI: Add 'e' to clear IRQ
Add a new command line option 'e' to invoke "PCITEST_CLEAR_IRQ"
ioctl. This can be used to clear the irqs set using the 'i' option.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kishon Vijay Abraham I [Tue, 17 Mar 2020 10:01:55 +0000 (15:31 +0530)]
misc: pci_endpoint_test: Add ioctl to clear IRQ
Add ioctl to clear IRQ which can be used to free the allocated
IRQ vectors and free the requested IRQ.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kishon Vijay Abraham I [Tue, 17 Mar 2020 10:01:54 +0000 (15:31 +0530)]
misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
commit
e03327122e2c ("pci_endpoint_test: Add 2 ioctl commands")
uses module parameter 'irqtype' in pci_endpoint_test_set_irq()
to check if IRQ vectors of a particular type (MSI or MSI-X or
LEGACY) is already allocated. However with multi-function devices,
'irqtype' will not correctly reflect the IRQ type of the PCI device.
Fix it here by adding 'irqtype' for each PCI device to show the
IRQ type of a particular PCI device.
Fixes: e03327122e2c ("pci_endpoint_test: Add 2 ioctl commands")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org # v4.19+
Kishon Vijay Abraham I [Tue, 25 Feb 2020 08:17:03 +0000 (13:47 +0530)]
PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
AM654 PCIe EP controller has MSI-X capability register and has the
ability to raise MSI-X interrupt. Add support in pci-keystone.c
for PCIe endpoint controller in AM654 to raise MSI-X interrupts.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kishon Vijay Abraham I [Tue, 25 Feb 2020 08:17:02 +0000 (13:47 +0530)]
PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
commit
beb4641a787d ("PCI: dwc: Add MSI-X callbacks handler"),
in order to raise MSI-X interrupt, obtained MSIX table address from
Base Address Register (BAR). However BAR only holds PCI address
programmed by the host whereas the MSI-X table should be in the local
memory.
Store the MSI-X table address (virtual address) as part of ->set_bar()
callback and use that to get the message address and message data
here.
Fixes: beb4641a787d ("PCI: dwc: Add MSI-X callbacks handler")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kishon Vijay Abraham I [Tue, 25 Feb 2020 08:17:01 +0000 (13:47 +0530)]
PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
commit
8963106eabdc ("PCI: endpoint: Add MSI-X interfaces") while
adding support to raise MSI-X interrupts from endpoint didn't include
BAR Indicator register (BIR) configuration and MSI-X table offset as
arguments in pci_epc_set_msix(). This would result in endpoint
controller register using random BAR indicator register, the memory
for which might not be allocated by the endpoint function driver.
Add BAR indicator register and MSI-X table offset as arguments in
pci_epc_set_msix() and allocate space for MSI-X table and pending
bit array (PBA) in pci-epf-test endpoint function driver.
Fixes: 8963106eabdc ("PCI: endpoint: Add MSI-X interfaces")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kishon Vijay Abraham I [Mon, 16 Mar 2020 11:24:24 +0000 (16:54 +0530)]
misc: pci_endpoint_test: Add support to get DMA option from userspace
'pcitest' utility now uses '-d' option to allow the user to test DMA.
Add support to parse option to use DMA from userspace application.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
Kishon Vijay Abraham I [Mon, 16 Mar 2020 11:24:23 +0000 (16:54 +0530)]
tools: PCI: Add 'd' command line option to support DMA
Add a new command line option 'd' to use DMA for data transfers.
It should be used with read, write or copy commands.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
Kishon Vijay Abraham I [Mon, 16 Mar 2020 11:24:22 +0000 (16:54 +0530)]
misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
Use streaming DMA APIs (dma_map_single/dma_unmap_single) for buffers
transmitted/received by the endpoint device instead of allocating
a coherent memory. Also add default_data to set the alignment to
4KB since dma_map_single might not return a 4KB aligned address.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
Kishon Vijay Abraham I [Mon, 16 Mar 2020 11:24:21 +0000 (16:54 +0530)]
PCI: endpoint: functions/pci-epf-test: Print throughput information
Print throughput information in KB/s after every completed transfer,
including information on whether DMA is used or not.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
Kishon Vijay Abraham I [Mon, 16 Mar 2020 11:24:20 +0000 (16:54 +0530)]
PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
Use dmaengine API and add support for transferring data using DMA.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Alan Mikhak <alan.mikhak@sifive.com>
Matthew Wilcox (Oracle) [Thu, 2 Apr 2020 04:11:58 +0000 (21:11 -0700)]
include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
It's even more important to check that we don't have a tail page when
calling hpage_nr_pages() when THP are disabled.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: http://lkml.kernel.org/r/20200318140253.6141-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christophe Leroy [Thu, 2 Apr 2020 04:11:54 +0000 (21:11 -0700)]
mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the following
build failure is encoutered:
In file included from arch/powerpc/mm/fault.c:33:0:
include/linux/hugetlb.h: In function 'hstate_inode':
include/linux/hugetlb.h:477:9: error: implicit declaration of function 'HUGETLBFS_SB' [-Werror=implicit-function-declaration]
return HUGETLBFS_SB(i->i_sb)->hstate;
^
include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 'int')
return HUGETLBFS_SB(i->i_sb)->hstate;
^
Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE.
Fixes: a137e1cc6d6e ("hugetlbfs: per mount huge page sizes")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Link: http://lkml.kernel.org/r/7e8c3a3c9a587b9cd8a2f146df32a421b961f3a2.1584432148.git.christophe.leroy@c-s.fr
Link: https://patchwork.ozlabs.org/patch/1255548/#2386036
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christophe Leroy [Thu, 2 Apr 2020 04:11:51 +0000 (21:11 -0700)]
selftests/vm: fix map_hugetlb length used for testing read and write
Commit
fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page
size in map_hugetlb") added the possibility to change the size of memory
mapped for the test, but left the read and write test using the default
value. This is unnoticed when mapping a length greater than the default
one, but segfaults otherwise.
Fix read_bytes() and write_bytes() by giving them the real length.
Also fix the call to munmap().
Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/9a404a13c871c4bd0ba9ede68f69a1225180dd7e.1580978385.git.christophe.leroy@c-s.fr
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Thu, 2 Apr 2020 04:11:48 +0000 (21:11 -0700)]
mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
Commit
f1e61557f023 ("mm: pack compound_dtor and compound_order into one
word in struct page") changed compound_dtor from a pointer to an array
index in order to pack it. To check if page has the hugeltbfs
compound_dtor, we can just compare the index directly without fetching the
function pointer. Said commit did that with PageHuge() and we can do the
same with PageHeadHuge() to make the code a bit smaller and faster.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Neha Agarwal <nehaagarwal@google.com>
Link: http://lkml.kernel.org/r/20200311172440.6988-1-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mateusz Nosek [Thu, 2 Apr 2020 04:11:45 +0000 (21:11 -0700)]
mm/hugetlb.c: clean code by removing unnecessary initialization
Previously variable 'check_addr' was initialized, but was not read later
before reassigning. So the initialization can be removed.
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Link: http://lkml.kernel.org/r/20200303212354.25226-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:41 +0000 (21:11 -0700)]
hugetlb_cgroup: add hugetlb_cgroup reservation docs
Add docs for how to use hugetlb_cgroup reservations, and their behavior.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-9-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:38 +0000 (21:11 -0700)]
hugetlb_cgroup: add hugetlb_cgroup reservation tests
The tests use both shared and private mapped hugetlb memory, and monitors
the hugetlb usage counter as well as the hugetlb reservation counter.
They test different configurations such as hugetlb memory usage via
hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without
MAP_POPULATE.
Also add test for hugetlb reservation reparenting, since this is a subtle
issue.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sandipan Das <sandipan@linux.ibm.com> [powerpc64]
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-8-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:35 +0000 (21:11 -0700)]
hugetlb: support file_region coalescing again
An earlier patch in this series disabled file_region coalescing in order
to hang the hugetlb_cgroup uncharge info on the file_region entries.
This patch re-adds support for coalescing of file_region entries.
Essentially everytime we add an entry, we call a recursive function that
tries to coalesce the added region with the regions next to it. The worst
case call depth for this function is 3: one to coalesce with the region
next to it, one to coalesce to the region prev, and one to reach the base
case.
This is an important performance optimization as private mappings add
their entries page by page, and we could incur big performance costs for
large mappings with lots of file_region entries in their resv_map.
[almasrymina@google.com: fix CONFIG_CGROUP_HUGETLB ifdefs]
Link: http://lkml.kernel.org/r/20200214204544.231482-1-almasrymina@google.com
[almasrymina@google.com: remove check_coalesce_bug debug code]
Link: http://lkml.kernel.org/r/20200219233610.13808-1-almasrymina@google.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-7-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:31 +0000 (21:11 -0700)]
hugetlb_cgroup: support noreserve mappings
Support MAP_NORESERVE accounting as part of the new counter.
For each hugepage allocation, at allocation time we check if there is a
reservation for this allocation or not. If there is a reservation for
this allocation, then this allocation was charged at reservation time, and
we don't re-account it. If there is no reserevation for this allocation,
we charge the appropriate hugetlb_cgroup.
The hugetlb_cgroup to uncharge for this allocation is stored in
page[3].private. We use new APIs added in an earlier patch to set this
pointer.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-6-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:28 +0000 (21:11 -0700)]
hugetlb_cgroup: add accounting for shared mappings
For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
in the resv_map entries, in file_region->reservation_counter.
After a call to region_chg, we charge the approprate hugetlb_cgroup, and
if successful, we pass on the hugetlb_cgroup info to a follow up
region_add call. When a file_region entry is added to the resv_map via
region_add, we put the pointer to that cgroup in
file_region->reservation_counter. If charging doesn't succeed, we report
the error to the caller, so that the kernel fails the reservation.
On region_del, which is when the hugetlb memory is unreserved, we also
uncharge the file_region->reservation_counter.
[akpm@linux-foundation.org: forward declare struct file_region]
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-5-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:25 +0000 (21:11 -0700)]
hugetlb: disable region_add file_region coalescing
A follow up patch in this series adds hugetlb cgroup uncharge info the
file_region entries in resv->regions. The cgroup uncharge info may differ
for different regions, so they can no longer be coalesced at region_add
time. So, disable region coalescing in region_add in this patch.
Behavior change:
Say a resv_map exists like this [0->1], [2->3], and [5->6].
Then a region_chg/add call comes in region_chg/add(f=0, t=5).
Old code would generate resv->regions: [0->5], [5->6].
New code would generate resv->regions: [0->1], [1->2], [2->3], [3->5],
[5->6].
Special care needs to be taken to handle the resv->adds_in_progress
variable correctly. In the past, only 1 region would be added for every
region_chg and region_add call. But now, each call may add multiple
regions, so we can no longer increment adds_in_progress by 1 in
region_chg, or decrement adds_in_progress by 1 after region_add or
region_abort. Instead, region_chg calls add_reservation_in_range() to
count the number of regions needed and allocates those, and that info is
passed to region_add and region_abort to decrement adds_in_progress
correctly.
We've also modified the assumption that region_add after region_chg never
fails. region_chg now pre-allocates at least 1 region for region_add. If
region_add needs more regions than region_chg has allocated for it, then
it may fail.
[almasrymina@google.com: fix file_region entry allocations]
Link: http://lkml.kernel.org/r/20200219012736.20363-1-almasrymina@google.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Link: http://lkml.kernel.org/r/20200211213128.73302-4-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:21 +0000 (21:11 -0700)]
hugetlb_cgroup: add reservation accounting for private mappings
Normally the pointer to the cgroup to uncharge hangs off the struct page,
and gets queried when it's time to free the page. With hugetlb_cgroup
reservations, this is not possible. Because it's possible for a page to
be reserved by one task and actually faulted in by another task.
The best place to put the hugetlb_cgroup pointer to uncharge for
reservations is in the resv_map. But, because the resv_map has different
semantics for private and shared mappings, the code patch to
charge/uncharge shared and private mappings is different. This patch
implements charging and uncharging for private mappings.
For private mappings, the counter to uncharge is in
resv_map->reservation_counter. On initializing the resv_map this is set
to NULL. On reservation of a region in private mapping, the tasks
hugetlb_cgroup is charged and the hugetlb_cgroup is placed is
resv_map->reservation_counter.
On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter.
[akpm@linux-foundation.org: forward declare struct resv_map]
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-3-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:18 +0000 (21:11 -0700)]
mm/hugetlb_cgroup: fix hugetlb_cgroup migration
Commit
c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge
hugetlb reservations") mistakingly doesn't handle the migration of *both*
the reservation hugetlb_cgroup and the fault hugetlb_cgroup correctly.
What should happen is that both cgroups shuold be queried from the old
page, then both set to NULL on the old page, then both inserted into the
new page.
The mistake also creates the following warning:
mm/hugetlb_cgroup.c: In function 'hugetlb_cgroup_migrate':
mm/hugetlb_cgroup.c:777:25: warning: variable 'h_cg' set but not used
[-Wunused-but-set-variable]
struct hugetlb_cgroup *h_cg;
^~~~
Solution is to add the missing steps, namly setting the reservation
hugetlb_cgroup to NULL on the old page, and setting the fault
hugetlb_cgroup on the new page.
Fixes: c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200218194727.46995-1-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:15 +0000 (21:11 -0700)]
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
Augments hugetlb_cgroup_charge_cgroup to be able to charge hugetlb usage
or hugetlb reservation counter.
Adds a new interface to uncharge a hugetlb_cgroup counter via
hugetlb_cgroup_uncharge_counter.
Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init,
hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200211213128.73302-2-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Thu, 2 Apr 2020 04:11:11 +0000 (21:11 -0700)]
hugetlb_cgroup: add hugetlb_cgroup reservation counter
These counters will track hugetlb reservations rather than hugetlb memory
faulted in. This patch only adds the counter, following patches add the
charging and uncharging of the counter.
This is patch 1 of an 9 patch series.
Problem:
Currently tasks attempting to reserve more hugetlb memory than is
available get a failure at mmap/shmget time. This is thanks to Hugetlbfs
Reservations [1]. However, if a task attempts to reserve more hugetlb
memory than its hugetlb_cgroup limit allows, the kernel will allow the
mmap/shmget call, but will SIGBUS the task when it attempts to fault in
the excess memory.
We have users hitting their hugetlb_cgroup limits and thus we've been
looking at this failure mode. We'd like to improve this behavior such
that users violating the hugetlb_cgroup limits get an error on mmap/shmget
time, rather than getting SIGBUS'd when they try to fault the excess
memory in. This gives the user an opportunity to fallback more gracefully
to non-hugetlbfs memory for example.
The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time. Thus,
enforcing the hugetlb_cgroup limit only happens at fault time, and the
offending task gets SIGBUS'd.
Proposed Solution:
A new page counter named
'hugetlb.xMB.rsvd.[limit|usage|max_usage]_in_bytes'. This counter has
slightly different semantics than
'hugetlb.xMB.[limit|usage|max_usage]_in_bytes':
- While usage_in_bytes tracks all *faulted* hugetlb memory,
rsvd.usage_in_bytes tracks all *reserved* hugetlb memory and hugetlb
memory faulted in without a prior reservation.
- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than rsvd.limit_in_bytes, the kernel will fail this
reservation.
This proposal is implemented in this patch series, with tests to verify
functionality and show the usage.
Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to the
existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
duplication with hugetlb_cgroup. Keeping hugetlb related page counters
under hugetlb_cgroup seemed cleaner as well.
2. Instead of adding a new counter, we considered adding a sysctl that
modifies the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do
accounting at reservation time rather than fault time. Adding a new
page_counter seems better as userspace could, if it wants, choose to
enforce different cgroups differently: one via limit_in_bytes, and
another via rsvd.limit_in_bytes. This could be very useful if you're
transitioning how hugetlb memory is partitioned on your system one
cgroup at a time, for example. Also, someone may find usage for both
limit_in_bytes and rsvd.limit_in_bytes concurrently, and this approach
gives them the option to do so.
Testing:
- Added tests passing.
- Used libhugetlbfs for regression testing.
[1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200211213128.73302-1-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Thu, 2 Apr 2020 04:11:08 +0000 (21:11 -0700)]
hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
hugetlbfs page faults can race with truncate and hole punch operations.
Current code in the page fault path attempts to handle this by 'backing
out' operations if we encounter the race. One obvious omission in the
current code is removing a page newly added to the page cache. This is
pretty straight forward to address, but there is a more subtle and
difficult issue of backing out hugetlb reservations. To handle this
correctly, the 'reservation state' before page allocation needs to be
noted so that it can be properly backed out. There are four distinct
possibilities for reservation state: shared/reserved, shared/no-resv,
private/reserved and private/no-resv. Backing out a reservation may
require memory allocation which could fail so that needs to be taken
into account as well.
Instead of writing the required complicated code for this rare
occurrence, just eliminate the race. i_mmap_rwsem is now held in read
mode for the duration of page fault processing. Hold i_mmap_rwsem in
write mode when modifying i_size. In this way, truncation can not
proceed when page faults are being processed. In addition, i_size
will not change during fault processing so a single check can be made
to ensure faults are not beyond (proposed) end of file. Faults can
still race with hole punch, but that race is handled by existing code
and the use of hugetlb_fault_mutex.
With this modification, checks for races with truncation in the page
fault path can be simplified and removed. remove_inode_hugepages no
longer needs to take hugetlb_fault_mutex in the case of truncation.
Comments are expanded to explain reasoning behind locking.
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Thu, 2 Apr 2020 04:11:05 +0000 (21:11 -0700)]
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.
While discussing the issue with huge_pte_offset [1], I remembered that
there were more outstanding hugetlb races. These issues are:
1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
invalid via a call to huge_pmd_unshare by another thread.
2) hugetlbfs page faults can race with truncation causing invalid global
reserve counts and state.
A previous attempt was made to use i_mmap_rwsem in this manner as
described at [2]. However, those patches were reverted starting with [3]
due to locking issues.
To effectively use i_mmap_rwsem to address the above issues it needs to be
held (in read mode) during page fault processing. However, during fault
processing we need to lock the page we will be adding. Lock ordering
requires we take page lock before i_mmap_rwsem. Waiting until after
taking the page lock is too late in the fault process for the
synchronization we want to do.
To address this lock ordering issue, the following patches change the lock
ordering for hugetlb pages. This is not too invasive as hugetlbfs
processing is done separate from core mm in many places. However, I don't
really like this idea. Much ugliness is contained in the new routine
hugetlb_page_mapping_lock_write() of patch 1.
The only other way I can think of to address these issues is by catching
all the races. After catching a race, cleanup, backout, retry ... etc,
as needed. This can get really ugly, especially for huge page
reservations. At one time, I started writing some of the reservation
backout code for page faults and it got so ugly and complicated I went
down the path of adding synchronization to avoid the races. Any other
suggestions would be welcome.
[1] https://lore.kernel.org/linux-mm/
1582342427-230392-1-git-send-email-longpeng2@huawei.com/
[2] https://lore.kernel.org/linux-mm/
20181222223013.22193-1-mike.kravetz@oracle.com/
[3] https://lore.kernel.org/linux-mm/
20190103235452.29335-1-mike.kravetz@oracle.com
[4] https://lore.kernel.org/linux-mm/
1584028670.7365.182.camel@lca.pw/
[5] https://lore.kernel.org/lkml/
20200312183142.
108df9ac@canb.auug.org.au/
This patch (of 2):
While looking at BUGs associated with invalid huge page map counts, it was
discovered and observed that a huge pte pointer could become 'invalid' and
point to another task's page table. Consider the following:
A task takes a page fault on a shared hugetlbfs file and calls
huge_pte_alloc to get a ptep. Suppose the returned ptep points to a
shared pmd.
Now, another task truncates the hugetlbfs file. As part of truncation, it
unmaps everyone who has the file mapped. If the range being truncated is
covered by a shared pmd, huge_pmd_unshare will be called. For all but the
last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
to the pmd. If the task in the middle of the page fault is not the last
user, the ptep returned by huge_pte_alloc now points to another task's
page table or worse. This leads to bad things such as incorrect page
map/reference counts or invalid memory references.
To fix, expand the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
huge_pmd_share is only called via huge_pte_alloc, so callers of
huge_pte_alloc take i_mmap_rwsem before calling. In addition, callers
of huge_pte_alloc continue to hold the semaphore until finished with
the ptep.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
One problem with this scheme is that it requires taking i_mmap_rwsem
before taking the page lock during page faults. This is not the order
specified in the rest of mm code. Handling of hugetlbfs pages is mostly
isolated today. Therefore, we use this alternative locking order for
PageHuge() pages.
mapping->i_mmap_rwsem
hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
page->flags PG_locked (lock_page)
To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
introduced to write lock the i_mmap_rwsem associated with a page.
In most cases it is easy to get address_space via vma->vm_file->f_mapping.
However, in the case of migration or memory errors for anon pages we do
not have an associated vma. A new routine _get_hugetlb_page_mapping()
will use anon_vma to get address_space in these cases.
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Thu, 2 Apr 2020 04:11:01 +0000 (21:11 -0700)]
mm/memblock.c: remove redundant assignment to variable max_addr
The variable max_addr is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200228235003.112718-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 2 Apr 2020 04:10:58 +0000 (21:10 -0700)]
mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
Using an empty (malformed) nodelist that is not caught during mount option
parsing leads to a stack-out-of-bounds access.
The option string that was used was: "mpol=prefer:,". However,
MPOL_PREFERRED requires a single node number, which is not being provided
here.
Add a check that 'nodes' is not empty after parsing for MPOL_PREFERRED's
nodeid.
Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
Reported-by: Entropy Moe <3ntr0py1337@gmail.com>
Reported-by: syzbot+b055b1a6b2b958707a21@syzkaller.appspotmail.com
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: syzbot+b055b1a6b2b958707a21@syzkaller.appspotmail.com
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Link: http://lkml.kernel.org/r/89526377-7eb6-b662-e1d8-4430928abde9@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Thu, 2 Apr 2020 04:10:55 +0000 (21:10 -0700)]
mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
The VM_BUG_ON() is already used by queue_pages_test_walk(), it sounds
better to dump more debug information by using VM_BUG_ON_VMA() to help
debugging.
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Li Xinhai" <lixinhai.lxh@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/1579068565-110432-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Xinhai [Thu, 2 Apr 2020 04:10:52 +0000 (21:10 -0700)]
mm/mempolicy: check hugepage migration is supported by arch in vma_migratable()
vma_migratable() is called to check if pages in vma can be migrated before
go ahead to further actions. Currently it is used in below code path:
- task_numa_work
- mbind
- move_pages
For hugetlb mapping, whether vma is migratable or not is determined by:
- CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
- arch_hugetlb_migration_supported
Issue: current code only checks for CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
alone, and no code should use it directly. (note that current code in
vma_migratable don't cause failure or bug because
unmap_and_move_huge_page() will catch unsupported hugepage and handle it
properly)
This patch checks the two factors by hugepage_migration_supported for
impoving code logic and robustness. It will enable early bail out of
hugepage migration procedure, but because currently all architecture
supporting hugepage migration is able to support all page size, we would
not see performance gain with this patch applied.
vma_migratable() is moved to mm/mempolicy.c, because of the circular
reference of mempolicy.h and hugetlb.h cause defining it as inline not
feasible.
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Link: http://lkml.kernel.org/r/1579786179-30633-1-git-send-email-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Xinhai [Thu, 2 Apr 2020 04:10:48 +0000 (21:10 -0700)]
mm/mempolicy: support MPOL_MF_STRICT for huge page mapping
MPOL_MF_STRICT is used in mbind() for purposes:
(1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL, to check if there is misplaced page and return -EIO;
(2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to
check if there is misplaced page which is failed to isolate, or page
is success on isolate but failed to move, and return -EIO.
For non hugepage mapping, (1) and (2) are implemented as expectation. For
hugepage mapping, (1) is not implemented. And in (2), the part about
failed to isolate and report -EIO is not implemented.
This patch implements the missed parts for hugepage mapping. Benefits
with it applied:
- User space can apply same code logic to handle mbind() on hugepage and
non hugepage mapping;
- Reliably using MPOL_MF_STRICT alone to check whether there is
misplaced page or not when bind policy on address range, especially for
address range which contains both hugepage and non hugepage mapping.
Analysis of potential impact to existing users:
- If MPOL_MF_STRICT alone was previously used, hugetlb pages not
following the memory policy would not cause an EIO error. After this
change, hugetlb pages are treated like all other pages. If
MPOL_MF_STRICT alone is used and hugetlb pages do not follow memory
policy an EIO error will be returned.
- For users who using MPOL_MF_STRICT with MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL, the semantic about some pages could not be moved will
not be changed by this patch, because failed to isolate and failed to
move have same effects to users, so their existing code will not be
impacted.
In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page
mappings' can be removed after this patch is applied.
Mike:
: The current behavior with MPOL_MF_STRICT and hugetlb pages is inconsistent
: and does not match documentation (as described above). The special
: behavior for hugetlb pages ideally should have been removed when hugetlb
: page migration was introduced. It is unlikely that anyone relies on
: today's inconsistent behavior, and removing one more case of special
: handling for hugetlb pages is a good thing.
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-man <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/1581559627-6206-1-git-send-email-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mateusz Nosek [Thu, 2 Apr 2020 04:10:45 +0000 (21:10 -0700)]
mm/compaction.c: clean code by removing unnecessary assignment
Previously 0 was assigned to variable 'last_migrated_pfn'. But the
variable is not read after that, so the assignment can be removed.
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Link: http://lkml.kernel.org/r/20200318174509.15021-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian Andrzej Siewior [Thu, 2 Apr 2020 04:10:42 +0000 (21:10 -0700)]
mm/compaction: Disable compact_unevictable_allowed on RT
Since commit
5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages and compact them by default. On
-RT even minor pagefaults are problematic because it may take a few 100us
to resolve them and until then the task is blocked.
Make compact_unevictable_allowed = 0 default and issue a warning on RT if
it is changed.
[bigeasy@linutronix.de: v5]
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Link: http://lkml.kernel.org/r/20200319165536.ovi75tsr2seared4@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Link: http://lkml.kernel.org/r/20200303202225.nhqc3v5gwlb7x6et@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian Andrzej Siewior [Thu, 2 Apr 2020 04:10:38 +0000 (21:10 -0700)]
mm/compaction: really limit compact_unevictable_allowed to 0 and 1
The proc file `compact_unevictable_allowed' should allow 0 and 1 only, the
`extra*' attribues have been set properly but without
proc_dointvec_minmax() as the `proc_handler' the limit will not be
enforced.
Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid
specified range.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Link: http://lkml.kernel.org/r/20200303202054.gsosv7fsx2ma3cic@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Thu, 2 Apr 2020 04:10:35 +0000 (21:10 -0700)]
mm, compaction: fully assume capture is not NULL in compact_zone_order()
Dan reports:
The patch
5e1f0f098b46: "mm, compaction: capture a page under direct
compaction" from Mar 5, 2019, leads to the following Smatch complaint:
mm/compaction.c:2321 compact_zone_order()
error: we previously assumed 'capture' could be null (see line 2313)
mm/compaction.c
2288 static enum compact_result compact_zone_order(struct zone *zone, int order,
2289 gfp_t gfp_mask, enum compact_priority prio,
2290 unsigned int alloc_flags, int classzone_idx,
2291 struct page **capture)
^^^^^^^
2313 if (capture)
^^^^^^^
Check for NULL
2314 current->capture_control = &capc;
2315
2316 ret = compact_zone(&cc, &capc);
2317
2318 VM_BUG_ON(!list_empty(&cc.freepages));
2319 VM_BUG_ON(!list_empty(&cc.migratepages));
2320
2321 *capture = capc.page;
^^^^^^^^
Unchecked dereference.
2322 current->capture_control = NULL;
2323
In practice this is not an issue, as the only caller path passes non-NULL
capture:
__alloc_pages_direct_compact()
struct page *page = NULL;
try_to_compact_pages(capture = &page);
compact_zone_order(capture = capture);
So let's remove the unnecessary check, which should also make Smatch happy.
Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: http://lkml.kernel.org/r/18b0df3c-0589-d96c-23fa-040798fee187@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rik van Riel [Thu, 2 Apr 2020 04:10:31 +0000 (21:10 -0700)]
mm,thp,compaction,cma: allow THP migration for CMA allocations
The code to implement THP migrations already exists, and the code for CMA
to clear out a region of memory already exists.
Only a few small tweaks are needed to allow CMA to move THP memory when
attempting an allocation from alloc_contig_range.
With these changes, migrating THPs from a CMA area works when allocating a
1GB hugepage from CMA memory.
[riel@surriel.com: fix hugetlbfs pages per Mike, cleanup per Vlastimil]
Link: http://lkml.kernel.org/r/20200228104700.0af2f18d@imladris.surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200227213238.1298752-2-riel@surriel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rik van Riel [Thu, 2 Apr 2020 04:10:28 +0000 (21:10 -0700)]
mm,compaction,cma: add alloc_contig flag to compact_control
Patch series "fix THP migration for CMA allocations", v2.
Transparent huge pages are allocated with __GFP_MOVABLE, and can end up in
CMA memory blocks. Transparent huge pages also have most of the
infrastructure in place to allow migration.
However, a few pieces were missing, causing THP migration to fail when
attempting to use CMA to allocate 1GB hugepages.
With these patches in place, THP migration from CMA blocks seems to work,
both for anonymous THPs and for tmpfs/shmem THPs.
This patch (of 2):
Add information to struct compact_control to indicate that the allocator
would really like to clear out this specific part of memory, used by for
example CMA.
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200227213238.1298752-1-riel@surriel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 2 Apr 2020 04:10:25 +0000 (21:10 -0700)]
selftests: vm: drop dependencies on page flags from mlock2 tests
It was noticed that mlock2 tests are failing after
9c4e6b1a7027f ("mm,
mlock, vmscan: no more skipping pagevecs") because the patch has changed
the timing on when the page is added to the unevictable LRU list and thus
gains the unevictable page flag.
The test was just too dependent on the implementation details which were
true at the time when it was introduced. Page flags and the timing when
they are set is something no userspace should ever depend on. The test
should be testing only for the user observable contract of the tested
syscalls. Those are defined pretty well for the mlock and there are other
means for testing them. In fact this is already done and testing for page
flags can be safely dropped to achieve the aimed purpose. Present bits
can be checked by /proc/<pid>/smaps RSS field and the locking state by
VmFlags although I would argue that Locked: field would be more
appropriate.
Drop all the page flag machinery and considerably simplify the test. This
should be more robust for future kernel changes while checking the
promised contract is still valid.
Fixes: 9c4e6b1a7027f ("mm, mlock, vmscan: no more skipping pagevecs")
Reported-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200324154218.GS19542@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mateusz Nosek [Thu, 2 Apr 2020 04:10:21 +0000 (21:10 -0700)]
mm/vmscan.c: do_try_to_free_pages(): clean code by removing unnecessary assignment
sc->memcg_low_skipped resets skipped_deactivate to 0 but this is not
needed as this code path is never reachable with skipped_deactivate != 0
due to previous sc->skipped_deactivate branch.
[mhocko@kernel.org: rewrite changelog]
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Link: http://lkml.kernel.org/r/20200319165938.23354-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill Tkhai [Thu, 2 Apr 2020 04:10:18 +0000 (21:10 -0700)]
mm/vmscan.c: make may_enter_fs bool in shrink_page_list()
This gives some size improvement:
$size mm/vmscan.o (before)
text data bss dec hex filename
53670 24123 12 77805 12fed mm/vmscan.o
$size mm/vmscan.o (after)
text data bss dec hex filename
53648 24123 12 77783 12fd7 mm/vmscan.o
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>