Linus Torvalds [Sat, 9 Jun 2018 17:32:39 +0000 (10:32 -0700)]
Merge tag 'staging-4.18-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big staging and IIO driver update for 4.18-rc1.
It was delayed as I wanted to make sure the final driver deletions did
not cause any major merge issues, and all now looks good.
There are a lot of patches here, just over 1000. The diffstat summary
shows the major changes here:
1007 files changed, 16828 insertions(+), 227770 deletions(-)
Because of this, we might be close to shrinking the overall kernel
source code size for two releases in a row.
There was loads of work in this release cycle, primarily:
- tons of ks7010 driver cleanups
- lots of mt7621 driver fixes and cleanups
- most driver cleanups
- wilc1000 fixes and cleanups
- lots and lots of IIO driver cleanups and new additions
- debugfs cleanups for all staging drivers
- lots of other staging driver cleanups and fixes, the shortlog has
the full details.
but the big user-visable things here are the removal of 3 chunks of
code:
- ncpfs and ipx were removed on schedule, no one has cared about this
code since it moved to staging last year, and if it needs to come
back, it can be reverted.
- lustre file system is removed.
I've ranted at the lustre developers about once a year for the past
5 years, with no real forward progress at all to clean things up
and get the code into the "real" part of the kernel.
Given that the lustre developers continue to work on an external
tree and try to port those changes to the in-kernel tree every once
in a while, this whole thing really really is not working out at
all. So I'm deleting it so that the developers can spend the time
working in their out-of-tree location and get things cleaned up
properly to get merged into the tree correctly at a later date.
Because of these file removals, you will have merge issues on some of
these files (2 in the ipx code, 1 in the ncpfs code, and 1 in the
atomisp driver). Just delete those files, it's a simple merge :)
All of this has been in linux-next for a while with no reported
problems"
* tag 'staging-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1011 commits)
staging: ipx: delete it from the tree
ncpfs: remove uapi .h files
ncpfs: remove Documentation
ncpfs: remove compat functionality
staging: ncpfs: delete it
staging: lustre: delete the filesystem from the tree.
staging: vc04_services: no need to save the log debufs dentries
staging: vc04_services: vchiq_debugfs_log_entry can be a void *
staging: vc04_services: remove struct vchiq_debugfs_info
staging: vc04_services: move client dbg directory into static variable
staging: vc04_services: remove odd vchiq_debugfs_top() wrapper
staging: vc04_services: no need to check debugfs return values
staging: mt7621-gpio: reorder includes alphabetically
staging: mt7621-gpio: change gc_map to don't use pointers
staging: mt7621-gpio: use GPIOF_DIR_OUT and GPIOF_DIR_IN macros instead of custom values
staging: mt7621-gpio: change 'to_mediatek_gpio' to make just a one line return
staging: mt7621-gpio: dt-bindings: update documentation for #interrupt-cells property
staging: mt7621-gpio: update #interrupt-cells for the gpio node
staging: mt7621-gpio: dt-bindings: complete documentation for the gpio
staging: mt7621-dts: add missing properties to gpio node
...
Linus Torvalds [Sat, 9 Jun 2018 00:21:52 +0000 (17:21 -0700)]
Merge tag 'libnvdimm-for-4.18' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This adds a user for the new 'bytes-remaining' updates to
memcpy_mcsafe() that you already received through Ingo via the
x86-dax- for-linus pull.
Not included here, but still targeting this cycle, is support for
handling memory media errors (poison) consumed via userspace dax
mappings.
Summary:
- DAX broke a fundamental assumption of truncate of file mapped
pages. The truncate path assumed that it is safe to disconnect a
pinned page from a file and let the filesystem reclaim the physical
block. With DAX the page is equivalent to the filesystem block.
Introduce dax_layout_busy_page() to enable filesystems to wait for
pinned DAX pages to be released. Without this wait a filesystem
could allocate blocks under active device-DMA to a new file.
- DAX arranges for the block layer to be bypassed and uses
dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
However, the memcpy_mcsafe() facility is available through the pmem
block driver. In order to safely handle media errors, via the DAX
block-layer bypass, introduce copy_to_iter_mcsafe().
- Fix cache management policy relative to the ACPI NFIT Platform
Capabilities Structure to properly elide cache flushes when they
are not necessary. The table indicates whether CPU caches are
power-fail protected. Clarify that a deep flush is always performed
on REQ_{FUA,PREFLUSH} requests"
* tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
dax: Use dax_write_cache* helpers
libnvdimm, pmem: Do not flush power-fail protected CPU caches
libnvdimm, pmem: Unconditionally deep flush on *sync
libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
acpi, nfit: Remove ecc_unit_size
dax: dax_insert_mapping_entry always succeeds
libnvdimm, e820: Register all pmem resources
libnvdimm: Debug probe times
linvdimm, pmem: Preserve read-only setting for pmem devices
x86, nfit_test: Add unit test for memcpy_mcsafe()
pmem: Switch to copy_to_iter_mcsafe()
dax: Report bytes remaining in dax_iomap_actor()
dax: Introduce a ->copy_to_iter dax operation
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
xfs, dax: introduce xfs_break_dax_layouts()
xfs: prepare xfs_break_layouts() for another layout type
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
mm, fs, dax: handle layout changes to pinned dax mappings
mm: fix __gup_device_huge vs unmap
mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
...
Dan Williams [Fri, 8 Jun 2018 22:16:44 +0000 (15:16 -0700)]
Merge branch 'for-4.18/mcsafe' into libnvdimm-for-next
Dan Williams [Fri, 8 Jun 2018 22:16:40 +0000 (15:16 -0700)]
Merge branch 'for-4.18/dax' into libnvdimm-for-next
Linus Torvalds [Fri, 8 Jun 2018 20:36:19 +0000 (13:36 -0700)]
Merge tag 'for-linus-
20180608' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes for this merge window, where some of them should go in
sooner rather than later, hence a new pull this week. This pull
request contains:
- Set of NVMe fixes, mostly follow up cleanups/fixes to the queue
changes, but also teardown/removal and misc changes (Christop/Dan/
Johannes/Sagi/Steve).
- Two lightnvm fixes for issues that showed up in this window
(Colin/Wei).
- Failfast/driver flags inheritance for flush requests (Hannes).
- The md device put sanitization and fix (Kent).
- dm bio_set inheritance fix (me).
- nbd discard granularity fix (Josef).
- nbd consistency in command printing (Kevin).
- Loop recursion validation fix (Ted).
- Partition overlap check (Wang)"
[ .. and now my build is warning-free again thanks to the md fix - Linus ]
* tag 'for-linus-
20180608' of git://git.kernel.dk/linux-block: (22 commits)
nvme: cleanup double shift issue
nvme-pci: make CMB SQ mod-param read-only
nvme-pci: unquiesce dead controller queues
nvme-pci: remove HMB teardown on reset
nvme-pci: queue creation fixes
nvme-pci: remove unnecessary completion doorbell check
nvme-pci: remove unnecessary nested locking
nvmet: filter newlines from user input
nvme-rdma: correctly check for target keyed sgl support
nvme: don't hold nvmf_transports_rwsem for more than transport lookups
nvmet: return all zeroed buffer when we can't find an active namespace
md: Unify mddev destruction paths
dm: use bioset_init_from_src() to copy bio_set
block: add bioset_init_from_src() helper
block: always set partition number to '0' in blk_partition_remap()
block: pass failfast and driver-specific flags to flush requests
nbd: set discard_alignment to the granularity
nbd: Consistently use request pointer in debug messages.
block: add verifier for cmdline partition
lightnvm: pblk: fix resource leak of invalid_bitmap
...
Linus Torvalds [Fri, 8 Jun 2018 20:08:57 +0000 (13:08 -0700)]
Merge tag 'regulator-v4.18' of git://git./linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"Quite a lot of core work this time around, though not 100% successful.
We gained support for runtime mode changes thanks to David Collins and
improved support for write only regulators (ones where we can't read
back the configuration) from Douglas Anderson.
There's been quite a bit of work from Linus Walleij on converting from
specfying GPIOs by numbers to descriptors. Sadly the testing turned
out to be less good than we had hoped and so a lot of this had to be
reverted.
We also have the start of updates to use coupled regulators from
Maciej Purski, unfortunately there are further problems there so the
last couple of patches have been reverted.
We also have new drivers for
BD71837 and SY8106A devices, SAW
regulators on Qualcomm SPMI and dropped support for some preproduction
chips that never made it to market from the AB8500 driver"
* tag 'regulator-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (57 commits)
regulator: gpio: Revert
ARM: pxa, regulator: fix building ezx e680
regulator: Revert coupled regulator support again
regulator: wm8994: Fix shared GPIOs
regulator: max77686: Fix shared GPIOs
regulator:
bd71837:
BD71837 PMIC regulator driver
regulator:
bd71837: Devicetree bindings for
BD71837 regulators
regulator: gpio: Get enable GPIO using GPIO descriptor
regulator: fixed: Convert to use GPIO descriptor only
regulator: s2mps11: Fix boot on Odroid XU3
dt-bindings: qcom_spmi: Document SAW support
regulator: qcom_spmi: Add support for SAW
regulator: tps65090: Pass descriptor instead of GPIO number
regulator: s5m8767: Pass descriptor instead of GPIO number
regulator: pfuze100: Delete reference to ena_gpio
regulator: max8952: Pass descriptor instead of GPIO number
regulator: lp8788-ldo: Pass descriptor instead of GPIO number
regulator: lm363x: Pass descriptor instead of GPIO number
regulator: max8973: Pass descriptor instead of GPIO number
regulator: mc13xxx-core: Switch to SPDX identifier
...
Dan Carpenter [Thu, 7 Jun 2018 08:27:41 +0000 (11:27 +0300)]
nvme: cleanup double shift issue
The problem here is that set_bit() and test_bit() take a bit number so
we should be passing 0 but instead we're passing (1 << 0) which leads to
a double shift. It doesn't cause a runtime bug in the current code
because it's done consistently and we only set that one bit.
I decided to just re-use NVME_AER_NOTICE_NS_CHANGED instead of
introducing a new define for this.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 6 Jun 2018 14:13:09 +0000 (08:13 -0600)]
nvme-pci: make CMB SQ mod-param read-only
A controller reset after a run time change of the CMB module parameter
breaks the driver. An 'on -> off' will have the driver use NULL for the
host memory queue, and 'off -> on' will use mismatched queue depth between
the device and the host.
We could fix both, but there isn't really a good reason to change this
at run time anyway, compared to at module load time, so this patch makes
parameter read-only after after modprobe.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 6 Jun 2018 14:13:08 +0000 (08:13 -0600)]
nvme-pci: unquiesce dead controller queues
This patch ensures the nvme namsepace request queues are not quiesced
on a surprise removal. It's possible the queues were previously killed
in a failed reset, so the queues need to be unquiesced to ensure all
requests are flushed to completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 6 Jun 2018 14:13:07 +0000 (08:13 -0600)]
nvme-pci: remove HMB teardown on reset
The controller is required to disable its host memory buffer use on
controller reset. We don't need to submit an admin command to delete it,
so this patch skips sending that command so we don't need to worry about
handling a timeout.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 6 Jun 2018 14:13:06 +0000 (08:13 -0600)]
nvme-pci: queue creation fixes
We've been ignoring NVMe error status on queue creations. Fortunately they
are uncommon, but we should handle these anyway. This patch adds checks
for the a positive error return value that indicates an NVMe status.
If we do see a negative return, the controller isn't usable, so this
patch returns immediately in since we can't unwind that failure.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 6 Jun 2018 14:13:05 +0000 (08:13 -0600)]
nvme-pci: remove unnecessary completion doorbell check
The nvme pci driver never unmaps the doorbell registers while the requests
are active, so we can always safely update the completion queue head.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 6 Jun 2018 14:13:04 +0000 (08:13 -0600)]
nvme-pci: remove unnecessary nested locking
The nvme pci driver no longer handles completions under the cq lock,
so the nested locking is not necessary.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Wed, 6 Jun 2018 12:27:48 +0000 (15:27 +0300)]
nvmet: filter newlines from user input
We should avoid consuming the newlines in traddr, trsvcid and
device_path. Add minimal processing to make sure they are gone.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Steve Wise [Tue, 5 Jun 2018 17:16:41 +0000 (10:16 -0700)]
nvme-rdma: correctly check for target keyed sgl support
The code was checking bit 20 instead of bit 2. Also fixed the log entry.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Fri, 1 Jun 2018 07:11:20 +0000 (09:11 +0200)]
nvme: don't hold nvmf_transports_rwsem for more than transport lookups
Only take nvmf_transports_rwsem when doing a lookup of registered
transports, so that a blocking ->create_ctrl doesn't prevent other
actions on /dev/nvme-fabrics.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: increased lock hold time a bit to be safe, added a comment
and updated the changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 31 May 2018 16:23:48 +0000 (18:23 +0200)]
nvmet: return all zeroed buffer when we can't find an active namespace
Quote from Figure 106 in NVMe 1.3a:
The Identify Namespace data structure is returned to the host for the
namespace specified in the Namespace Identifier (CDW1.NSID) field if it
is an active NSID. If the specified namespace is not an active NSID,
then the controller returns a zero filled data structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@rimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 8 Jun 2018 18:10:58 +0000 (11:10 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the core arm64 and perf changes, the Spectre v4 mitigation
touches the arm KVM code and the ACPI PPTT support touches drivers/
(acpi and cacheinfo). I should have the maintainers' acks in place.
Summary:
- Spectre v4 mitigation (Speculative Store Bypass Disable) support
for arm64 using SMC firmware call to set a hardware chicken bit
- ACPI PPTT (Processor Properties Topology Table) parsing support and
enable the feature for arm64
- Report signal frame size to user via auxv (AT_MINSIGSTKSZ). The
primary motivation is Scalable Vector Extensions which requires
more space on the signal frame than the currently defined
MINSIGSTKSZ
- ARM perf patches: allow building arm-cci as module, demote
dev_warn() to dev_dbg() in arm-ccn event_init(), miscellaneous
cleanups
- cmpwait() WFE optimisation to avoid some spurious wakeups
- L1_CACHE_BYTES reverted back to 64 (for performance reasons that
have to do with some network allocations) while keeping
ARCH_DMA_MINALIGN to 128. cache_line_size() returns the actual
hardware Cache Writeback Granule
- Turn LSE atomics on by default in Kconfig
- Kernel fault reporting tidying
- Some #include and miscellaneous cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (53 commits)
arm64: Fix syscall restarting around signal suppressed by tracer
arm64: topology: Avoid checking numa mask for scheduler MC selection
ACPI / PPTT: fix build when CONFIG_ACPI_PPTT is not enabled
arm64: cpu_errata: include required headers
arm64: KVM: Move VCPU_WORKAROUND_2_FLAG macros to the top of the file
arm64: signal: Report signal frame size to userspace via auxv
arm64/sve: Thin out initialisation sanity-checks for sve_max_vl
arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
arm64: KVM: Add HYP per-cpu accessors
arm64: ssbd: Add prctl interface for per-thread mitigation
arm64: ssbd: Introduce thread flag to control userspace mitigation
arm64: ssbd: Restore mitigation status on CPU resume
arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
arm64: ssbd: Add global mitigation state accessor
arm64: Add 'ssbd' command-line option
arm64: Add ARCH_WORKAROUND_2 probing
arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
...
Linus Torvalds [Fri, 8 Jun 2018 18:02:21 +0000 (11:02 -0700)]
Merge tag 'dmaengine-4.18-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
- updates to sprd, bam_dma, stm drivers
- remove VLAs in dmatest
- move TI drivers to their own subdir
- switch to SPDX tags for ima/mxs dma drivers
- simplify getting .drvdata on bunch of drivers by Wolfram Sang
* tag 'dmaengine-4.18-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (32 commits)
dmaengine: sprd: Add Spreadtrum DMA configuration
dmaengine: sprd: Optimize the sprd_dma_prep_dma_memcpy()
dmaengine: imx-dma: Switch to SPDX identifier
dmaengine: mxs-dma: Switch to SPDX identifier
dmaengine: imx-sdma: Switch to SPDX identifier
dmaengine: usb-dmac: Document R8A7799{0,5} bindings
dmaengine: qcom: bam_dma: fix some doc warnings.
dmaengine: qcom: bam_dma: fix invalid assignment warning
dmaengine: sprd: fix an NULL vs IS_ERR() bug
dmaengine: sprd: Use devm_ioremap_resource() to map memory
dmaengine: sprd: Fix potential NULL dereference in sprd_dma_probe()
dmaengine: pl330: flush before wait, and add dev burst support.
dmaengine: axi-dmac: Request IRQ with IRQF_SHARED
dmaengine: stm32-mdma: fix spelling mistake: "avalaible" -> "available"
dmaengine: rcar-dmac: Document R-Car D3 bindings
dmaengine: sprd: Move DMA request mode and interrupt type into head file
dmaengine: sprd: Define the DMA data width type
dmaengine: sprd: Define the DMA transfer step type
dmaengine: ti: New directory for Texas Instruments DMA drivers
dmaengine: shdmac: Change platform check to CONFIG_ARCH_RENESAS
...
Linus Torvalds [Fri, 8 Jun 2018 17:44:33 +0000 (10:44 -0700)]
Merge tag 'iommu-updates-v4.18' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"Nothing big this time. In particular:
- Debugging code for Tegra-GART
- Improvement in Intel VT-d fault printing to prevent soft-lockups
when on fault storms
- Improvements in AMD IOMMU event reporting
- NUMA aware allocation in io-pgtable code for ARM
- Various other small fixes and cleanups all over the place"
* tag 'iommu-updates-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/io-pgtable-arm: Make allocations NUMA-aware
iommu/amd: Prevent possible null pointer dereference and infinite loop
iommu/amd: Fix grammar of comments
iommu: Clean up the comments for iommu_group_alloc
iommu/vt-d: Remove unnecessary parentheses
iommu/vt-d: Clean up pasid quirk for pre-production devices
iommu/vt-d: Clean up unused variable in find_or_alloc_domain
iommu/vt-d: Fix iotlb psi missing for mappings
iommu/vt-d: Introduce __mapping_notify_one()
iommu: Remove extra NULL check when call strtobool()
iommu/amd: Update logging information for new event type
iommu/amd: Update the PASID information printed to the system log
iommu/tegra: gart: Fix gart_iommu_unmap()
iommu/tegra: gart: Add debugging facility
iommu/io-pgtable-arm: Use for_each_set_bit to simplify code
iommu/qcom: Simplify getting .drvdata
iommu: Remove depends on HAS_DMA in case of platform dependency
iommu/vt-d: Ratelimit each dmar fault printing
Linus Torvalds [Fri, 8 Jun 2018 17:39:20 +0000 (10:39 -0700)]
Merge tag 'mtd/for-4.18' of git://git.infradead.org/linux-mtd
Pull MTD updates from Boris Brezillon:
"Core changes:
- Add a sysfs attribute to expose available OOB size
Driver changes:
- Remove HAS_DMA dependency on various drivers
- Use dev_get_drvdata() instead of platform_get_drvdata() in docg3
- Replace msleep by usleep_range() in the dataflash driver
- Avoid VLA usage in nftl layers
- Remove useless .owner assignment in pismo
- Fix various issues in the CFI driver
- Improve TRX partition handling expose a DT compat for this part
parser
- Clarify OFFSET_CONTINUOUS meaning
NAND core changes:
- Add Miquel as a NAND maintainer
- Add access mode to the nand_page_io_req struct
- Fix kernel-doc in rawnand.h
- Support bit-wise majority to recover from corrupted ONFI parameter
pages
- Stop checking FAIL bit after a SET_FEATURES, as documented in the
ONFI spec
Raw NAND Driver changes:
- Fix and cleanup the error path of many NAND controller drivers
- GPMI:
+ Cleanup/simplification of a few aspects in the driver
+ Take ECC setup specified in the DT into account
- sunxi: remove support for GPIO-based R/B polling
- MTK:
+ Use of_device_get_match_data() instead of of_match_device()
+ Add an entry in MAINTAINERS for this driver
+ Fix nand-ecc-step-size and nand-ecc-strength description in the
DT bindings doc
- fsl_ifc: fix ->cmdfunc() to read more than one ONFI parameter page
OneNAND driver changes:
- samsung: use dev_get_drvdata() instead of platform_get_drvdata()
SPI NOR core changes:
- Add support for a bunch of SPI NOR chips
- Clear EAR reg when switching to 3-byte addressing mode on Winbond
chips
SPI NOR controller driver changes:
- cadence: Add DMA support for direct mode reads
- hisi: Prefix a few functions with hisi_
- intel:
+ Mark the driver as "dangerous" in Kconfig
+ Fix atomic sequence handling
+ Pass a 40us delay (instead of 0us) to readl_poll_timeout()
- fsl:
+ fix a typo in a function name
+ add support for IP variants embedded in the ls2080a and ls1080a
SoCs
- stm32: request exclusive control of the reset line"
* tag 'mtd/for-4.18' of git://git.infradead.org/linux-mtd: (66 commits)
mtd: nand: Pass mode information to nand_page_io_req
mtd: cfi_cmdset_0002: Change erase one block to enable XIP once
mtd: cfi_cmdset_0002: Change erase functions to check chip good only
mtd: cfi_cmdset_0002: Change erase functions to retry for error
mtd: cfi_cmdset_0002: Change definition naming to retry write operation
mtd: cfi_cmdset_0002: Change write buffer to check correct value
mtd: cmdlinepart: Update comment for introduction of OFFSET_CONTINUOUS
mtd: bcm47xxpart: add of_match_table with a new DT binding
dt-bindings: mtd: document Broadcom's BCM47xx partitions
mtd: spi-nor: Add support for EN25QH32
mtd: spi-nor: Add support for is25wp series chips
mtd: spi-nor: Add Winbond w25q32jv support
mtd: spi-nor: fsl-quadspi: add support for ls2080a/ls1080a
mtd: spi-nor: stm32-quadspi: explicitly request exclusive reset control
mtd: spi-nor: intel: provide a range for poll_timout
mtd: spi-nor: fsl-quadspi: fix api naming typo _init_ahb_read
mtd: spi-nor: intel-spi: Explicitly mark the driver as dangerous in Kconfig
mtd: spi-nor: intel-spi: Fix atomic sequence handling
mtd: rawnand: Do not check FAIL bit when executing a SET_FEATURES op
mtd: rawnand: use bit-wise majority to recover the ONFI param page
...
Linus Torvalds [Fri, 8 Jun 2018 17:31:52 +0000 (10:31 -0700)]
Merge tag 'gpio-v4.18-1' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.18 development cycle.
Core changes:
- We have killed off VLA from the core library and all drivers.
The background should be clear for everyone at this point:
https://lwn.net/Articles/749064/
Also I just don't like VLA's, kernel developers hate it when
compilers do things behind their back. It's as simple as that.
I'm sorry that they even slipped in to begin with. Kudos to Laura
Abbott for exorcising them.
- Support GPIO hogs in machines/board files.
New drivers and chip support:
- R-Car r8a77470 (RZ/G1C)
- R-Car r8a77965 (M3-N)
- R-Car r8a77990 (E3)
- PCA953x driver improvements to accomodate more variants.
Improvements and new features:
- Support one interrupt per line on port A in the DesignWare dwapb
driver.
Misc:
- Random cleanups, right header files in the drivers, some size
optimizations etc"
* tag 'gpio-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (73 commits)
gpio: davinci: fix build warning when !CONFIG_OF
gpio: dwapb: Fix rework support for 1 interrupt per port A GPIO
gpio: pxa: Include the right header
gpio: pl061: Include the right header
gpio: pch: Include the right header
gpio: pcf857x: Include the right header
gpio: pca953x: Include the right header
gpio: palmas: Include the right header
gpio: omap: Include the right header
gpio: octeon: Include the right header
gpio: mxs: Switch to SPDX identifier
gpio: Remove VLA from stmpe driver
gpio: mxc: Switch to SPDX identifier
gpio: mxc: add clock operation
gpio: Remove VLA from gpiolib
gpio: aspeed: Use a cache of output data registers
gpio: aspeed: Set output latch before changing direction
gpio: pca953x: fix address calculation for pcal6524
gpio: pca953x: define masks for addressing common and extended registers
gpio: pca953x: set the PCA_PCAL flag also when matching by DT
...
Linus Torvalds [Fri, 8 Jun 2018 17:29:26 +0000 (10:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
- Valve Steam Controller support from Rodrigo Rivas Costa
- Redragon Asura support from Robert Munteanu
- improvement of duplicate usage handling in generic hid-input from
Benjamin Tissoires
- Win 8.1 precisioun touchpad spec implementation from Benjamin
Tissoires
- Support for "In Range" flag for Wacom Intuos/Bamboo devices from
Jason Gerecke
- other various assorted smaller fixes and improvements
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (27 commits)
HID: rmi: use HID_QUIRK_NO_INPUT_SYNC
HID: multitouch: fix calculation of last slot field in multi-touch reports
HID: quirks: remove Delcom Visual Signal Indicator from hid_have_special_driver[]
HID: steam: select CONFIG_POWER_SUPPLY
HID: i2c-hid: remove i2c_hid_open_mut
HID: wacom: Support "in range" for Intuos/Bamboo tablets where possible
HID: core: fix hid_hw_open() comment
HID: hid-plantronics: Re-resend Update to map button for PTT products
HID: multitouch: fix types returned from mt_need_to_apply_feature()
HID: i2c-hid: check if device is there before really probing
HID: steam: add missing fields in client initialization
HID: steam: add battery device.
HID: add driver for Valve Steam Controller
HID: alps: Fix some style in 't4_read_write_register()'
HID: alps: Check errors returned by 't4_read_write_register()'
HID: alps: Save a memory allocation in 't4_read_write_register()' when writing data
HID: alps: Report an error if we receive invalid data in 't4_read_write_register()'
HID: multitouch: implement precision touchpad latency and switches
HID: multitouch: simplify the settings of the various features
HID: multitouch: make use of HID_QUIRK_INPUT_PER_APP
...
Linus Torvalds [Fri, 8 Jun 2018 17:27:41 +0000 (10:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching
Pull livepatching fixlet from Jiri Kosina:
"livepatching documentation fix from Petr Mladek"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Remove not longer valid limitations from the documentation
Linus Torvalds [Fri, 8 Jun 2018 17:00:20 +0000 (10:00 -0700)]
Merge branch 'work.aio' of git://git./linux/kernel/git/viro/vfs
Pull aio iopriority support from Al Viro:
"The rest of aio stuff for this cycle - Adam's aio ioprio series"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: aio ioprio use ioprio_check_cap ret val
fs: aio ioprio add explicit block layer dependence
fs: iomap dio set bio prio from kiocb prio
fs: blkdev set bio prio from kiocb prio
fs: Add aio iopriority support
fs: Convert kiocb rw_hint from enum to u16
block: add ioprio_check_cap function
Linus Torvalds [Fri, 8 Jun 2018 16:56:38 +0000 (09:56 -0700)]
Merge branch 'work.lookup' of git://git./linux/kernel/git/viro/vfs
Pull proc_fill_cache regression fix from Al Viro:
"Regression fix for proc_fill_cache() braino introduced when switching
instantiate() callback to d_splice_alias()"
* 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix proc_fill_cache() in case of d_alloc_parallel() failure
Linus Torvalds [Fri, 8 Jun 2018 16:24:54 +0000 (09:24 -0700)]
Merge tag 'for-linus-4.18-rc1-tag' of git://git./linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"This contains some minor code cleanups (fixing return types of
functions), some fixes for Linux running as Xen PVH guest, and adding
of a new guest resource mapping feature for Xen tools"
* tag 'for-linus-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/PVH: Make GDT selectors PVH-specific
xen/PVH: Set up GS segment for stack canary
xen/store: do not store local values in xen_start_info
xen-netfront: fix xennet_start_xmit()'s return type
xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE
xen: Change return type to vm_fault_t
Mark Brown [Fri, 8 Jun 2018 15:27:56 +0000 (16:27 +0100)]
Merge branch 'regulator-4.17' into regulator-4.18 merge window
Kent Overstreet [Fri, 8 Jun 2018 00:52:54 +0000 (20:52 -0400)]
md: Unify mddev destruction paths
Previously, mddev_put() had a couple different paths for freeing a
mddev, due to the fact that the kobject wasn't initialized when the
mddev was first allocated. If we move the kobject_init() to when it's
first allocated and just use kobject_add() later, we can clean all this
up.
This also removes a hack in mddev_put() to avoid freeing biosets under a
spinlock, which involved copying biosets on the stack after the reset
bioset_init() changes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 7 Jun 2018 20:42:06 +0000 (14:42 -0600)]
dm: use bioset_init_from_src() to copy bio_set
We can't just copy and clear a bio_set, use the bio helper to
setup a new bio_set with the settings from another one.
Fixes: 6f1c819c219f ("dm: convert to bioset_init()/mempool_init()")
Reported-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 7 Jun 2018 20:42:05 +0000 (14:42 -0600)]
block: add bioset_init_from_src() helper
Add a helper that allows a caller to initialize a new bio_set,
using the settings from an existing bio_set.
Reported-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Martin [Thu, 7 Jun 2018 11:32:05 +0000 (12:32 +0100)]
arm64: Fix syscall restarting around signal suppressed by tracer
Commit
17c2895 ("arm64: Abstract syscallno manipulation") abstracts
out the pt_regs.syscallno value for a syscall cancelled by a tracer
as NO_SYSCALL, and provides helpers to set and check for this
condition. However, the way this was implemented has the
unintended side-effect of disabling part of the syscall restart
logic.
This comes about because the second in_syscall() check in
do_signal() re-evaluates the "in a syscall" condition based on the
updated pt_regs instead of the original pt_regs. forget_syscall()
is explicitly called prior to the second check in order to prevent
restart logic in the ret_to_user path being spuriously triggered,
which means that the second in_syscall() check always yields false.
This triggers a failure in
tools/testing/selftests/seccomp/seccomp_bpf.c, when using ptrace to
suppress a signal that interrups a nanosleep() syscall.
Misbehaviour of this type is only expected in the case where a
tracer suppresses a signal and the target process is either being
single-stepped or the interrupted syscall attempts to restart via
-ERESTARTBLOCK.
This patch restores the old behaviour by performing the
in_syscall() check only once at the start of the function.
Fixes: 17c289586009 ("arm64: Abstract syscallno manipulation")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reported-by: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jiri Kosina [Fri, 8 Jun 2018 08:28:24 +0000 (10:28 +0200)]
Merge branch 'for-4.18/wacom' into for-linus
Support for "In Range" flag for Wacom Intuos/Bamboo devices from Jason Gerecke
Jiri Kosina [Fri, 8 Jun 2018 08:27:40 +0000 (10:27 +0200)]
Merge branch 'for-4.18/upstream' into for-linus
Jiri Kosina [Fri, 8 Jun 2018 08:27:02 +0000 (10:27 +0200)]
Merge branch 'for-4.18/rmi' into for-linus
RMI4 correct split report handling from Benjamin Tissoires
Jiri Kosina [Fri, 8 Jun 2018 08:26:18 +0000 (10:26 +0200)]
Merge branch 'for-4.18/plantronics' into for-linus
Jiri Kosina [Fri, 8 Jun 2018 08:25:50 +0000 (10:25 +0200)]
Merge branch 'for-4.18/multitouch' into for-linus
- improvement of duplicate usage handling in hid-input from Benjamin Tissoires
- Win 8.1 precisioun touchpad spec implementation from Benjamin Tissoires
Jiri Kosina [Fri, 8 Jun 2018 08:23:34 +0000 (10:23 +0200)]
Merge branch 'for-4.18/i2c-hid' into for-linus
Assorted smaller fixes to i2c-hid driver
Jiri Kosina [Fri, 8 Jun 2018 08:22:26 +0000 (10:22 +0200)]
Merge branch 'for-4.18/hid-steam' into for-linus
Valve Steam Controller support from Rodrigo Rivas Costa
Jiri Kosina [Fri, 8 Jun 2018 08:21:47 +0000 (10:21 +0200)]
Merge branch 'for-4.18/hid-redragon' into for-linus
Redragon Asura support from Robert Munteanu
Jiri Kosina [Fri, 8 Jun 2018 08:20:42 +0000 (10:20 +0200)]
Merge branch 'for-4.18/alps' into for-linus
hid-alps driver cleanups wrt. t4_read_write_register() handling
from Christophe Jaillet
Al Viro [Fri, 8 Jun 2018 05:17:11 +0000 (01:17 -0400)]
fix proc_fill_cache() in case of d_alloc_parallel() failure
If d_alloc_parallel() returns ERR_PTR(...), we don't want to dput()
that. Small reorganization allows to have all error-in-lookup
cases rejoin the main codepath after dput(child), avoiding the
entire problem.
Spotted-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: 0168b9e38c42 "procfs: switch instantiate_t to d_splice_alias()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 8 Jun 2018 01:39:37 +0000 (18:39 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- v9fs updates
- MM
- procfs updates
- lib/ updates
- autofs updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
autofs: small cleanup in autofs_getpath()
autofs: clean up includes
autofs: comment on selinux changes needed for module autoload
autofs: update MAINTAINERS entry for autofs
autofs: use autofs instead of autofs4 in documentation
autofs: rename autofs documentation files
autofs: create autofs Kconfig and Makefile
autofs: delete fs/autofs4 source files
autofs: update fs/autofs4/Makefile
autofs: update fs/autofs4/Kconfig
autofs: copy autofs4 to autofs
autofs4: use autofs instead of autofs4 everywhere
autofs4: merge auto_fs.h and auto_fs4.h
fs/binfmt_misc.c: do not allow offset overflow
checkpatch: improve patch recognition
lib/ucs2_string.c: add MODULE_LICENSE()
lib/mpi: headers cleanup
lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock
lib/idr.c: remove simple_ida_lock
lib/bitmap.c: micro-optimization for __bitmap_complement()
...
Dan Carpenter [Fri, 8 Jun 2018 00:11:52 +0000 (17:11 -0700)]
autofs: small cleanup in autofs_getpath()
We don't set "*name" so it's slightly nicer to just pass "name" instead
of "&name".
Link: http://lkml.kernel.org/r/20180531064736.lnisb55eajwjynvk@kili.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:48 +0000 (17:11 -0700)]
autofs: clean up includes
Remove includes that aren't needed from autofs (and fs/compat_ioctl.c).
Link: http://lkml.kernel.org/r/152635085258.5968.9743527195522188148.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:45 +0000 (17:11 -0700)]
autofs: comment on selinux changes needed for module autoload
Due to the autofs4 module using a file system type name of autofs
different from the module containing directory name autoload did not
function properly. To work around this kernel configurations have often
elected to build the module into the kernel.
This can result in selinux policies that prohibit autoloading of the
autofs module which need to be changed.
Add a comment about this to "possible changes" section of the autofs4
module help.
Link: http://lkml.kernel.org/r/152686474171.6155.1239659539983577463.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:42 +0000 (17:11 -0700)]
autofs: update MAINTAINERS entry for autofs
Update the autofs entry in MAINTAINERS to reflect the rename of autofs4
to autofs.
Link: http://lkml.kernel.org/r/152626709611.28589.456596640024354223.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:38 +0000 (17:11 -0700)]
autofs: use autofs instead of autofs4 in documentation
Finally remove autofs4 references in the filesystems documentation.
Link: http://lkml.kernel.org/r/152626709055.28589.416082809460051475.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:35 +0000 (17:11 -0700)]
autofs: rename autofs documentation files
There are two files in Documentation/filsystems that should now use
autofs rather than autofs4 in their names.
Link: http://lkml.kernel.org/r/152626707957.28589.3325300375892913999.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:31 +0000 (17:11 -0700)]
autofs: create autofs Kconfig and Makefile
Create Makefile and Kconfig for autofs module.
[raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/152626705591.28589.356365986974038383.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:26 +0000 (17:11 -0700)]
autofs: delete fs/autofs4 source files
Delete the now unused autofs4 module files.
Link: http://lkml.kernel.org/r/152626707391.28589.3553309771262313504.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:22 +0000 (17:11 -0700)]
autofs: update fs/autofs4/Makefile
Update Makefile to build from source in fs/autofs instead of fs/autofs4.
Link: http://lkml.kernel.org/r/152626706824.28589.1915028175544560855.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:17 +0000 (17:11 -0700)]
autofs: update fs/autofs4/Kconfig
Update Kconfig and add a depricated warning.
[raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/152626706133.28589.11994171621899212952.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:13 +0000 (17:11 -0700)]
autofs: copy autofs4 to autofs
Copy source files from the autofs4 directory to the autofs directory.
Link: http://lkml.kernel.org/r/152626705013.28589.931913083997578251.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:09 +0000 (17:11 -0700)]
autofs4: use autofs instead of autofs4 everywhere
Update naming within autofs source to be consistent by changing
occurrences of autofs4 to autofs.
Link: http://lkml.kernel.org/r/152626703688.28589.8315406711135226803.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Fri, 8 Jun 2018 00:11:05 +0000 (17:11 -0700)]
autofs4: merge auto_fs.h and auto_fs4.h
The autofs module has long since been removed so there's no need to have
two separate include files for autofs.
Link: http://lkml.kernel.org/r/152626703024.28589.9571964661718767929.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thadeu Lima de Souza Cascardo [Fri, 8 Jun 2018 00:11:01 +0000 (17:11 -0700)]
fs/binfmt_misc.c: do not allow offset overflow
WHen registering a new binfmt_misc handler, it is possible to overflow
the offset to get a negative value, which might crash the system, or
possibly leak kernel data.
Here is a crash log when
2500000000 was used as an offset:
BUG: unable to handle kernel paging request at
ffff989cfd6edca0
IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
PGD
1ef3e067 P4D
1ef3e067 PUD 0
Oops: 0000 [#1] SMP NOPTI
Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
Call Trace:
search_binary_handler+0x97/0x1d0
do_execveat_common.isra.34+0x667/0x810
SyS_execve+0x31/0x40
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Use kstrtoint instead of simple_strtoul. It will work as the code
already set the delimiter byte to '\0' and we only do it when the field
is not empty.
Tested with offsets -1,
2500000000, UINT_MAX and INT_MAX. Also tested
with examples documented at Documentation/admin-guide/binfmt-misc.rst
and other registrations from packages on Ubuntu.
Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 8 Jun 2018 00:10:58 +0000 (17:10 -0700)]
checkpatch: improve patch recognition
There are mode change and rename only patches that are unrecognized by
checkpatch.
Recognize them.
[joe@perches.com: fix missing close parenthesis]
Link: http://lkml.kernel.org/r/af44c893f6973393f2a5b11f1a8e5cd4c8bbbba5.camel@perches.com
Link: http://lkml.kernel.org/r/974a407e6fa18abd5a965da39cc68986a4c4f091.1526949367.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 8 Jun 2018 00:10:55 +0000 (17:10 -0700)]
lib/ucs2_string.c: add MODULE_LICENSE()
Fix missing MODULE_LICENSE() warning in lib/ucs2_string.c:
WARNING: modpost: missing MODULE_LICENSE() in lib/ucs2_string.o
see include/linux/module.h for more information
Link: http://lkml.kernel.org/r/b2505bb4-dcf5-fc46-443d-e47db1cb2f59@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasily Averin [Fri, 8 Jun 2018 00:10:51 +0000 (17:10 -0700)]
lib/mpi: headers cleanup
MPI headers contain definitions for huge number of non-existing
functions.
Most part of these functions was removed in 2012 by Dmitry Kasatkin
-
7cf4206a99d1 ("Remove unused code from MPI library")
-
9e235dcaf4f6 ("Revert "crypto: GnuPG based MPI lib - additional ...")
-
bc95eeadf5c6 ("lib/mpi: removed unused functions")
however headers wwere not updated properly.
Also I deleted some unused macros.
Link: http://lkml.kernel.org/r/fb2fc1ef-1185-f0a3-d8d0-173d2f97bbaf@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian Andrzej Siewior [Fri, 8 Jun 2018 00:10:48 +0000 (17:10 -0700)]
lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock
percpu_ida() decouples disabling interrupts from the locking operations.
This breaks some assumptions if the locking operations are replaced like
they are under -RT.
The same locking can be achieved by avoiding local_irq_save() and using
spin_lock_irqsave() instead. percpu_ida_alloc() gains one more preemption
point because after unlocking the fastpath and before the pool lock is
acquired, the interrupts are briefly enabled.
Link: http://lkml.kernel.org/r/20180504153218.7301-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:10:45 +0000 (17:10 -0700)]
lib/idr.c: remove simple_ida_lock
Improve the scalability of the IDA by using the per-IDA xa_lock rather
than the global simple_ida_lock. IDAs are not typically used in
performance-sensitive locations, but since we have this lock anyway, we
can use it. It is also a step towards converting the IDA from the radix
tree to the XArray.
[akpm@linux-foundation.org: idr.c needs xarray.h]
Link: http://lkml.kernel.org/r/20180331125332.GF13332@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yury Norov [Fri, 8 Jun 2018 00:10:41 +0000 (17:10 -0700)]
lib/bitmap.c: micro-optimization for __bitmap_complement()
Use BITS_TO_LONGS() macro to avoid calculation of reminder (bits %
BITS_PER_LONG) On ARM64 it saves 5 instruction for function - 16 before
and 11 after.
Link: http://lkml.kernel.org/r/20180411145914.6011-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 8 Jun 2018 00:10:38 +0000 (17:10 -0700)]
get_maintainer: improve patch recognition
There are mode change and rename only patches that are unrecognized
by the get_maintainer.pl script.
Recognize them.
Link: http://lkml.kernel.org/r/bf63101a908d0ff51948164aa60e672368066186.1526949367.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Fri, 8 Jun 2018 00:10:34 +0000 (17:10 -0700)]
kernel/hung_task.c: show all hung tasks before panic
When we get a hung task it can often be valuable to see _all_ the hung
tasks on the system before calling panic().
Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=
5316056503549952
----------------------------------------
INFO: task syz-executor0:6540 blocked for more than 120 seconds.
Not tainted 4.16.0+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor0 D23560 6540 4521 0x80000004
Call Trace:
context_switch kernel/sched/core.c:2848 [inline]
__schedule+0x8fb/0x1ef0 kernel/sched/core.c:3490
schedule+0xf5/0x430 kernel/sched/core.c:3549
schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0xb7f/0x1810 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
ioctl_by_bdev+0xa5/0x110 fs/block_dev.c:2060
isofs_get_last_session fs/isofs/inode.c:567 [inline]
isofs_fill_super+0x2ba9/0x3bc0 fs/isofs/inode.c:660
mount_bdev+0x2b7/0x370 fs/super.c:1119
isofs_mount+0x34/0x40 fs/isofs/inode.c:1560
mount_fs+0x66/0x2d0 fs/super.c:1222
vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:2514 [inline]
do_new_mount fs/namespace.c:2517 [inline]
do_mount+0xea4/0x2b90 fs/namespace.c:2847
ksys_mount+0xab/0x120 fs/namespace.c:3063
SYSC_mount fs/namespace.c:3077 [inline]
SyS_mount+0x39/0x50 fs/namespace.c:3074
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
(...snipped...)
Showing all locks held in the system:
(...snipped...)
2 locks held by syz-executor0/6540:
#0:
00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: alloc_super fs/super.c:211 [inline]
#0:
00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: sget_userns+0x3b2/0xe60 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */
#1:
0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
(...snipped...)
3 locks held by syz-executor7/6541:
#0:
0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
#1:
000000007bf3d3f9 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
#2:
00000000566d4c39 (&type->s_umount_key#50){.+.+}, at: __get_super.part.10+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */
----------------------------------------
When reporting an AB-BA deadlock like shown above, it would be nice if
trace of PID=6541 is printed as well as trace of PID=6540 before calling
panic().
Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay
calling panic() but normally there should not be so many hung tasks.
Link: http://lkml.kernel.org/r/201804050705.BHE57833.HVFOFtSOMQJFOL@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 8 Jun 2018 00:10:30 +0000 (17:10 -0700)]
include/linux/types.h: use fixed width types without double-underscore prefix
This header file is not exported. It is safe to reference types without
double-underscore prefix.
Link: http://lkml.kernel.org/r/1526350925-14922-3-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Lihao Liang <lianglihao@huawei.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 8 Jun 2018 00:10:27 +0000 (17:10 -0700)]
include/linux/types.h: define aligned_ types based on uapi header
<uapi/linux/types.h> has the same typedefs except that it prefixes them
with double-underscore for user space. Use them for the kernel space
typedefs.
Link: http://lkml.kernel.org/r/1526350925-14922-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Lihao Liang <lianglihao@huawei.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 8 Jun 2018 00:10:24 +0000 (17:10 -0700)]
int-ll64.h: define u{8,16,32,64} and s{8,16,32,64} based on uapi header
<uapi/asm-generic/int-ll64.h> has the same typedefs except that it
prefixes them with double-underscore for user space. Use them for
the kernel space typedefs.
Link: http://lkml.kernel.org/r/1526350925-14922-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Lihao Liang <lianglihao@huawei.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:10:20 +0000 (17:10 -0700)]
tools/testing/selftests/proc: test /proc/*/fd a bit (+ PF_KTHREAD is ABI!)
* Test lookup in /proc/self/fd.
"map_files" lookup story showed that lookup is not that simple.
* Test that all those symlinks open the same file.
Check with (st_dev, st_info).
* Test that kernel threads do not have anything in their /proc/*/fd/
directory.
Now this is where things get interesting.
First, kernel threads aren't pinned by /proc/self or equivalent,
thus some "atomicity" is required.
Second, ->comm can contain whitespace and ')'.
No, they are not escaped.
Third, the only reliable way to check if process is kernel thread
appears to be field #9 in /proc/*/stat.
This field is struct task_struct::flags in decimal!
Check is done by testing PF_KTHREAD flags like we do in kernel.
PF_KTREAD value is a part of userspace ABI !!!
Other methods for determining kernel threadness are not reliable:
* RSS can be 0 if everything is swapped, even while reading
from /proc/self.
* ->total_vm CAN BE ZERO if process is finishing
munmap(NULL, whole address space);
* /proc/*/maps and similar files can be empty because unmapping
everything works. Read returning 0 can't distinguish between
kernel thread and such suicide process.
Link: http://lkml.kernel.org/r/20180505000414.GA15090@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:10:17 +0000 (17:10 -0700)]
proc: use "unsigned int" for /proc/*/stack
struct stack_trace::nr_entries is defined as "unsigned int" (YAY!) so
the iterator should be unsigned as well.
It saves 1 byte of code or something like that.
Link: http://lkml.kernel.org/r/20180423215248.GG9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:10:13 +0000 (17:10 -0700)]
proc: use "unsigned int" for sigqueue length
It's defined as atomic_t and really long signal queues are unheard of.
Link: http://lkml.kernel.org/r/20180423215119.GF9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:10:10 +0000 (17:10 -0700)]
proc: use "unsigned int" in proc_fill_cache()
All those lengths are unsigned as they should be.
Link: http://lkml.kernel.org/r/20180423213751.GC9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:10:07 +0000 (17:10 -0700)]
proc: smaller RCU section in ->getattr()
struct kstat is thread local.
Link: http://lkml.kernel.org/r/20180423213626.GB9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:10:02 +0000 (17:10 -0700)]
proc: deduplicate /proc/*/cmdline implementation
Code can be sonsolidated if a dummy region of 0 length is used in normal
case of \0-separated command line:
1) [arg_start, arg_end) + [dummy len=0]
2) [arg_start, arg_end) + [env_start, env_end)
Link: http://lkml.kernel.org/r/20180221193335.GB28678@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:09:59 +0000 (17:09 -0700)]
proc: simpler iterations for /proc/*/cmdline
"rv" variable is used both as a counter of bytes transferred and an
error value holder but it can be reduced solely to error values if
original start of userspace buffer is stashed and used at the very end.
[akpm@linux-foundation.org: simplify cleanup code]
Link: http://lkml.kernel.org/r/20180221193009.GA28678@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:09:55 +0000 (17:09 -0700)]
proc: somewhat simpler code for /proc/*/cmdline
"final" variable is OK but we can get away with less lines.
Link: http://lkml.kernel.org/r/20180221192751.GC28548@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 8 Jun 2018 00:09:52 +0000 (17:09 -0700)]
proc: more "unsigned int" in /proc/*/cmdline
access_remote_vm() doesn't return negative errors, it returns number of
bytes read/written (0 if error occurs). This allows to delete some
comparisons which never trigger.
Reuse "nr_read" variable while I'm at it.
Link: http://lkml.kernel.org/r/20180221192605.GB28548@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sahara [Fri, 8 Jun 2018 00:09:48 +0000 (17:09 -0700)]
mm: remove page_is_poisoned() from linux/mm.h
When commit
bd33ef368135 ("mm: enable page poisoning early at boot") got
rid of the PAGE_EXT_DEBUG_POISON, page_is_poisoned in the header left
behind. This patch cleans up the leftovers under the table.
Link: http://lkml.kernel.org/r/1528101069-21637-1-git-send-email-kpark3469@gmail.com
Signed-off-by: Sahara <keun-o.park@darkmatter.ae>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaron Lu [Fri, 8 Jun 2018 00:09:44 +0000 (17:09 -0700)]
mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline
The LKP robot found a 27% will-it-scale/page_fault3 performance
regression regarding commit
e27be240df53("mm: memcg: make sure
memory.events is uptodate when waking pollers").
What the test does is:
1 mkstemp() a 128M file on a tmpfs;
2 start $nr_cpu processes, each to loop the following:
2.1 mmap() this file in shared write mode;
2.2 write 0 to this file in a PAGE_SIZE step till the end of the file;
2.3 unmap() this file and repeat this process.
3 After 5 minutes, check how many loops they managed to complete, the
higher the better.
The commit itself looks innocent enough as it merely changed some event
counting mechanism and this test didn't trigger those events at all.
Perf shows increased cycles spent on accessing root_mem_cgroup->stat_cpu
in count_memcg_event_mm()(called by handle_mm_fault()) and in
__mod_memcg_state() called by page_add_file_rmap(). So it's likely due
to the changed layout of 'struct mem_cgroup' that either make stat_cpu
falling into a constantly modifying cacheline or some hot fields stop
being in the same cacheline.
I verified this by moving memory_events[] back to where it was:
: --- a/include/linux/memcontrol.h
: +++ b/include/linux/memcontrol.h
: @@ -205,7 +205,6 @@ struct mem_cgroup {
: int oom_kill_disable;
:
: /* memory.events */
: - atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
: struct cgroup_file events_file;
:
: /* protect arrays of thresholds */
: @@ -238,6 +237,7 @@ struct mem_cgroup {
: struct mem_cgroup_stat_cpu __percpu *stat_cpu;
: atomic_long_t stat[MEMCG_NR_STAT];
: atomic_long_t events[NR_VM_EVENT_ITEMS];
: + atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
:
: unsigned long socket_pressure;
And performance restored.
Later investigation found that as long as the following 3 fields
moving_account, move_lock_task and stat_cpu are in the same cacheline,
performance will be good. To avoid future performance surprise by other
commits changing the layout of 'struct mem_cgroup', this patch makes
sure the 3 fields stay in the same cacheline.
One concern of this approach is, moving_account and move_lock_task could
be modified when a process changes memory cgroup while stat_cpu is a
always read field, it might hurt to place them in the same cacheline. I
assume it is rare for a process to change memory cgroup so this should
be OK.
Link: https://lkml.kernel.org/r/20180528114019.GF9904@yexl-desktop
Link: http://lkml.kernel.org/r/20180601071115.GA27302@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 8 Jun 2018 00:09:40 +0000 (17:09 -0700)]
mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags
kvmalloc warned about incompatible gfp_mask to catch abusers (mostly
GFP_NOFS) with an intention that this will motivate authors of the code
to fix those. Linus argues that this just motivates people to do even
more hacks like
if (gfp == GFP_KERNEL)
kvmalloc
else
kmalloc
I haven't seen this happening much (Linus pointed to bucket_lock special
cases an atomic allocation but my git foo hasn't found much more) but it
is true that we can grow those in future. Therefore Linus suggested to
simply not fallback to vmalloc for incompatible gfp flags and rather
stick with the kmalloc path.
Link: http://lkml.kernel.org/r/20180601115329.27807-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huaisheng Ye [Fri, 8 Jun 2018 00:09:36 +0000 (17:09 -0700)]
include/linux/gfp.h: fix the annotation of GFP_ZONE_TABLE
When bit is equal to 0x4, it means OPT_ZONE_DMA32 should be got from
GFP_ZONE_TABLE. OPT_ZONE_DMA32 shall be equal to ZONE_DMA32 or
ZONE_NORMAL according to the status of CONFIG_ZONE_DMA32.
Similarly, when bit is equal to 0xc, that means OPT_ZONE_DMA32 should be
got with an allocation policy GFP_MOVABLE. So ZONE_DMA32 or ZONE_NORMAL
is the possible result value.
Link: http://lkml.kernel.org/r/20180601163403.1032-1-yehs2007@zoho.com
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 8 Jun 2018 00:09:32 +0000 (17:09 -0700)]
mm/shmem.c: zero out unused vma fields in shmem_pseudo_vma_init()
shmem/tmpfs uses pseudo vma to allocate page with correct NUMA policy.
The pseudo vma doesn't have vm_page_prot set. We are going to encode
encryption KeyID in vm_page_prot. Having garbage there causes problems.
Zero out all unused fields in the pseudo vma.
Link: http://lkml.kernel.org/r/20180531135602.20321-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Fri, 8 Jun 2018 00:09:29 +0000 (17:09 -0700)]
mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE
allocations that want to allocate on a different node, e.g. because the
allocating thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page
is put on wrong node's list, then further list manipulations may corrupt
the list because page_to_nid() is used to determine which node's
list_lock should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There
is actually no reason to reset it, because memory policies and cpusets
don't affect the zonelist choice in the first place. This was different
when commit
183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's
anything currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that don't
yet have
511e3a058812 ("mm/slab: make cache_grow() handle the page
allocated on arbitrary node") it is. That's at least 4.4 LTS. Older
ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Fri, 8 Jun 2018 00:09:25 +0000 (17:09 -0700)]
userfaultfd: prevent non-cooperative events vs mcopy_atomic races
If a process monitored with userfaultfd changes it's memory mappings or
forks() at the same time as uffd monitor fills the process memory with
UFFDIO_COPY, the actual creation of page table entries and copying of
the data in mcopy_atomic may happen either before of after the memory
mapping modifications and there is no way for the uffd monitor to
maintain consistent view of the process memory layout.
For instance, let's consider fork() running in parallel with
userfaultfd_copy():
process | uffd monitor
---------------------------------+------------------------------
fork() | userfaultfd_copy()
... | ...
dup_mmap() | down_read(mmap_sem)
down_write(mmap_sem) | /* create PTEs, copy data */
dup_uffd() | up_read(mmap_sem)
copy_page_range() |
up_write(mmap_sem) |
dup_uffd_complete() |
/* notify monitor */ |
If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
be present by the time copy_page_range() is called and they will appear
in the child's memory mappings. However, if the fork() is the first to
take the mmap_sem, the new pages won't be mapped in the child's address
space.
If the pages are not present and child tries to access them, the monitor
will get page fault notification and everything is fine. However, if
the pages *are present*, the child can access them without uffd
noticing. And if we copy them into child it'll see the wrong data.
Since we are talking about background copy, we'd need to decide whether
the pages should be copied or not regardless #PF notifications.
Since userfaultfd monitor has no way to determine what was the order,
let's disallow userfaultfd_copy in parallel with the non-cooperative
events. In such case we return -EAGAIN and the uffd monitor can
understand that userfaultfd_copy() clashed with a non-cooperative event
and take an appropriate action.
Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Fri, 8 Jun 2018 00:09:21 +0000 (17:09 -0700)]
mm: memcg: allow lowering memory.swap.max below the current usage
Currently an attempt to set swap.max into a value lower than the actual
swap usage fails, which causes configuration problems as there's no way
of lowering the configuration below the current usage short of turning
off swap entirely. This makes swap.max difficult to use and allows
delegatees to lock the delegator out of reducing swap allocation.
This patch updates swap_max_write() so that the limit can be lowered
below the current usage. It doesn't implement active reclaiming of swap
entries for the following reasons.
* mem_cgroup_swap_full() already tells the swap machinary to
aggressively reclaim swap entries if the usage is above 50% of
limit, so simply lowering the limit automatically triggers gradual
reclaim.
* Forcing back swapped out pages is likely to heavily impact the
workload and mess up the working set. Given that swap usually is a
lot less valuable and less scarce, letting the existing usage
dissipate over time through the above gradual reclaim and as they're
falted back in is likely the better behavior.
Link: http://lkml.kernel.org/r/20180523185041.GR1718769@devbig577.frc2.facebook.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Souptick Joarder [Fri, 8 Jun 2018 00:09:17 +0000 (17:09 -0700)]
mm/shmem.c: use new return type vm_fault_t
Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.
See commit
1c8f422059ae ("mm: change return type to vm_fault_t")
vmf_error() is the newly introduce inline function in 4.17-rc6.
Link: http://lkml.kernel.org/r/20180521202410.GA17912@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:09:14 +0000 (17:09 -0700)]
slub: remove 'reserved' file from sysfs
Christoph doubts anyone was using the 'reserved' file in sysfs, so remove
it.
Link: http://lkml.kernel.org/r/20180518194519.3820-17-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:09:10 +0000 (17:09 -0700)]
slub: remove kmem_cache->reserved
The reserved field was only used for embedding an rcu_head in the data
structure. With the previous commit, we no longer need it. That lets us
remove the 'reserved' argument to a lot of functions.
Link: http://lkml.kernel.org/r/20180518194519.3820-16-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:09:05 +0000 (17:09 -0700)]
slab,slub: remove rcu_head size checks
rcu_head may now grow larger than list_head without affecting slab or
slub.
Link: http://lkml.kernel.org/r/20180518194519.3820-15-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:09:01 +0000 (17:09 -0700)]
mm: add hmm_data to struct page
Make hmm_data an explicit member of the struct page union.
Link: http://lkml.kernel.org/r/20180518194519.3820-14-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:57 +0000 (17:08 -0700)]
mm: add pt_mm to struct page
For pgd page table pages, x86 overloads the page->index field to store a
pointer to the mm_struct. Rename this to pt_mm so it's visible to other
users.
Link: http://lkml.kernel.org/r/20180518194519.3820-13-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:53 +0000 (17:08 -0700)]
mm: improve struct page documentation
Rewrite the documentation to describe what you can use in struct page
rather than what you can't.
Link: http://lkml.kernel.org/r/20180518194519.3820-12-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:50 +0000 (17:08 -0700)]
mm: combine LRU and main union in struct page
This gives us five words of space in a single union in struct page. The
compound_mapcount moves position (from offset 24 to offset 20) on 64-bit
systems, but that does not seem likely to cause any trouble.
Link: http://lkml.kernel.org/r/20180518194519.3820-11-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:46 +0000 (17:08 -0700)]
mm: move lru union within struct page
Since the LRU is two words, this does not affect the double-word alignment
of SLUB's freelist.
Link: http://lkml.kernel.org/r/20180518194519.3820-10-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:42 +0000 (17:08 -0700)]
mm: use page->deferred_list
Now that we can represent the location of 'deferred_list' in C instead of
comments, make use of that ability.
Link: http://lkml.kernel.org/r/20180518194519.3820-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:39 +0000 (17:08 -0700)]
mm: combine first three unions in struct page
By combining these three one-word unions into one three-word union, we
make it easier for users to add their own multi-word fields to struct
page, as well as making it obvious that SLUB needs to keep its double-word
alignment for its freelist & counters.
No field moves position; verified with pahole.
Link: http://lkml.kernel.org/r/20180518194519.3820-8-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:35 +0000 (17:08 -0700)]
mm: move _refcount out of struct page union
Keeping the refcount in the union only encourages people to put something
else in the union which will overlap with _refcount and eventually explode
messily. pahole reports no fields change location.
Link: http://lkml.kernel.org/r/20180518194519.3820-7-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:31 +0000 (17:08 -0700)]
mm: move 'private' union within struct page
By moving page->private to the fourth word of struct page, we can put the
SLUB counters in the same word as SLAB's s_mem and still do the
cmpxchg_double trick. Now the SLUB counters no longer overlap with the
mapcount or refcount so we can drop the call to page_mapcount_reset() and
simplify set_page_slub_counters() to a single line.
Link: http://lkml.kernel.org/r/20180518194519.3820-6-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:26 +0000 (17:08 -0700)]
mm: switch s_mem and slab_cache in struct page
This will allow us to store slub's counters in the same bits as slab's
s_mem. slub now needs to set page->mapping to NULL as it frees the page,
just like slab does.
Link: http://lkml.kernel.org/r/20180518194519.3820-5-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 8 Jun 2018 00:08:23 +0000 (17:08 -0700)]
mm: mark pages in use for page tables
Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables. This can be helpful when debugging crashdumps or
analysing memory fragmentation. Add a KPF flag to report these pages to
userspace and update page-types.c to interpret that flag.
Note that only pages currently accounted as NR_PAGETABLES are tracked as
PageTable; this does not include pgd/p4d/pud/pmd pages. Those will be the
subject of a later patch.
Link: http://lkml.kernel.org/r/20180518194519.3820-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>