of_irq_get() may return 0 as well as a nagative error number on failure
(and never on success), however omap3xxx_prm_late_init() regards 0 as a
valid IRQ -- fix this.
Fixes: 1e037794f7 ("ARM: OMAP3+: PRM: register interrupt information from DT")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Since commit a8636c8964 ("PM / Runtime: Don't allow to suspend a
device with an active child"), which went into 4.10, it is no longer
permitted to set RPM_SUSPENDED state for a device with active children
(unless power.ignore_children is set).
This specifically means that the attempts to do just that from the omap
pm-domain suspend_noirq callback have since been failing whenever a
child is active, for example:
am335x-usb-childs 47400000.usb: runtime PM trying to suspend
device but active child
Silence this warning by dropping the broken pm_runtime_set_suspended()
call from the omap suspend_noirq callback along with the redundant
pm_runtime_set_active() in resume_noirq.
This effectively reverts commit 3522bf7bfa ("ARM: OMAP2+: omap_device:
maintain sane runtime pm status around suspend/resume"), which started
updating the RPM state after the runtime_suspend callback (!) for active
omap devices had been called during system suspend. The rationale was
that a later pm_runtime_get_sync() would then fail (even after runtime
pm had been disabled) and that this in turn would avoid any external
aborts when accessing registers with clocks disabled. (See also commit
6f3c77b040 ("PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE,
even when disabled, v2").
But during the suspend_noirq phase all children would already have been
suspended and their drivers would specifically not attempt any further
register accesses. And if this was all just a workaround for random
device drivers doing cross-tree calls during system suspend, those
drivers should be fixed and updated to explicitly model such
dependencies using device-links instead (and either way, any such calls
have been causing crashes since 4.10).
Fixes: 3522bf7bfa ("ARM: OMAP2+: omap_device: maintain sane runtime pm status around suspend/resume")
Fixes: a8636c8964 ("PM / Runtime: Don't allow to suspend a device with an active child")
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Dave Gerlach <d-gerlach@ti.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The error handling code in omap_ocp2scp_probe fails to invoke
pm_runtime_disable and fails to initialize return value in
certain cases. Fix it here.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
v1 series[1] for dp83867 phy impedance-control support,
specifies to use ti,impedance-control with a value. These
properties got updated iduring review to specify whether
min or max impedance. But the DT still uses the old values
which never takes effect. Update the DT node by using the
proper DT properties.
[1] https://patchwork.kernel.org/patch/9239729/
Fixes: 9868bc585ae2c ("ARM: dts: Add support for dra718-evm")
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 599c376c49 ("ARM: dts: Fix gpio interrupts for dm816x")
corrected some problems with the MMC. However, it gets the write
protect pin backwards. It needs to be ACTIVE_HIGH not ACTIVE_LOW.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Mihail Grigorov <michael.grigorov@konsulko.com>
Signed-off-by: Tom Rini <trini@konsulko.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The ELM node in dm816x.dtsi needs to declare the correct compatible
value here as per the binding only one value is correct, and the current
driver handles it correctly. We then add pinmux information for the
NAND found on the EVM so that we do not rely on the ROM to do this for
us, and also so that we do not try and probe NAND before we probe the
ELM.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Roger Quadros <rogerq@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Mihail Grigorov <michael.grigorov@konsulko.com>
Signed-off-by: Tom Rini <trini@konsulko.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 2a26d31b1b ("ARM: OMAP2+: Remove unused legacy code for PRM")
removed PRM platform init code that I thought is unused. Turns out omap4
still needs this code, so let's do a partial revert to add it back.
I probably missed this earlier as the comments used to say
"OMAP4+ is DT only now" for !of_have_populated_dt() to exit early and
missed the negative test. Let's not add those lines back as they are
confusing and no longer needed as we only boot in device tree mode.
Without things things can mysterious fail for i2c, for example LM75
I2C temperature sensor can stop working as the PRM interrupts won't work.
Fixes: 2a26d31b1b ("ARM: OMAP2+: Remove unused legacy code for PRM")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Shared interrupts with IRQ_NOAUTOEN got a warning added with commit
04c848d398 ("genirq: Warn when IRQ_NOAUTOEN is used with shared
interrupts").
Let's just drop the IRQ_NOAUTOEN use for omap3 PRM shared interrupt as
it does not seem to cause any other issues based on my testing. We have
moved a lot of the code to initialize later, and whatever problems the
legacy booting had seem to be gone now with pinctrl driver and device
tree based booting.
Otherwise we will get:
WARNING: CPU: 0 PID: 1 at kernel/irq/manage.c:1348 __setup_irq+0x5d0/0x64c
[<c01b0260>] (__setup_irq) from [<c01b0480>]
(request_threaded_irq+0xdc/0x188)
[<c01b0480>] (request_threaded_irq) from [<c051c780>]
(pcs_probe+0x6ec/0x8a4)
[<c051c780>] (pcs_probe) from [<c05a84b8>] (platform_drv_probe+0x50/0xb0)
[<c05a84b8>] (platform_drv_probe) from [<c05a6288>]
(driver_probe_device+0x33c/0x478)
Note that we also need to remove the related enable_irq() to avoid
getting the following:
WARNING: CPU: 0 PID: 1 at kernel/irq/manage.c:529 enable_irq+0x34/0x70
[<c01afa04>] (enable_irq) from [<c0c0f1fc>] (omap3_pm_init+0x118/0x3f8)
[<c0c0f1fc>] (omap3_pm_init) from [<c0c0ae7c>] (am35xx_init_late+0x10/0x18)
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
OMAP4 SoC contains SHAM crypto hardware accelerator. Add hwmod data for
this IP so that it can be utilized by crypto frameworks.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This fixes the following error during kernel boot:
platform 480a5000.des: Cannot lookup hwmod 'des'
Unfortunately the DES module is only documented partly
in the OMAP4430 TRM. I found an old patch from Joel,
which I took over and updated for currently mainline.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This adds the hwmod entry for the second AES module
available on OMAP4.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This fixes the following error during kernel boot:
platform 4b501000.aes: Cannot lookup hwmod 'aes1'
Unfortunately the AES module is only documented partly
in the OMAP4430 TRM. I found an old patch from Joel,
which I took over and updated for currently mainline.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
[tony@atomide.com: left out probe changes to avoid merge conflict]
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Note that omap_init_sti() won't do anything so we can
remove omap2_init_devices() as pointed out by Sebastian
Reichel <sebastian.reichel@collabora.co.uk>.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Note that the volt_data is still being used.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
If clkctrl clocks are available on a device, populate these automatically
to replace hwmod main_clk info. First, the patch parses all "ti,clkctrl"
compatible nodes and maps these against existing clockdomain data. Once
done, individual hwmod init routines can search for a clkctrl clock
handle based on the clockdomain info and the created mapping.
This patch also drops the obsolete "_mod_ck" search as the implementation
required for this was not accepted usptream.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This function gets the physical base address of a clockdomain.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This new function can be used to get the physical address of a
clockdomain. Required for mapping the clkctrl clocks under hwmod
without modification to DT data.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
In some cases the physical address info is needed, so store this
under the existing cm*_base, prm_base and prcm_mpu_base variables.
These are converted now to structs that contain both virtual and
physical address base for the instance.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
These extra optional clocks are required as main clock for these modules
are going to be routed to the main module clock. Otherwise, the hdmi / tv
clocks are not going to be enabled during usage, leading to failure.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The mux clock handle shall be provided via "fck" DT handle. This avoids
the need to lookup the main clock via hwmod core, which will not work
with the clkctrl clock support anymore; the main clock is not going to
be a mux.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We are now booting all mach-omap2 in device tree only mode.
Any code that is only called in legacy boot mode where
of_have_populated_dt() is not set is safe to remove now.
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
omap_pm_clkdms_setup gets called from pm init functions and now that
pm33xx and pm43xx can be loaded as modules this must be kept to be
called at any point during runtime.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
OMAP timer code registers two timers - one as clocksource
and one as clockevent. Since AM33XX has only one usable timer
in the WKUP domain one of the timers needs suspend-resume
support to restore the configuration to pre-suspend state.
commit adc78e6b99 ("timekeeping: Add suspend and resume
of clock event devices") introduced .suspend and .resume
callbacks for clock event devices. Leverage these
callbacks to have AM33XX clockevent timer behave properly
across system suspend. Extend the use of the .suspend and
.resume callbacks used by am335x clockevent to am437x as well.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
AM43XX has the same wakeupgen IP as OMAP4/5. The only
notable difference is the presence of 7 register banks
and lack of SAR area which has been used in OMAP4/5 for
saving and restoring the context around low power states.
In case of AM43XX the context is saved and restored by
the kernel. Introduce wakeupgen_ops so that context save
and restore can be set on a per-SoC basis during init.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull some more input subsystem updates from Dmitry Torokhov:
"An updated xpad driver with a few more recognized device IDs, and a
new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
via SPI bus"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cros_ec_keyb - remove extraneous 'const'
Input: add support for PlayStation 1/2 joypads connected via SPI
Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
Input: xpad - sync supported devices with xboxdrv
Input: xpad - sort supported devices by USB ID
Pull UML fixes from Richard Weinberger:
"No new stuff, just fixes"
* 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Add missing NR_CPUS include
um: Fix to call read_initrd after init_bootmem
um: Include kbuild.h instead of duplicating its macros
um: Fix PTRACE_POKEUSER on x86_64
um: Set number of CPUs
um: Fix _print_addr()
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, docs: update memory.stat description with workingset* entries
mm: vmscan: scan until it finds eligible pages
mm, thp: copying user pages must schedule on collapse
dax: fix PMD data corruption when fault races with write
dax: fix data corruption when fault races with write
ext4: return to starting transaction in ext4_dax_huge_fault()
mm: fix data corruption due to stale mmap reads
dax: prevent invalidation of mapped DAX entries
Tigran has moved
mm, vmalloc: fix vmalloc users tracking properly
mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
gcov: support GCC 7.1
mm, vmstat: Remove spurious WARN() during zoneinfo print
time: delete current_fs_time()
hwpoison, memcg: forcibly uncharge LRU pages
Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache
workingset transition") introduced three new entries in memory stat
file:
- workingset_refault
- workingset_activate
- workingset_nodereclaim
This commit adds a corresponding description to the cgroup v2 docs.
Link: http://lkml.kernel.org/r/1494530293-31236-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have encountered need_resched warnings in __collapse_huge_page_copy()
while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages.
mm->mmap_sem is held for write, but the iteration is well bounded.
Reschedule as needed.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705101426380.109808@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.
Currently DAX PMD read fault can race with write(2) in the following
way:
CPU1 - write(2) CPU2 - read fault
dax_iomap_pmd_fault()
->iomap_begin() - sees hole
dax_iomap_rw()
iomap_apply()
->iomap_begin - allocates blocks
dax_iomap_actor()
invalidate_inode_pages2_range()
- there's nothing to invalidate
grab_mapping_entry()
- we add huge zero page to the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 9f141d6ef6 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently DAX read fault can race with write(2) in the following way:
CPU1 - write(2) CPU2 - read fault
dax_iomap_pte_fault()
->iomap_begin() - sees hole
dax_iomap_rw()
iomap_apply()
->iomap_begin - allocates blocks
dax_iomap_actor()
invalidate_inode_pages2_range()
- there's nothing to invalidate
grab_mapping_entry()
- we add zero page in the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().
Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX. That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:
- open an mmap over a 2MiB hole
- read from a 2MiB hole, faulting in a 2MiB zero page
- write to the hole with write(3p). The write succeeds but we
incorrectly leave the 2MiB zero page mapping intact.
- via the mmap, read the data that was just written. Since the zero
page mapping is still intact we read back zeroes instead of the new
data.
Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.
Fixes: c6dcf52c23
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.
This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).
The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.
This patch (of 4):
dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked. This is done via:
invalidate_mapping_pages()
invalidate_exceptional_entry()
dax_invalidate_mapping_entry()
However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped. This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().
For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry. This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.
We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().
Fixes: c6dcf52c23 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>