Commit Graph

603220 Commits

Author SHA1 Message Date
Dan Williams
0606263f24 Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next 2016-07-24 08:05:44 -07:00
Markus Elfring
d4c5725d57 libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
The __nd_device_register() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-24 08:04:55 -07:00
Vishal Verma
6839a6d96f nfit: do an ARS scrub on hitting a latent media error
When a latent (unknown to 'badblocks') error is encountered, it will
trigger a machine check exception. On a system with machine check
recovery, this will only SIGBUS the process(es) which had the bad page
mapped (as opposed to a kernel panic on platforms without machine
check recovery features). In the former case, we want to trigger a full
rescan of that nvdimm bus. This will allow any additional, new errors
to be captured in the block devices' badblocks lists, and offending
operations on them can be trapped early, avoiding machine checks.

This is done by registering a callback function with the
x86_mce_decoder_chain and calling the new ars_rescan functionality with
the address in the mce notificatiion.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-24 08:04:04 -07:00
Dan Williams
bdf97013ce nfit: move to nfit/ sub-directory
With the arrival of x86-machine-check support the nfit driver will add a
(conditionally-compiled) source file.  Prepare for this by moving all
nfit source to drivers/acpi/nfit/.  This is pure code movement, no
functional changes.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-24 08:04:04 -07:00
Vishal Verma
37b137ff8c nfit, libnvdimm: allow an ARS scrub to be triggered on demand
Normally, an ARS (Address Range Scrub) only happens at
boot/initialization time. There can however arise situations where a
bus-wide rescan is needed - notably, in the case of discovering a latent
media error, we should do a full rescan to figure out what other sectors
are bad, and thus potentially avoid triggering an mce on them in the
future. Also provide a sysfs trigger to start a bus-wide scrub.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 21:51:42 -07:00
Dan Williams
18515942d6 libnvdimm: register nvdimm_bus devices with an nd_bus driver
A recent effort to add a new nvdimm bus provider attribute highlighted a
race between interrogating nvdimm_bus->nd_desc and nvdimm_bus tear down.
The typical way to handle these races is to take the device_lock() in
the attribute method and validate that the device is still active.  In
order for a device to be 'active' it needs to be associated with a
driver.  So, we create the small boilerplate for a driver and register
nvdimm_bus devices on the 'nvdimm_bus_type' bus.

A result of this change is that ndbusX devices now appear under
/sys/bus/nd/devices.  In fact this makes /sys/class/nd somewhat
redundant, but removing that will need to take a long deprecation period
given its use by ndctl binaries in the field.

This change naturally pulls code from drivers/nvdimm/core.c to
drivers/nvdimm/bus.c, so it is a nice code organization clean-up as
well.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 11:06:33 -07:00
Vishal Verma
5bf0b6e1af pmem: clarify a debug print in pmem_clear_poison
Prefix the sector number being cleared with a '0x' to make it clear that
this is a hex value.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 11:06:33 -07:00
Dan Williams
fd1d961dd6 x86/insn: remove pcommit
The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 11:04:23 -07:00
Dan Williams
dfa169bbee Revert "KVM: x86: add pcommit support"
This reverts commit 8b3e34e46a.

Given the deprecation of the pcommit instruction, the relevant VMX
features and CPUID bits are not going to be rolled into the SDM.  Remove
their usage from KVM.

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 11:04:23 -07:00
Dan Williams
58cd71b474 nfit, tools/testing/nvdimm/: unify shutdown paths
While testing the new on-demand ARS patches we discovered that
differences between the nfit_test and normal nfit driver shutdown paths
can leak resources.  Unify the shutdown paths to trigger via a devm_
callback when the acpi_desc->dev is unbound from its driver.

Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-22 13:35:54 -07:00
Dan Williams
bc9775d869 libnvdimm: move ->module to struct nvdimm_bus_descriptor
Let the provider module be explicitly passed in rather than implicitly
assumed by the module that calls nvdimm_bus_register().  This is in
preparation for unifying the nfit and nfit_test driver teardown paths.

Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 20:03:19 -07:00
Dan Williams
e7a11b449e nfit: cleanup acpi_nfit_init calling convention
Pass the nfit buffer as a parameter rather than hanging it off of
acpi_desc.

Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 14:12:18 -07:00
Dan Williams
3193204149 nfit: fix _FIT evaluation memory leak + use after free
acpi_evaluate_object() allocates memory. Free the buffer allocated
during acpi_nfit_add(). In order for this memory to be freed
acpi_nfit_init() needs to be converted to duplicate the nfit contents in
its internal allocation.  Use zero-length arrays to minimize the thrash
with the rest of the nfit driver implementation.

All of the add_<nfit-sub-table>() routines now validate a minimum table
size and expect hotplugged tables to match the size of the original
table to count as a duplicate. For variable length tables, like 'idt'
and 'flush', we calculate the dynamic size. Note that hotplug by
definition cannot change the interleave as it would cause data
corruption of in-use namespaces.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Xiao Guangrong <guangrong.xiao@intel.com>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 14:12:18 -07:00
Dan Williams
5dc68e5574 tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
New for ACPI 6.1, these fields are used in the common dimm
representation format defined by section 5.2.25.9 "NVDIMM representation
format".

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 14:12:18 -07:00
Dan Williams
7bfe97c763 tools/testing/nvdimm: add virtual ramdisk range
Test the virtual disk ranges that platform firmware like EDK2/OVMF might
emit.

Tested-by: "Lee, Chun-Yi" <jlee@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 14:12:18 -07:00
Lee, Chun-Yi
c2f32acdf8 acpi, nfit: treat virtual ramdisk SPA as pmem region
This patch adds logic to treat virtual ramdisk SPA as pmem region, then
ramdisk's /dev/pmem* device can be mounted with iso9660.

It's useful to work with the httpboot in EFI firmware to pull a remote
ISO file to the local memory region for booting and installation.

Wiki page of UEFI HTTPBoot with OVMF:
	https://en.opensuse.org/UEFI_HTTPBoot_with_OVMF

The ramdisk function in EDK2/OVMF generates a ACPI0012 root device that
it contains empty _STA but without _DSM:

DefinitionBlock ("ssdt2.aml", "SSDT", 2, "INTEL ", "RamDisk ", 0x00001000)
{
    Scope (\_SB)
    {
        Device (NVDR)
        {
            Name (_HID, "ACPI0012")  // _HID: Hardware ID
            Name (_STR, Unicode ("NVDIMM Root Device"))  // _STR: Description String
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (0x0F)
            }
        }
    }
}

In section 5.2.25.2 of ACPI 6.1 spec, it mentions that the "SPA Range
Structure Index" of virtual SPA shall be set to zero. That means virtual SPA
will not be associated by any NVDIMM region mapping.

The VCD's SPA Range Structure in NFIT is similar to virtual disk region
as following:

[028h 0040   2]                Subtable Type : 0000 [System Physical Address Range]
[02Ah 0042   2]                       Length : 0038

[02Ch 0044   2]                  Range Index : 0000
[02Eh 0046   2]        Flags (decoded below) : 0000
                   Add/Online Operation Only : 0
                      Proximity Domain Valid : 0
[030h 0048   4]                     Reserved : 00000000
[034h 0052   4]             Proximity Domain : 00000000
[038h 0056  16]           Address Range GUID : 77AB535A-45FC-624B-5560-F7B281D1F96E
[048h 0072   8]           Address Range Base : 00000000B6ABD018
[050h 0080   8]         Address Range Length : 0000000005500000
[058h 0088   8]         Memory Map Attribute : 0000000000000000

The way to not associate a SPA range is to never reference it from a "flush hint",
"interleave", or "control region" table.

After testing on OVMF, pmem driver can support the region that it doesn't
assoicate to any NVDIMM mapping. So, treat VCD like pmem is a idea to get
a pmem block device that it contains iso.

v4:
Instoduce nfit_spa_is_virtual() to check virtual ramdisk SPA and create
pmem region.

v3:
To simplify patch, removed useless VCD region in libnvdimm.

v2:
Removed the code for setting VCD to a read-only region.

Cc: Gary Lin <GLin@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 14:12:18 -07:00
Dan Williams
a72255983f nfit: make DIMM DSMs optional
Commit 4995734e97 "acpi, nfit: fix acpi_check_dsm() vs zero functions
implemented" attempted to fix a QEMU regression by supporting its usage
of a zero-mask as a valid response to a DSM-family probe request.
However, this behavior breaks HP platforms that return a zero-mask by
default causing the probe to misidentify the DSM-family.

Instead, the QEMU regression can be fixed by simply not requiring the DSM
family to be identified.

This effectively reverts commit 4995734e97, and removes the DSM
requirement from the init path.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Fixes: 4995734e97 ("acpi, nfit: fix acpi_check_dsm() vs zero functions implemented")
Reported-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Tested-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-19 12:32:39 -07:00
Dan Williams
7a9eb20666 pmem: kill __pmem address space
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 19:25:38 -07:00
Dan Williams
7c8a6a7190 pmem: kill wmb_pmem()
All users have been replaced with flushing in the pmem driver.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 15:13:48 -07:00
Dan Williams
91131dbd1d libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
nsio_rw_bytes() is used to write info block metadata to the namespace,
so it should trigger a flush after every write.  Replace wmb_pmem() with
nvdimm_flush() in this path.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 15:13:48 -07:00
Dan Williams
14df6a4e7e fs/dax: remove wmb_pmem()
Flushing posted-write queues is now deferred to REQ_FLUSH context, or
otherwise handled by an ADR event at the platform level.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 15:13:48 -07:00
Dan Williams
476f848aae libnvdimm, pmem: flush posted-write queues on shutdown
Commit writes to media on system shutdown or pmem driver unload.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 15:13:48 -07:00
Dan Williams
7e267a8c79 libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush()
Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.

Note that we still arrange for nvdimm_flush() to be called even in the
ADR case. We need at least once wmb() fence to push buffered writes in
the cpu out to the ADR protected domain.

Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 16:16:03 -07:00
Dan Williams
0c27af60d1 libnvdimm: cycle flush hints
When the NFIT provides multiple flush hint addresses per-dimm it is
expressing that the platform is capable of processing multiple flush
requests in parallel.  There is some fixed cost per flush request, let
the cost be shared in parallel on multiple cpus.

Since there may not be enough flush hint addresses for each cpu to have
one, keep a per-cpu index of the last used hint, hash it with current
pid, and assume that access pattern and scheduler randomness will keep
the flush-hint usage somewhat staggered across cpus.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 16:16:03 -07:00
Dan Williams
f284a4f237 libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()
nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.

The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform.  It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:

      1: flush addresses defined
      0: dimm topology described without flush addresses (assume ADR)
 -errno: no topology information, unable to determine flush mechanism

The pmem driver is expected to take the following actions on this ternary
result:

      1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
      0: do not set, WC or FUA on the queue, take no further action
 -errno: warn and then operate as if nvdimm_has_flush() returned '0'

The caveat of this heuristic is that it can not distinguish the "dimm
does not have flush address" case from the "platform firmware is broken
and failed to describe a flush address".  Given we are already
explicitly trusting the NFIT there's not much more we can do beyond
blacklisting broken firmwares if they are ever encountered.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 16:13:42 -07:00
Dan Williams
a8f720224e libnvdimm: keep region data alive over namespace removal
nd_region device driver data will be used in the namespace i/o path.
Re-order nd_region_remove() to ensure this data stays live across
namespace device removal

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 16:13:41 -07:00
Dan Williams
85d3fa02e4 tools/testing/nvdimm: simulate multiple flush hints per-dimm
Sample nfit data to test the kernel's handling of the multiple
flush-hint case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 16:13:41 -07:00
Dan Williams
e5ae3b252c libnvdimm, nfit: move flush hint mapping to region-device driver-data
In preparation for triggering flushes of a DIMM's writes-posted-queue
(WPQ) via the pmem driver move mapping of flush hint addresses to the
region driver.  Since this uses devm_nvdimm_memremap() the flush
addresses will remain mapped while any region to which the dimm belongs
is active.

We need to communicate more information to the nvdimm core to facilitate
this mapping, namely each dimm object now carries an array of flush hint
address resources.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 15:09:26 -07:00
Dan Williams
a8a6d2e04c libnvdimm, nfit: remove nfit_spa_map() infrastructure
Now that all shared mappings are handled by devm_nvdimm_memremap() we no
longer need nfit_spa_map() nor do we need to trigger a callback to the
bus provider at region disable time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 15:09:26 -07:00
Dan Williams
29b9aa0aa3 libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users
In preparation for generically mapping flush hint addresses for both the
BLK and PMEM use case, provide a generic / reference counted mapping
api.  Given the fact that a dimm may belong to multiple regions (PMEM
and BLK), the flush hint addresses need to be held valid as long as any
region associated with the dimm is active.  This is similar to the
existing BLK-region case where multiple BLK-regions may share an
aperture mapping.  Up-level this shared / reference-counted mapping
capability from the nfit driver to a core nvdimm capability.

This eliminates the need for the nd_blk_region.disable() callback.  Note
that the removal of nfit_spa_map() and related infrastructure is
deferred to a later patch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-07 17:11:09 -07:00
Dan Williams
81ed4e3670 nfit: don't override return value of nfit_mem_init
We were needlessly converting nfit_mem_init() errors to -ENOMEM.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-07 17:11:09 -07:00
Dan Williams
ad9ac5e195 nfit: always associate flush hints
Before enabling use of flush hints for pmem regions, we need to make
sure they are always associated.  Move the initialization of nfit_flush
out of the block-window specific init path to the general init path.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-07 17:11:09 -07:00
Dan Williams
b28f08ce50 tools/testing/nvdimm: remove __wrap_devm_memremap_pages placeholder
This now dead code was needed to prevent compile errors while being
staged in -next for v4.5.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-07 17:11:02 -07:00
Sajjan, Vikas C
d1c8e0c521 dax: use devm_add_action_or_reset()
If devm_add_action() fails, we are explicitly calling the cleanup to free
the resources allocated. Use the helper devm_add_action_or_reset()
and return directly in case of error, since the cleanup function
has been already called by the helper if there was any error.

Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-06 15:14:48 -07:00
Sajjan, Vikas C
d932dd2ccd nfit: use devm_add_action_or_reset()
If devm_add_action() fails, we are explicitly calling the cleanup to free
the resources allocated. Lets use the helper devm_add_action_or_reset()
and return directly in case of error, since the cleanup function
has been already called by the helper if there was any error.

Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-06 15:12:41 -07:00
Johannes Thumshirn
8729bdea82 libnvdimm: initialize struct blk_integrity with 0
Initialize struct blk_integrity with 0 as blk_integrity_register() takes the
then unitialized struct blk_integrity::flags and ORs it to the resulting block
integrity structure.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-06 10:34:42 -07:00
Linus Torvalds
a99cde438d Linux 4.7-rc6 2016-07-03 23:01:00 -07:00
Linus Torvalds
0b295dd5b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "This makes sure userspace filesystems are not broken by the parallel
  lookups and readdir feature"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: serialize dirops by default
2016-07-03 12:02:00 -07:00
Linus Torvalds
236bfd8ed8 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "This contains fixes for a dentry leak, a regression in 4.6 noticed by
  Docker users and missing write access checking in truncate"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: warn instead of error if d_type is not supported
  ovl: get_write_access() in truncate
  ovl: fix dentry leak for default_permissions
2016-07-03 11:57:09 -07:00
Vivek Goyal
e7c0b5991d ovl: warn instead of error if d_type is not supported
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.

But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.

https://github.com/docker/docker/issues/22937#issuecomment-229881315

So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
2016-07-03 09:39:31 +02:00
Linus Torvalds
4f302921c1 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fix from Ralf Baechle:
 "Only a single fix for 4.7 pending at this point.  It fixes an issue
  that may lead to corruption of the cache mode bits in the page table"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Fix possible corruption of cache mode by mprotect.
2016-07-02 19:10:21 -07:00
Linus Torvalds
70bd68d7f7 powerpc fixes for 4.7 #5
- tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur
  - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling
  - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
  - Initialise pci_io_base as early as possible from Darren Stevens
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXeEmsAAoJEFHr6jzI4aWAwMMQAKs/u9rwB3gpOkNJSHajN1Dd
 kdqDufzLxLDwbWnMfqM1+bcO2EOjPhKbtpbzhG6oeiET8undRRoLsjHS5rZeYK5h
 cviRPEJ/Yz8ZWaIgFGI8+02gXwU0MJhuTY8NPexXmsh4FRdKYwEuCIJShl30lg22
 P7UrJ2SCNM+H/uZyS07B7thiwBeAKSp6VkLTpuW/QDz2j1ra/F22dTh7c0Agdahd
 INAMAnh9nYeuMVYn4XjOOlQ07JnBTuf1/W5Wxlw4i/86rVq+Hy8zh5r1X52oysR5
 lZl63B9q3agKG9cc9lSN2ibTDVerlFMwB2QysX2a6Uy7+y2SB3hS7VS1RTXCh3hg
 /omApGGVW3Hh+E2CuKfFLQySU55NRpLAoTGravGr/KsH4wZP/n/fkrctldCrqm7P
 sTPT52+t+iJQk4fiskRY3yQ7DTTnt3rTC8MJRGqvLuCheolLll4NQaWOF75AJP+7
 WFWtC4QHOTPERMkhqLnZDG2vNuDg1H8chuZ2+PxtIs6G1vuOEun+MTZAYh4u6XWE
 bAIT9rV3xBdE17bzYOQz7lU1y7yNVtP7xkm0HIOAHlU4gUrjQp5u8F3TnPW3/M0m
 8GeaZdrPjhsaNg31YZODAeM8Ddf+N9d2a2VPIr/fzytURhMe0ss3Z/MdMoYRATab
 Lh1o+G3gDo9MVaphoJ3w
 =oEAY
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - tm: Always reclaim in start_thread() for exec() class syscalls from
   Cyril Bur

 - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael
   Neuling

 - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan

 - Initialise pci_io_base as early as possible from Darren Stevens

* tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Initialise pci_io_base as early as possible
  powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
  powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
  powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
2016-07-02 17:47:54 -07:00
Linus Torvalds
99b0f54e6a Merge tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes frlm Dave Airlie:
 "Just some AMD and Intel fixes, the AMD ones are further production
  Polaris fixes, and the Intel ones fix some early timeouts, some PCI ID
  changes and a couple of other fixes.

  Still a bit Internet challenged here, hopefully end of next week will
  solve it"

* tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Fix missing unlock on error in i915_ppgtt_info()
  drm/amd/powerplay: workaround for UVD clock issue
  drm/amdgpu: add ACLK_CNTL setting for polaris10
  drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
  drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
  drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
  drm/i915: Add more Kabylake PCI IDs.
  drm/i915: Avoid early timeout during AUX transfers
  drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
  drm/i915/lpt: Avoid early timeout during FDI PHY reset
  drm/i915/bxt: Avoid early timeout during PLL enable
  drm/i915: Refresh cached DP port register value on resume
  drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
  drm/amd/powerplay: disable FFC.
  drm/amd/powerplay: add some definition for FFC feature on polaris.
2016-07-02 09:41:28 -07:00
Linus Torvalds
467ce7693f spi: Fixes for v4.7
A few small driver-specific fixes for SPI, all in the normal important
 if you hit them category especially the rockchip driver fix which
 addresses a race which has been exposed more frequently with some recent
 performance improvements.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXd4PmAAoJECTWi3JdVIfQ0A4H/1D1m/bsdAY0FS12YQteV1GQ
 JYUvj/CWNiuV1MSqIU0WWrlMFO9EwtsojytLIfmGNPXUvPMSpW/Q3IJqhRrQWhmN
 Zl1FUUUIz+sGBsqwq6nZbDJxJhpT6/Tb7YDFR6Oi4l7VeB4Sisv5ax6Ay+uwa/mp
 cvrb/ULILLCHAv0v+6rSjIkZlvD1Yc+08SbbUrTPtVdcY0TDbJEkZ+U2IviCsxmP
 PuqdPUgFIEy7j+hnbEib24f5BNZ/m1a0DY012+es7fSkD5DVa2h71kxS54RC6QYn
 BkkMdK7moflBpbipKlI/eBPz73eePO9SxBwUZFLSUzG4DJEJE2mKKRVyU2LHpGA=
 =uU1M
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few small driver-specific fixes for SPI, all in the normal important
  if you hit them category especially the rockchip driver fix which
  addresses a race which has been exposed more frequently with some
  recent performance improvements"

* tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: sunxi: fix transfer timeout
  spi: sun4i: fix FIFO limit
  spi: rockchip: Signal unfinished DMA transfers
  spi: spi-ti-qspi: Suspend the queue before removing the device
2016-07-02 09:40:11 -07:00
Linus Torvalds
a2b0db5b55 regulator: Fixes for v4.7
Two small fixes for the regulator subsystem - one fixing a crash with
 one of the devices supported by the max77620 driver, another fixing
 startup for the anatop regulator when it starts up with the regulator in
 bypass mode.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXd4GeAAoJECTWi3JdVIfQ+g8H/jpOOKEri6SVaatEbk2J33oA
 YD6EpSV4BRVGQRRLSY11tz3Md45xPLhe47kRpT1antgplKVEAJYARGYGVLmCovGz
 nFXqbQzeTvxs6o6UdrkJlFLDmn4yZgrL4MqhjfxLUzX+Yz/3neQZq6KhESCWKphT
 WPBrNa90s0j+nBCGJV0LxcuoZiKt6th/GUjr0gngepckrg3gIxJJyRVvtE9iVyip
 JYdVTfHt/aYJihIdKAPnXa4M9Ky5KY8ZNTySYcyUaXT3VLDK/UNFFO25q4Nw6F5a
 6XgcjwcFYDpZitf7pQYYfoobDbgTJ1XllPujEP82rIafruAvAKtfORHnvVJfRlU=
 =ZA94
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "Two small fixes for the regulator subsystem - one fixing a crash with
  one of the devices supported by the max77620 driver, another fixing
  startup for the anatop regulator when it starts up with the regulator
  in bypass mode"

* tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: max77620: check for valid regulator info
  regulator: anatop: allow regulator to be in bypass mode
2016-07-02 09:39:03 -07:00
Linus Torvalds
4438512005 A small fix for the newly added oxnas clk driver and a handful of
rockchip clk driver fixes for newly added rk3399 support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXdwP4AAoJEK0CiJfG5JUlG8oP/R+kfW2OBvwrJ4ubGnl68obU
 PoLqATqi9S3Zq7nEd4B04+SQT4OBgj63YnCAEwJY9D7KSCoJNYL9nML9FzB/MCiC
 R2Yune3PIsaEuZFZObnURGO5eRxyZqSJEGgzck1LXwrX/BCyrVHGYsJ0EWm2exZo
 zp8YPpM73oILwwCJGivVATNWXFT9HtQ1nfxAmH/e288t6nNuCnXPJf7PLm1t6ERF
 fjQiPRILodQqB3OtKEYvEv2z682RC0Zivwv6O/dd0FEG33aXB2VgAlCz8Ttp/brf
 fFV5L+i7XjDGDyBcZR2dYCOGwx42XYFOKvUpK+ePnbD0x23V3MSjwFU4yeItZlnD
 3PKO0lIJgbRjMtL4a03iWLzcaxDEbMb+/RVPMUtUWoGyId9l2S+cUblB7NuzTfLL
 RnTu6ttUSh3IY81eiAkrDRjafdAkswOf3lJweg+Rx2OZhzaKX0UcpDBaeUdzdqx6
 lxZ8chv8zYnfjgYUEu2jYvIIM/0VoHZgDmvarNRMreATPaOYqDMTfwsGZeTYcvNU
 qZhpNmue3c1cKGAYwzO3elB3KwhuB+83DVXGJu+6/RabbSk3OTMhaGwIH1NSQeCj
 nStb6C9h+Mog9ZTFukUlx7js6d0O8GmjlT4crlWxJ4QGdDvS2j7/hT53e8uK7SLl
 N5WzJrO9szvkZ8ZC9/LI
 =fAAu
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A small fix for the newly added oxnas clk driver and a handful of
  rockchip clk driver fixes for newly added rk3399 support"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: Fix return value check in oxnas_stdclk_probe()
  clk: rockchip: release io resource when failing to init clk on rk3399
  clk: rockchip: fix cpuclk registration error handling
  clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization"
  clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src
  clk: rockchip: mark rk3399 GIC clocks as critical
  clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
2016-07-02 09:36:49 -07:00
Dave Airlie
88c087109b Merge tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes
here's a batch of i915 fixes for 4.7.

* tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Fix missing unlock on error in i915_ppgtt_info()
  drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
  drm/i915: Add more Kabylake PCI IDs.
  drm/i915: Avoid early timeout during AUX transfers
  drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
  drm/i915/lpt: Avoid early timeout during FDI PHY reset
  drm/i915/bxt: Avoid early timeout during PLL enable
  drm/i915: Refresh cached DP port register value on resume
2016-07-02 15:50:41 +10:00
Dave Airlie
40793e85d2 Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few more late fixes for Polaris cards.

* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/powerplay: workaround for UVD clock issue
  drm/amdgpu: add ACLK_CNTL setting for polaris10
  drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
  drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
  drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
  drm/amd/powerplay: disable FFC.
  drm/amd/powerplay: add some definition for FFC feature on polaris.
2016-07-02 15:48:33 +10:00
Ralf Baechle
6d037de90a MIPS: Fix possible corruption of cache mode by mprotect.
The following testcase may result in a page table entries with a invalid
CCA field being generated:

static void *bindstack;

static int sysrqfd;

static void protect_low(int protect)
{
	mprotect(bindstack, BINDSTACK_SIZE, protect);
}

static void sigbus_handler(int signal, siginfo_t * info, void *context)
{
	void *addr = info->si_addr;

	write(sysrqfd, "x", 1);

	printf("sigbus, fault address %p (should not happen, but might)\n",
	       addr);
	abort();
}

static void run_bind_test(void)
{
	unsigned int *p = bindstack;

	p[0] = 0xf001f001;

	write(sysrqfd, "x", 1);

	/* Set trap on access to p[0] */
	protect_low(PROT_NONE);

	write(sysrqfd, "x", 1);

	/* Clear trap on access to p[0] */
	protect_low(PROT_READ | PROT_WRITE | PROT_EXEC);

	write(sysrqfd, "x", 1);

	/* Check the contents of p[0] */
	if (p[0] != 0xf001f001) {
		write(sysrqfd, "x", 1);

		/* Reached, but shouldn't be */
		printf("badness, shouldn't happen but does\n");
		abort();
	}
}

int main(void)
{
	struct sigaction sa;

	sysrqfd = open("/proc/sysrq-trigger", O_WRONLY);

	if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) {
		perror("sigprocmask");
		return 0;
	}

	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART;
	if (sigaction(SIGBUS, &sa, NULL)) {
		perror("sigaction");
		return 0;
	}

	bindstack = mmap(NULL,
			 BINDSTACK_SIZE,
			 PROT_READ | PROT_WRITE | PROT_EXEC,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bindstack == MAP_FAILED) {
		perror("mmap bindstack");
		return 0;
	}

	printf("bindstack: %p\n", bindstack);

	run_bind_test();

	printf("done\n");

	return 0;
}

There are multiple ingredients for this:

 1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3
    on all platforms except SB1 where it's CCA 5.
 2) _page_cachable_default must have bits set which are not set
    _CACHE_CACHABLE_NONCOHERENT.
 3) Either the defective version of pte_modify for XPA or the standard
    version must be in used.  However pte_modify for the 36 bit address
    space support is no affected.

In that case additional bits in the final CCA mode may generate an invalid
value for the CCA field.  On the R10000 system where this was tracked
down for example a CCA 7 has been observed, which is Uncached Accelerated.

Fixed by:

 1) Using the proper CCA mode for PAGE_NONE just like for all the other
    PAGE_* pte/pmd bits.
 2) Fix the two affected variants of pte_modify.

Further code inspection also shows the same issue to exist in pmd_modify
which would affect huge page systems.

Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE
and pmd_modify issue found by me.

The history of this goes back beyond Linus' git history.  Chris Dearman's
commit 351336929c ("[MIPS] Allow setting of
the cache attribute at run time.") missed the opportunity to fix this
but it was originally introduced in lmo commit
d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.")
and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option
CONFIG_MIPS_UNCACHED.")

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
2016-07-02 01:51:39 +02:00
Linus Torvalds
dbdc3bb74f ACPI fix for v4.7-rc6
Fix an expression in the ACPI PCI IRQ management code added by a
 recent commit that overlooked missing parens in it, so the result
 of the computation is incorrect in some cases (Sinan Kaya).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXdtScAAoJEILEb/54YlRx5r4P/AwWjfh3wPsDCXCajSkiqR7b
 raPnXYXvVhEC4ZfqLpuID1sk/DzQL/MZmkFpB8PUYkf1onOKXcd9MDsyiVdOuJi/
 gqP64yzGI+n79qq4u+vhUlujsDH11X8WtKxNQoZWWdHoLR5o/qhYA4c3kZ0yjzim
 uD/3GWtXsmbPMyBMJjOIFJLFpj3DgizxER3PUL0WdX1XB8YXX/e16WT4+8IkHVru
 mM8Mk389+iQoKfDgWAL9ImOcfY9xXh0IPkKF/kHT2D0CG4qHfjGWUfu124hu83Fp
 YGIhcmX0De5cwb3PchQ4Uga2U4KbzKgAhW574RFD+pCmY5EX/fjYDaPPCU4Gctzf
 RNTG3JIS+mpR2maC+t73lJUanQSxS4GahXtEGfJ7ci3q85sIXuX5GSqd9CkXScCZ
 /0Ww+xjiXyeo1EE/PFC0rXZT7h5bzCdpqWbUUhcSDnFKj8TLz/LjiNd6vCFZAVUF
 G/7p9rY2n+Nid8j7DVITKcYS6HuaucKOTGKbPp7P3HBXc5N3d5vqrSrVcwQ/Yi+g
 hwaoDle506BUvTgIwrjCEPBRFRmFz8uRHOMMbpugzYZm2Kn5UUF3Za358NqI6pYR
 LiosgEiCXYFQScDeFOV5QD9hRGRoWaL6C54po1d459MGjqgwDt4lolbOhy7BIWXj
 pJMmdHbczB0VT6KRyYxs
 =r8BR
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix an expression in the ACPI PCI IRQ management code added by a
  recent commit that overlooked missing parens in it, so the result of
  the computation is incorrect in some cases (Sinan Kaya)"

* tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI,PCI,IRQ: correct operator precedence
2016-07-01 15:31:48 -07:00