Commit Graph

147 Commits

Author SHA1 Message Date
Sai Praneeth
6d0cc887d5 x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
Now that we have EFI memory region bits that indicate which regions do
not need execute permission or read/write permission in the page tables,
let's use them.

We also check for EFI_NX_PE_DATA and only enforce the restrictive
mappings if it's present (to allow us to ignore buggy firmware that sets
bits it didn't mean to and to preserve backwards compatibility).

Instead of assuming that firmware would set appropriate attributes in
memory descriptor like EFI_MEMORY_RO for code and EFI_MEMORY_XP for
data, we can expect some firmware out there which might only set *type*
in memory descriptor to be EFI_RUNTIME_SERVICES_CODE or
EFI_RUNTIME_SERVICES_DATA leaving away attribute. This will lead to
improper mappings of EFI runtime regions. In order to avoid it, we check
attribute and type of memory descriptor to update mappings and moreover
Windows works this way.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1455712566-16727-13-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:26:28 +01:00
Robert Elliott
1e82b94790 x86/efi: Show actual ending addresses in efi_print_memmap
Adjust efi_print_memmap to print the real end address of each
range, not 1 byte beyond. This matches other prints like those
for SRAT and nosave memory.

While investigating grub persistent memory corruption issues, it
was helpful to make this table match the ending address
convention used by:
* the kernel's e820 table prints
	BIOS-e820: [mem 0x0000001680000000-0x0000001c7fffffff] reserved
* the kernel's nosave memory prints
	PM: Registered nosave memory: [mem 0x880000000-0xc7fffffff]
* the kernel's ACPI System Resource Affinity Table prints
	SRAT: Node 1 PXM 1 [mem 0x480000000-0x87fffffff]
* grub's lsmmap and lsefimmap commands
	reserved  0000001680000000-0000001c7fffffff 00600000     24GiB UC WC WT WB NV
* the UEFI shell's memmap command
	Reserved   000000007FC00000-000000007FFFFFFF 0000000000000400 0000000000000001

For example, if you grep all the various logs for c7fffffff, you
won't find the kernel's line if it uses c80000000.

Also, change the closing ) to ] to match the opening [.

old:
    efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c80000000) (16384MB)

new:
    efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c7fffffff] (16384MB)

Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1454364428-494-12-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-03 11:41:20 +01:00
Matt Fleming
753b11ef8e x86/efi: Setup separate EFI page tables in kexec paths
The switch to using a new dedicated page table for EFI runtime
calls in commit commit 67a9108ed4 ("x86/efi: Build our own
page table structures") failed to take into account changes
required for the kexec code paths, which are unfortunately
duplicated in the EFI code.

Call the allocation and setup functions in
kexec_enter_virtual_mode() just like we do for
__efi_enter_virtual_mode() to avoid hitting NULL-pointer
dereferences when making EFI runtime calls.

At the very least, the call to efi_setup_page_tables() should
have existed for kexec before the following commit:

  67a9108ed4 ("x86/efi: Build our own page table structures")

Things just magically worked because we were actually using
the kernel's page tables that contained the required mappings.

Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453385519-11477-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-21 21:01:34 +01:00
Matt Fleming
67a9108ed4 x86/efi: Build our own page table structures
With commit e1a58320a3 ("x86/mm: Warn on W^X mappings") all
users booting on 64-bit UEFI machines see the following warning,

  ------------[ cut here ]------------
  WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780()
  x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000
  ...
  x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
  ...

This is caused by mapping EFI regions with RWX permissions.
There isn't much we can do to restrict the permissions for these
regions due to the way the firmware toolchains mix code and
data, but we can at least isolate these mappings so that they do
not appear in the regular kernel page tables.

In commit d2f7cbe7b2 ("x86/efi: Runtime services virtual
mapping") we started using 'trampoline_pgd' to map the EFI
regions because there was an existing identity mapping there
which we use during the SetVirtualAddressMap() call and for
broken firmware that accesses those addresses.

But 'trampoline_pgd' shares some PGD entries with
'swapper_pg_dir' and does not provide the isolation we require.
Notably the virtual address for __START_KERNEL_map and
MODULES_START are mapped by the same PGD entry so we need to be
more careful when copying changes over in
efi_sync_low_kernel_mappings().

This patch doesn't go the full mile, we still want to share some
PGD entries with 'swapper_pg_dir'. Having completely separate
page tables brings its own issues such as synchronising new
mappings after memory hotplug and module loading. Sharing also
keeps memory usage down.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-29 09:15:42 +01:00
Ard Biesheuvel
44511fb9e5 efi: Use correct type for struct efi_memory_map::phys_map
We have been getting away with using a void* for the physical
address of the UEFI memory map, since, even on 32-bit platforms
with 64-bit physical addresses, no truncation takes place if the
memory map has been allocated by the firmware (which only uses
1:1 virtually addressable memory), which is usually the case.

However, commit:

  0f96a99dab ("efi: Add "efi_fake_mem" boot option")

adds code that clones and modifies the UEFI memory map, and the
clone may live above 4 GB on 32-bit platforms.

This means our use of void* for struct efi_memory_map::phys_map has
graduated from 'incorrect but working' to 'incorrect and
broken', and we need to fix it.

So redefine struct efi_memory_map::phys_map as phys_addr_t, and
get rid of a bunch of casts that are now unneeded.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: izumi.taku@jp.fujitsu.com
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: linux-efi@vger.kernel.org
Cc: matt.fleming@intel.com
Link: http://lkml.kernel.org/r/1445593697-1342-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-28 12:28:06 +01:00
Ingo Molnar
790a2ee242 * Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway - Paul Gortmaker
 
  * Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64 - Leif Lindholm
 
  * Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses - Matt Fleming
 
  * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel
 
  * Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module - Ben Hutchings
 
  * Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support - Taku Izumi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
 +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
 H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
 ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
 zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
 v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
 dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
 v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
 8xb/rET4JrhCG4gFMUZ7
 =1Oee
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi

Pull v4.4 EFI updates from Matt Fleming:

  - Make the EFI System Resource Table (ESRT) driver explicitly
    non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway. (Paul Gortmaker)

  - Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64. (Leif Lindholm)

  - Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses. (Matt Fleming)

  - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel

  - Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module. (Ben Hutchings)

  - Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support. (Taku Izumi)

Note: there is a semantic conflict between the following two commits:

  8a53554e12 ("x86/efi: Fix multiple GOP device support")
  ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")

I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:51:34 +02:00
Ingo Molnar
c7d77a7980 Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:05:18 +02:00
Taku Izumi
0bbea1ce98 x86/efi: Rename print_efi_memmap() to efi_print_memmap()
This patch renames print_efi_memmap() to efi_print_memmap() and
make it global function so that we can invoke it outside of
arch/x86/platform/efi/efi.c

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:08 +01:00
Leif Lindholm
12dd00e83f efi/x86: Move efi=debug option parsing to core
fed6cefe3b ("x86/efi: Add a "debug" option to the efi= cmdline")
adds the DBG flag, but does so for x86 only. Move this early param
parsing to core code.

Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:05 +01:00
Matt Fleming
a5caa209ba x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
that signals that the firmware PE/COFF loader supports splitting
code and data sections of PE/COFF images into separate EFI
memory map entries. This allows the kernel to map those regions
with strict memory protections, e.g. EFI_MEMORY_RO for code,
EFI_MEMORY_XP for data, etc.

Unfortunately, an unwritten requirement of this new feature is
that the regions need to be mapped with the same offsets
relative to each other as observed in the EFI memory map. If
this is not done crashes like this may occur,

  BUG: unable to handle kernel paging request at fffffffefe6086dd
  IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
  Call Trace:
   [<ffffffff8104c90e>] efi_call+0x7e/0x100
   [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
   [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
   [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
   [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
   [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef

Here 0xfffffffefe6086dd refers to an address the firmware
expects to be mapped but which the OS never claimed was mapped.
The issue is that included in these regions are relative
addresses to other regions which were emitted by the firmware
toolchain before the "splitting" of sections occurred at
runtime.

Needless to say, we don't satisfy this unwritten requirement on
x86_64 and instead map the EFI memory map entries in reverse
order. The above crash is almost certainly triggerable with any
kernel newer than v3.13 because that's when we rewrote the EFI
runtime region mapping code, in commit d2f7cbe7b2 ("x86/efi:
Runtime services virtual mapping"). For kernel versions before
v3.13 things may work by pure luck depending on the
fragmentation of the kernel virtual address space at the time we
map the EFI regions.

Instead of mapping the EFI memory map entries in reverse order,
where entry N has a higher virtual address than entry N+1, map
them in the same order as they appear in the EFI memory map to
preserve this relative offset between regions.

This patch has been kept as small as possible with the intention
that it should be applied aggressively to stable and
distribution kernels. It is very much a bugfix rather than
support for a new feature, since when EFI_PROPERTIES_TABLE is
enabled we must map things as outlined above to even boot - we
have no way of asking the firmware not to split the code/data
regions.

In fact, this patch doesn't even make use of the more strict
memory protections available in UEFI v2.5. That will come later.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chun-Yi <jlee@suse.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-01 12:51:28 +02:00
Dave Young
2965faa5e0 kexec: split kexec_load syscall from kexec core code
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Jonathan (Zhixiong) Zhang
7bf793115d efi, x86: Rearrange efi_mem_attributes()
x86 and ia64 implement efi_mem_attributes() differently. This
function needs to be available for other architectures
(such as arm64) as well, such as for the purpose of ACPI/APEI.

ia64 EFI does not set up a 'memmap' variable and does not set
the EFI_MEMMAP flag, so it needs to have its unique implementation
of efi_mem_attributes().

Move efi_mem_attributes() implementation from x86 to the core
EFI code, and declare it with __weak.

It is recommended that other architectures should not override
the default implementation.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reviewed-by: Matt Fleming <matt.fleming@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438936621-5215-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-08 10:37:39 +02:00
Ricardo Neri
9115c7589b efi: Check for NULL efi kernel parameters
Even though it is documented how to specifiy efi parameters, it is
possible to cause a kernel panic due to a dereference of a NULL pointer when
parsing such parameters if "efi" alone is given:

PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
[ 0.000000]  ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
[ 0.000000]  0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
[ 0.000000]  0000000000000069 000000000000005f 0000000000000000 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000]  [<ffffffff8184bb0f>] dump_stack+0x45/0x57
[ 0.000000]  [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
[ 0.000000]  [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
[ 0.000000]  [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
[ 0.000000]  [<ffffffff81f376e1>] do_early_param+0x50/0x8a
[ 0.000000]  [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
[ 0.000000]  [<ffffffff81f37a43>] parse_early_options+0x24/0x28
[ 0.000000]  [<ffffffff81f37691>] ? loglevel+0x31/0x31
[ 0.000000]  [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
[ 0.000000]  [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
[ 0.000000]  [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
[ 0.000000]  [<ffffffff81f37b20>] start_kernel+0x90/0x423
[ 0.000000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
[ 0.000000] RIP 0xffffffff81ba2efc

This panic is not reproducible with "efi=" as this will result in a non-NULL
zero-length string.

Thus, verify that the pointer to the parameter string is not NULL. This is
consistent with other parameter-parsing functions which check for NULL pointers.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-07-30 18:07:11 +01:00
Linus Torvalds
88793e5c77 The libnvdimm sub-system introduces, in addition to the libnvdimm-core,
4 drivers / enabling modules:
 
 NFIT:
 Instantiates an "nvdimm bus" with the core and registers memory devices
 (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface
 table).  After registering NVDIMMs the NFIT driver then registers
 "region" devices.  A libnvdimm-region defines an access mode and the
 boundaries of persistent memory media.  A region may span multiple
 NVDIMMs that are interleaved by the hardware memory controller.  In
 turn, a libnvdimm-region can be carved into a "namespace" device and
 bound to the PMEM or BLK driver which will attach a Linux block device
 (disk) interface to the memory.
 
 PMEM:
 Initially merged in v4.1 this driver for contiguous spans of persistent
 memory address ranges is re-worked to drive PMEM-namespaces emitted by
 the libnvdimm-core.  In this update the PMEM driver, on x86, gains the
 ability to assert that writes to persistent memory have been flushed all
 the way through the caches and buffers in the platform to persistent
 media.  See memcpy_to_pmem() and wmb_pmem().
 
 BLK:
 This new driver enables access to persistent memory media through "Block
 Data Windows" as defined by the NFIT.  The primary difference of this
 driver to PMEM is that only a small window of persistent memory is
 mapped into system address space at any given point in time.  Per-NVDIMM
 windows are reprogrammed at run time, per-I/O, to access different
 portions of the media.  BLK-mode, by definition, does not support DAX.
 
 BTT:
 This is a library, optionally consumed by either PMEM or BLK, that
 converts a byte-accessible namespace into a disk with atomic sector
 update semantics (prevents sector tearing on crash or power loss).  The
 sinister aspect of sector tearing is that most applications do not know
 they have a atomic sector dependency.  At least today's disk's rarely
 ever tear sectors and if they do one almost certainly gets a CRC error
 on access.  NVDIMMs will always tear and always silently.  Until an
 application is audited to be robust in the presence of sector-tearing
 the usage of BTT is recommended.
 
 Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
 Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
 Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
 Wysocki, and Bob Moore.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj
 3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3
 QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE
 LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv
 +tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq
 KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT
 h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG
 b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0
 UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF
 etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee
 fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo
 7PJBbN5M5qP5tD0aO7SZ
 =VtWG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm

Pull libnvdimm subsystem from Dan Williams:
 "The libnvdimm sub-system introduces, in addition to the
  libnvdimm-core, 4 drivers / enabling modules:

  NFIT:
    Instantiates an "nvdimm bus" with the core and registers memory
    devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
    Interface table).

    After registering NVDIMMs the NFIT driver then registers "region"
    devices.  A libnvdimm-region defines an access mode and the
    boundaries of persistent memory media.  A region may span multiple
    NVDIMMs that are interleaved by the hardware memory controller.  In
    turn, a libnvdimm-region can be carved into a "namespace" device and
    bound to the PMEM or BLK driver which will attach a Linux block
    device (disk) interface to the memory.

  PMEM:
    Initially merged in v4.1 this driver for contiguous spans of
    persistent memory address ranges is re-worked to drive
    PMEM-namespaces emitted by the libnvdimm-core.

    In this update the PMEM driver, on x86, gains the ability to assert
    that writes to persistent memory have been flushed all the way
    through the caches and buffers in the platform to persistent media.
    See memcpy_to_pmem() and wmb_pmem().

  BLK:
    This new driver enables access to persistent memory media through
    "Block Data Windows" as defined by the NFIT.  The primary difference
    of this driver to PMEM is that only a small window of persistent
    memory is mapped into system address space at any given point in
    time.

    Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
    different portions of the media.  BLK-mode, by definition, does not
    support DAX.

  BTT:
    This is a library, optionally consumed by either PMEM or BLK, that
    converts a byte-accessible namespace into a disk with atomic sector
    update semantics (prevents sector tearing on crash or power loss).

    The sinister aspect of sector tearing is that most applications do
    not know they have a atomic sector dependency.  At least today's
    disk's rarely ever tear sectors and if they do one almost certainly
    gets a CRC error on access.  NVDIMMs will always tear and always
    silently.  Until an application is audited to be robust in the
    presence of sector-tearing the usage of BTT is recommended.

  Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
  Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
  Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
  Wysocki, and Bob Moore"

* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
  arch, x86: pmem api for ensuring durability of persistent memory updates
  libnvdimm: Add sysfs numa_node to NVDIMM devices
  libnvdimm: Set numa_node to NVDIMM devices
  acpi: Add acpi_map_pxm_to_online_node()
  libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
  pmem: flag pmem block devices as non-rotational
  libnvdimm: enable iostat
  pmem: make_request cleanups
  libnvdimm, pmem: fix up max_hw_sectors
  libnvdimm, blk: add support for blk integrity
  libnvdimm, btt: add support for blk integrity
  fs/block_dev.c: skip rw_page if bdev has integrity
  libnvdimm: Non-Volatile Devices
  tools/testing/nvdimm: libnvdimm unit test infrastructure
  libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
  nd_btt: atomic sector updates
  libnvdimm: infrastructure for btt devices
  libnvdimm: write blk label set
  libnvdimm: write pmem label set
  libnvdimm: blk labels and namespace instantiation
  ...
2015-06-29 10:34:42 -07:00
Tony Luck
b05b9f5f9d x86, mirror: x86 enabling - find mirrored memory ranges
UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory
address ranges.  See UEFI 2.5 spec pages 157-158:

  http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf

On EFI enabled systems scan the memory map and tell memblock about any
mirrored ranges.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Dan Williams
ad5fb870c4 e820, efi: add ACPI 6.0 persistent memory types
ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
Mark it "reserved" and allow it to be claimed by a persistent memory
device driver.

This definition is in addition to the Linux kernel's existing type-12
definition that was recently added in support of shipping platforms with
NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
OEM reserved).

Note, /proc/iomem can be consulted for differentiating legacy
"Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
E820_PMEM.

Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-05-27 21:46:05 -04:00
Peter Jones
0bb549052d efi: Add esrt support
Add sysfs files for the EFI System Resource Table (ESRT) under
/sys/firmware/efi/esrt and for each EFI System Resource Entry under
entries/ as a subdir.

The EFI System Resource Table (ESRT) provides a read-only catalog of
system components for which the system accepts firmware upgrades via
UEFI's "Capsule Update" feature.  This module allows userland utilities
to evaluate what firmware updates can be applied to this system, and
potentially arrange for those updates to occur.

The ESRT is described as part of the UEFI specification, in version 2.5
which should be available from http://uefi.org/specifications in early
2015.  If you're a member of the UEFI Forum, information about its
addition to the standard is available as UEFI Mantis 1090.

For some hardware platforms, additional restrictions may be found at
http://msdn.microsoft.com/en-us/library/windows/hardware/jj128256.aspx ,
and additional documentation may be found at
http://download.microsoft.com/download/5/F/5/5F5D16CD-2530-4289-8019-94C6A20BED3C/windows-uefi-firmware-update-platform.docx
.

Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-04-30 22:15:04 +01:00
Ingo Molnar
744937b0b1 efi: Clean up the efi_call_phys_[prolog|epilog]() save/restore interaction
Currently x86-64 efi_call_phys_prolog() saves into a global variable (save_pgd),
and efi_call_phys_epilog() restores the kernel pagetables from that global
variable.

Change this to a cleaner save/restore pattern where the saving function returns
the saved object and the restore function restores that.

Apply the same concept to the 32-bit code as well.

Plus this approach, as an added bonus, allows us to express the
!efi_enabled(EFI_OLD_MEMMAP) situation in a clean fashion as well,
via a 'NULL' return value.

Cc: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-04-01 12:46:22 +01:00
Ingo Molnar
23a0d4e8fa efi: Disable interrupts around EFI calls, not in the epilog/prolog calls
Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.

Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.

Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-04-01 12:46:22 +01:00
Borislav Petkov
fed6cefe3b x86/efi: Add a "debug" option to the efi= cmdline
... and hide the memory regions dump behind it. Make it default-off.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141209095843.GA3990@pd.tnic
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-04-01 12:46:22 +01:00
Mathias Krause
4e78eb0561 x86/efi: Mark initialization code as such
The 32 bit and 64 bit implementations differ in their __init annotations
for some functions referenced from the common EFI code. Namely, the 32
bit variant is missing some of the __init annotations the 64 bit variant
has.

To solve the colliding annotations, mark the corresponding functions in
efi_32.c as initialization code, too -- as it is such.

Actually, quite a few more functions are only used during initialization
and therefore can be marked __init. They are therefore annotated, too.
Also add the __init annotation to the prototypes in the efi.h header so
users of those functions will see it's meant as initialization code
only.

This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might
be more appropriate but this is C code after all, not an opera! :D)

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:03 +01:00
Mathias Krause
0ce4605c9a x86/efi: Update comment regarding required phys mapped EFI services
Commit 3f4a7836e3 ("x86/efi: Rip out phys_efi_get_time()") left
set_virtual_address_map as the only runtime service needed with a
phys mapping but missed to update the preceding comment. Fix that.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:02 +01:00
Mathias Krause
6092068547 x86/efi: Unexport add_efi_memmap variable
This variable was accidentally exported, even though it's only used in
this compilation unit and only during initialization.

Remove the bogus export, make the variable static instead and mark it
as __initdata.

Fixes: 200001eb14 ("x86 boot: only pick up additional EFI memmap...")
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:02 +01:00
Laszlo Ersek
ace1d1218d x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format()
An example log excerpt demonstrating the change:

Before the patch:

> efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
> efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
> efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000400000) (3MB)
> efi: mem03: type=2, attr=0xf, range=[0x0000000000400000-0x0000000000800000) (4MB)
> efi: mem04: type=10, attr=0xf, range=[0x0000000000800000-0x0000000000808000) (0MB)
> efi: mem05: type=7, attr=0xf, range=[0x0000000000808000-0x0000000000810000) (0MB)
> efi: mem06: type=10, attr=0xf, range=[0x0000000000810000-0x0000000000900000) (0MB)
> efi: mem07: type=4, attr=0xf, range=[0x0000000000900000-0x0000000001100000) (8MB)
> efi: mem08: type=7, attr=0xf, range=[0x0000000001100000-0x0000000001400000) (3MB)
> efi: mem09: type=2, attr=0xf, range=[0x0000000001400000-0x0000000002613000) (18MB)
> efi: mem10: type=7, attr=0xf, range=[0x0000000002613000-0x0000000004000000) (25MB)
> efi: mem11: type=4, attr=0xf, range=[0x0000000004000000-0x0000000004020000) (0MB)
> efi: mem12: type=7, attr=0xf, range=[0x0000000004020000-0x00000000068ea000) (40MB)
> efi: mem13: type=2, attr=0xf, range=[0x00000000068ea000-0x00000000068f0000) (0MB)
> efi: mem14: type=3, attr=0xf, range=[0x00000000068f0000-0x0000000006c7b000) (3MB)
> efi: mem15: type=6, attr=0x800000000000000f, range=[0x0000000006c7b000-0x0000000006c7d000) (0MB)
> efi: mem16: type=5, attr=0x800000000000000f, range=[0x0000000006c7d000-0x0000000006c85000) (0MB)
> efi: mem17: type=6, attr=0x800000000000000f, range=[0x0000000006c85000-0x0000000006c87000) (0MB)
> efi: mem18: type=3, attr=0xf, range=[0x0000000006c87000-0x0000000006ca3000) (0MB)
> efi: mem19: type=6, attr=0x800000000000000f, range=[0x0000000006ca3000-0x0000000006ca6000) (0MB)
> efi: mem20: type=10, attr=0xf, range=[0x0000000006ca6000-0x0000000006cc6000) (0MB)
> efi: mem21: type=6, attr=0x800000000000000f, range=[0x0000000006cc6000-0x0000000006d95000) (0MB)
> efi: mem22: type=5, attr=0x800000000000000f, range=[0x0000000006d95000-0x0000000006e22000) (0MB)
> efi: mem23: type=7, attr=0xf, range=[0x0000000006e22000-0x0000000007165000) (3MB)
> efi: mem24: type=4, attr=0xf, range=[0x0000000007165000-0x0000000007d22000) (11MB)
> efi: mem25: type=7, attr=0xf, range=[0x0000000007d22000-0x0000000007d25000) (0MB)
> efi: mem26: type=3, attr=0xf, range=[0x0000000007d25000-0x0000000007ea2000) (1MB)
> efi: mem27: type=5, attr=0x800000000000000f, range=[0x0000000007ea2000-0x0000000007ed2000) (0MB)
> efi: mem28: type=6, attr=0x800000000000000f, range=[0x0000000007ed2000-0x0000000007ef6000) (0MB)
> efi: mem29: type=7, attr=0xf, range=[0x0000000007ef6000-0x0000000007f00000) (0MB)
> efi: mem30: type=9, attr=0xf, range=[0x0000000007f00000-0x0000000007f02000) (0MB)
> efi: mem31: type=10, attr=0xf, range=[0x0000000007f02000-0x0000000007f06000) (0MB)
> efi: mem32: type=4, attr=0xf, range=[0x0000000007f06000-0x0000000007fd0000) (0MB)
> efi: mem33: type=6, attr=0x800000000000000f, range=[0x0000000007fd0000-0x0000000007ff0000) (0MB)
> efi: mem34: type=7, attr=0xf, range=[0x0000000007ff0000-0x0000000008000000) (0MB)

After the patch:

> efi: mem00: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x000000000009f000) (0MB)
> efi: mem01: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x000000000009f000-0x00000000000a0000) (0MB)
> efi: mem02: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000100000-0x0000000000400000) (3MB)
> efi: mem03: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000400000-0x0000000000800000) (4MB)
> efi: mem04: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000800000-0x0000000000808000) (0MB)
> efi: mem05: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000808000-0x0000000000810000) (0MB)
> efi: mem06: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000810000-0x0000000000900000) (0MB)
> efi: mem07: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000900000-0x0000000001100000) (8MB)
> efi: mem08: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000001100000-0x0000000001400000) (3MB)
> efi: mem09: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000001400000-0x0000000002613000) (18MB)
> efi: mem10: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000002613000-0x0000000004000000) (25MB)
> efi: mem11: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000004000000-0x0000000004020000) (0MB)
> efi: mem12: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000004020000-0x00000000068ea000) (40MB)
> efi: mem13: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x00000000068ea000-0x00000000068f0000) (0MB)
> efi: mem14: [Boot Code          |   |  |  |  |   |WB|WT|WC|UC] range=[0x00000000068f0000-0x0000000006c7b000) (3MB)
> efi: mem15: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c7b000-0x0000000006c7d000) (0MB)
> efi: mem16: [Runtime Code       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c7d000-0x0000000006c85000) (0MB)
> efi: mem17: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c85000-0x0000000006c87000) (0MB)
> efi: mem18: [Boot Code          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c87000-0x0000000006ca3000) (0MB)
> efi: mem19: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006ca3000-0x0000000006ca6000) (0MB)
> efi: mem20: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000006ca6000-0x0000000006cc6000) (0MB)
> efi: mem21: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006cc6000-0x0000000006d95000) (0MB)
> efi: mem22: [Runtime Code       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006d95000-0x0000000006e22000) (0MB)
> efi: mem23: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000006e22000-0x0000000007165000) (3MB)
> efi: mem24: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007165000-0x0000000007d22000) (11MB)
> efi: mem25: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007d22000-0x0000000007d25000) (0MB)
> efi: mem26: [Boot Code          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007d25000-0x0000000007ea2000) (1MB)
> efi: mem27: [Runtime Code       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ea2000-0x0000000007ed2000) (0MB)
> efi: mem28: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ed2000-0x0000000007ef6000) (0MB)
> efi: mem29: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ef6000-0x0000000007f00000) (0MB)
> efi: mem30: [ACPI Reclaim Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007f00000-0x0000000007f02000) (0MB)
> efi: mem31: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007f02000-0x0000000007f06000) (0MB)
> efi: mem32: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007f06000-0x0000000007fd0000) (0MB)
> efi: mem33: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000007fd0000-0x0000000007ff0000) (0MB)
> efi: mem34: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ff0000-0x0000000008000000) (0MB)

Both the type enum and the attribute bitmap are decoded, with the
additional benefit that the memory ranges line up as well.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:41:01 +01:00
Dave Young
a5a750a98f x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode
If enter virtual mode failed due to some reason other than the efi call
the EFI_RUNTIME_SERVICES bit in efi.flags should be cleared thus users
of efi runtime services can check the bit and handle the case instead of
assume efi runtime is ok.

Per Matt, if efi call SetVirtualAddressMap fails we will be not sure
it's safe to make any assumptions about the state of the system. So
kernel panics instead of clears EFI_RUNTIME_SERVICES bit.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:59 +01:00
Dave Young
5ae3683c38 efi: Add kernel param efi=noruntime
noefi kernel param means actually disabling efi runtime, Per suggestion
from Leif Lindholm efi=noruntime should be better. But since noefi is
already used in X86 thus just adding another param efi=noruntime for
same purpose.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:59 +01:00
Dave Young
6ccc72b87b lib: Add a generic cmdline parse function parse_option_str
There should be a generic function to parse params like a=b,c
Adding parse_option_str in lib/cmdline.c which will return true
if there's specified option set in the params.

Also updated efi=old_map parsing code to use the new function

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:58 +01:00
Dave Young
b2e0a54a12 efi: Move noefi early param code out of x86 arch code
noefi param can be used for arches other than X86 later, thus move it
out of x86 platform code.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:58 +01:00
Matt Fleming
5a17dae422 efi: Add efi= parameter parsing to the EFI boot stub
We need a way to customize the behaviour of the EFI boot stub, in
particular, we need a way to disable the "chunking" workaround, used
when reading files from the EFI System Partition.

One of my machines doesn't cope well when reading files in 1MB chunks to
a buffer above the 4GB mark - it appears that the "chunking" bug
workaround triggers another firmware bug. This was only discovered with
commit 4bf7111f50 ("x86/efi: Support initrd loaded above 4G"), and
that commit is perfectly valid. The symptom I observed was a corrupt
initrd rather than any kind of crash.

efi= is now used to specify EFI parameters in two very different
execution environments, the EFI boot stub and during kernel boot.

There is also a slight performance optimization by enabling efi=nochunk,
but that's offset by the fact that you're more likely to run into
firmware issues, at least on x86. This is the rationale behind leaving
the workaround enabled by default.

Also provide some documentation for EFI_READ_CHUNK_SIZE and why we're
using the current value of 1MB.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03 18:40:57 +01:00
Daniel Kiper
f383d00a0d arch/x86: Remove efi_set_rtc_mmss()
efi_set_rtc_mmss() is never used to set RTC due to bugs found
on many EFI platforms. It is set directly by mach_set_rtc_mmss().
Hence, remove unused efi_set_rtc_mmss() function.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:59 +01:00
Daniel Kiper
0fe376e18f arch/x86: Remove redundant set_bit(EFI_MEMMAP) call
Remove redundant set_bit(EFI_MEMMAP, &efi.flags) call.
It is executed earlier in efi_memmap_init().

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:57 +01:00
Daniel Kiper
9f2adfd8a9 arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call
Remove redundant set_bit(EFI_SYSTEM_TABLES, &efi.flags) call.
It is executed earlier in efi_systab_init().

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:56 +01:00
Daniel Kiper
9f27bc543b efi: Introduce EFI_PARAVIRT flag
Introduce EFI_PARAVIRT flag. If it is set then kernel runs
on EFI platform but it has not direct control on EFI stuff
like EFI runtime, tables, structures, etc. If not this means
that Linux Kernel has direct access to EFI infrastructure
and everything runs as usual.

This functionality is used in Xen dom0 because hypervisor
has full control on EFI stuff and all calls from dom0 to
EFI must be requested via special hypercall which in turn
executes relevant EFI code in behalf of dom0.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:55 +01:00
Daniel Kiper
67a9b9c53c arch/x86: Do not access EFI memory map if it is not available
Do not access EFI memory map if it is not available. At least
Xen dom0 EFI implementation does not have an access to it.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:54 +01:00
Daniel Kiper
abc93f8eb6 efi: Use early_mem*() instead of early_io*()
Use early_mem*() instead of early_io*() because all mapped EFI regions
are memory (usually RAM but they could also be ROM, EPROM, EEPROM, flash,
etc.) not I/O regions. Additionally, I/O family calls do not work correctly
under Xen in our case. early_ioremap() skips the PFN to MFN conversion
when building the PTE. Using it for memory will attempt to map the wrong
machine frame. However, all artificial EFI structures created under Xen
live in dom0 memory and should be mapped/unmapped using early_mem*() family
calls which map domain memory.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18 21:23:54 +01:00
Ard Biesheuvel
022ee6c558 efi/x86: Move UEFI Runtime Services wrappers to generic code
In order for other archs (such as arm64) to be able to reuse the virtual
mode function call wrappers, move them to drivers/firmware/efi/runtime-wrappers.c.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-07 20:12:53 +01:00
Saurabh Tangri
eeb9db09f7 x86/efi: Move all workarounds to a separate file quirks.c
Currently, it's difficult to find all the workarounds that are
applied when running on EFI, because they're littered throughout
various code paths. This change moves all of them into a separate
file with the hope that it will be come the single location for all
our well documented quirks.

Signed-off-by: Saurabh Tangri <saurabh.tangri@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-19 11:14:33 +01:00
Linus Torvalds
3f17ea6dea Merge branch 'next' (accumulated 3.16 merge window patches) into master
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.

* accumulated work in next: (6809 commits)
  ufs: sb mutex merge + mutex_destroy
  powerpc: update comments for generic idle conversion
  cris: update comments for generic idle conversion
  idle: remove cpu_idle() forward declarations
  nbd: zero from and len fields in NBD_CMD_DISCONNECT.
  mm: convert some level-less printks to pr_*
  MAINTAINERS: adi-buildroot-devel is moderated
  MAINTAINERS: add linux-api for review of API/ABI changes
  mm/kmemleak-test.c: use pr_fmt for logging
  fs/dlm/debug_fs.c: replace seq_printf by seq_puts
  fs/dlm/lockspace.c: convert simple_str to kstr
  fs/dlm/config.c: convert simple_str to kstr
  mm: mark remap_file_pages() syscall as deprecated
  mm: memcontrol: remove unnecessary memcg argument from soft limit functions
  mm: memcontrol: clean up memcg zoneinfo lookup
  mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
  mm/mempool.c: update the kmemleak stack trace for mempool allocations
  lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
  mm: introduce kmemleak_update_trace()
  mm/kmemleak.c: use %u to print ->checksum
  ...
2014-06-08 11:31:16 -07:00
Dave Young
a3530e8fe9 x86/efi: Do not export efi runtime map in case old map
For ioremapped efi memory aka old_map the virt addresses are not persistant
across kexec reboot. kexec-tools will read the runtime maps from sysfs then
pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause
kexec boot failure.

To address this issue do not export runtime maps in case efi old_map so
userspace can use no efi boot instead.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-02 12:21:59 +01:00
Ricardo Neri
982e239cd2 x86/efi: Implement a __efi_call_virt macro
For i386, all the EFI system runtime services functions return efi_status_t
except efi_reset_system_system. Therefore, not all functions can be covered
by the same macro in case the macro needs to do more than calling the function
(i.e., return a value). The purpose of the __efi_call_virt macro is to be used
when no return value is expected.

For x86_64, this macro would not be needed as all the runtime services return
u64. However, the same code is used for both x86_64 and i386. Thus, the macro
__efi_call_virt is also defined to not break compilation.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:32 +01:00
Matt Fleming
62fa6e69a4 x86/efi: Delete most of the efi_call* macros
We really only need one phys and one virt function call, and then only
one assembly function to make firmware calls.

Since we are not using the C type system anyway, we're not really losing
much by deleting the macros apart from no longer having a check that
we are passing the correct number of parameters. The lack of duplicated
code seems like a worthwhile trade-off.

Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-17 13:26:30 +01:00
Matt Fleming
3f4a7836e3 x86/efi: Rip out phys_efi_get_time()
Dan reported that phys_efi_get_time() is doing kmalloc(..., GFP_KERNEL)
under a spinlock which is very clearly a bug. Since phys_efi_get_time()
has no users let's just delete it instead of trying to fix it.

Note that since there are no users of phys_efi_get_time(), it is not
possible to actually trigger a GFP_KERNEL alloc under the spinlock.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-17 21:54:28 +00:00
Matt Fleming
994448f1af Merge remote-tracking branch 'tip/x86/efi-mixed' into efi-for-mingo
Conflicts:
	arch/x86/kernel/setup.c
	arch/x86/platform/efi/efi.c
	arch/x86/platform/efi/efi_64.c
2014-03-05 18:15:37 +00:00
Matt Fleming
4fd69331ad Merge remote-tracking branch 'tip/x86/urgent' into efi-for-mingo
Conflicts:
	arch/x86/include/asm/efi.h
2014-03-05 17:31:41 +00:00
Borislav Petkov
a5d90c923b x86/efi: Quirk out SGI UV
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,

 kernel BUG at arch/x86/mm/init_64.c:351!
 invalid opcode: 0000 [#1] SMP
 Call Trace:
  [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
  [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
  [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
  [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
  [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
  [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
  [<ffffffff8153d955>] ? printk+0x72/0x74
  [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
  [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
  [<ffffffff81535530>] ? rest_init+0x74/0x74
  [<ffffffff81535539>] kernel_init+0x9/0xff
  [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81535530>] ? rest_init+0x74/0x74

Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.

Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 23:43:33 +00:00
Matt Fleming
7d453eee36 x86/efi: Wire up CONFIG_EFI_MIXED
Add the Kconfig option and bump the kernel header version so that boot
loaders can check whether the handover code is available if they want.

The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits.

Note that no boot loaders should be using the bits set in xloadflags to
decide which entry point to jump to. The entire scheme is based on the
concept that 32-bit bootloaders always jump to ->handover_offset and
64-bit loaders always jump to ->handover_offset + 512. We set both bits
merely to inform the boot loader that it's safe to use the native
handover offset even if the machine type in the PE/COFF header claims
otherwise.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:57 +00:00
Matt Fleming
4f9dbcfc40 x86/efi: Add mixed runtime services support
Setup the runtime services based on whether we're booting in EFI native
mode or not. For non-native mode we need to thunk from 64-bit into
32-bit mode before invoking the EFI runtime services.

Using the runtime services after SetVirtualAddressMap() is slightly more
complicated because we need to ensure that all the addresses we pass to
the firmware are below the 4GB boundary so that they can be addressed
with 32-bit pointers, see efi_setup_page_tables().

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:14 +00:00
Matt Fleming
099240ac11 x86/efi: Delete dead code when checking for non-native
Both efi_free_boot_services() and efi_enter_virtual_mode() are invoked
from init/main.c, but only if the EFI runtime services are available.
This is not the case for non-native boots, e.g. where a 64-bit kernel is
booted with 32-bit EFI firmware.

Delete the dead code.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:24:59 +00:00
Borislav Petkov
fabb37c736 x86/efi: Split efi_enter_virtual_mode
... into a kexec flavor for better code readability and simplicity. The
original one was getting ugly with ifdeffery.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:19 +00:00
Borislav Petkov
b7b898ae0c x86/efi: Make efi virtual runtime map passing more robust
Currently, running SetVirtualAddressMap() and passing the physical
address of the virtual map array was working only by a lucky coincidence
because the memory was present in the EFI page table too. Until Toshi
went and booted this on a big HP box - the krealloc() manner of resizing
the memmap we're doing did allocate from such physical addresses which
were not mapped anymore and boom:

http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com

One way to take care of that issue is to reimplement the krealloc thing
but with pages. We start with contiguous pages of order 1, i.e. 2 pages,
and when we deplete that memory (shouldn't happen all that often but you
know firmware) we realloc the next power-of-two pages.

Having the pages, it is much more handy and easy to map them into the
EFI page table with the already existing mapping code which we're using
for building the virtual mappings.

Thanks to Toshi Kani and Matt for the great debugging help.

Reported-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:18 +00:00