linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-27 15:55:02 +07:00

Author	SHA1	Message	Date
Andrea Arcangeli	a8aed3e075	x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge Without this patch any kernel code that reads kernel memory in non present kernel pte/pmds (as set by pageattr.c) will crash. With this kernel code: static struct page crash_page; static unsigned long crash_address; [..] crash_page = alloc_pages(GFP_KERNEL, 9); crash_address = page_address(crash_page); if (set_memory_np((unsigned long)crash_address, 1)) printk("set_memory_np failure\n"); [..] The kernel will crash if inside the "crash tool" one would try to read the memory at the not present address. crash> p crash_address crash_address = $8 = (long unsigned int ) 0xffff88023c000000 crash> rd 0xffff88023c000000 [ lockup* ] The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE shares the same bit, and pageattr leaves _PAGE_GLOBAL set on a kernel pte which is then mistaken as _PAGE_PROTNONE (so pte_present returns true by mistake and the kernel fault then gets confused and loops). With THP the same can happen after we taught pmd_present to check _PAGE_PROTNONE and _PAGE_PSE in commit `027ef6c878` ("mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP"). THP has the same problem with _PAGE_GLOBAL as the 4k pages, but it also has a problem with _PAGE_PSE, which must be cleared too. After the patch is applied copy_user correctly returns -EFAULT and doesn't lockup anymore. crash> p crash_address crash_address = $9 = (long unsigned int *) 0xffff88023c000000 crash> rd 0xffff88023c000000 rd: read error: kernel virtual address: ffff88023c000000 type: "64-bit KVADDR" Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-24 13:01:45 +01:00
Andrea Arcangeli	954f857187	Revert "x86, mm: Make spurious_fault check explicitly check explicitly check the PRESENT bit" I got a report for a minor regression introduced by commit `027ef6c878` ("mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP"). So the problem is, pageattr creates kernel pagetables (pte and pmds) that breaks pte_present/pmd_present and the patch above exposed this invariant breakage for pmd_present. The same problem already existed for the pte and pte_present and it was fixed by commit `660a293ea9` ("x86, mm: Make spurious_fault check explicitly check the PRESENT bit") (if it wasn't for that commit, it wouldn't even be a regression). That fix avoids the pagefault to use pte_present. I could follow through by stopping using pmd_present/pmd_huge too. However I think it's more robust to fix pageattr and to clear the PSE/GLOBAL bitflags too in addition to the present bitflag. So the kernel page fault can keep using the regular pte_present/pmd_present/pmd_huge. The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE are sharing the same bit, and in the pmd case we pretend _PAGE_PSE to be set only in present pmds (to facilitate split_huge_page final tlb flush). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-24 13:01:44 +01:00
Wen Congyang	942670d0dc	x86/mm/numa: Don't check if node is NUMA_NO_NODE If we aren't debugging per_cpu maps, the cpu's node is stored in per_cpu variable numa_node. If `node' is NUMA_NO_NODE, it means the caller wants to clear the cpu's node. So we should also call set_cpu_numa_node() in this case. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-24 13:01:43 +01:00
Linus Torvalds	9e2d59ad58	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...	2013-02-23 18:50:11 -08:00
Linus Torvalds	5ce1a70e2f	Merge branch 'akpm' (more incoming from Andrew) Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...	2013-02-23 17:50:35 -08:00
Tang Chen	01a178a94e	acpi, memory-hotplug: support getting hotplug info from SRAT We now provide an option for users who don't want to specify physical memory address in kernel commandline. /* * For movablemem_map=acpi: * * SRAT: \|_____\| \|_____\| \|_________\| \|_________\| ...... * node id: 0 1 1 2 * hotpluggable: n y y n * movablemem_map: \|_____\| \|_________\| * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. */ So user just specify movablemem_map=acpi, and the kernel will use hotpluggable info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE. If all the memory ranges in SRAT is hotpluggable, then no memory can be used by kernel. But before parsing SRAT, memblock has already reserve some memory ranges for other purposes, such as for kernel image, and so on. We cannot prevent kernel from using these memory. So we need to exclude these ranges even if these memory is hotpluggable. Furthermore, there could be several memory ranges in the single node which the kernel resides in. We may skip one range that have memory reserved by memblock, but if the rest of memory is too small, then the kernel will fail to boot. So, make the whole node which the kernel resides in un-hotpluggable. Then the kernel has enough memory to use. NOTE: Using this way will cause NUMA performance down because the whole node will be set as ZONE_MOVABLE, and kernel cannot use memory on it. If users don't want to lose NUMA performance, just don't use it. [akpm@linux-foundation.org: fix warning] [akpm@linux-foundation.org: use strcmp()] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:14 -08:00
Tang Chen	27168d38fa	acpi, memory-hotplug: extend movablemem_map ranges to the end of node When implementing movablemem_map boot option, we introduced an array movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE. Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify the whole node memory range, we need to extend it to the node end so that we can use it to prevent memblock from allocating memory in the ranges user didn't specify. We now implement movablemem_map boot option like this: /* * For movablemem_map=nn[KMG]@ss[KMG]: * * SRAT: \|_____\| \|_____\| \|_________\| \|_________\| ...... * node id: 0 1 1 2 * user specified: \|__\| \|___\| * movablemem_map: \|___\| \|_________\| \|______\| ...... * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. * * NOTE: In this case, SRAT info will be ingored. */ [akpm@linux-foundation.org: clean up code, fix build warning] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:14 -08:00
Tang Chen	e8d1955258	acpi, memory-hotplug: parse SRAT before memblock is ready On linux, the pages used by kernel could not be migrated. As a result, if a memory range is used by kernel, it cannot be hot-removed. So if we want to hot-remove memory, we should prevent kernel from using it. The way now used to prevent this is specify a memory range by movablemem_map boot option and set it as ZONE_MOVABLE. But when the system is booting, memblock will allocate memory, and reserve the memory for kernel. And before we parse SRAT, and know the node memory ranges, memblock is working. And it may allocate memory in ranges to be set as ZONE_MOVABLE. This memory can be used by kernel, and never be freed. So, let's parse SRAT before memblock is called first. And it is early enough. The first call of memblock_find_in_range_node() is in: setup_arch() \|-->setup_real_mode() so, this patch add a function early_parse_srat() to parse SRAT, and call it before setup_real_mode() is called. NOTE: 1) early_parse_srat() is called before numa_init(), and has initialized numa_meminfo. So DO NOT clear numa_nodes_parsed in numa_init() and DO NOT zero numa_meminfo in numa_init(), otherwise we will lose memory numa info. 2) I don't know why using count of memory affinities parsed from SRAT as a return value in original acpi_numa_init(). So I add a static variable srat_mem_cnt to remember this count and use it as the return value of the new acpi_numa_init() [mhocko@suse.cz: parse SRAT before memblock is ready fix] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:14 -08:00
Yasuaki Ishimatsu	4d59a75125	x86: get pg_data_t's memory from other node During the implementation of SRAT support, we met a problem. In setup_arch(), we have the following call series: 1) memblock is ready; 2) some functions use memblock to allocate memory; 3) parse ACPI tables, such as SRAT. Before 3), we don't know which memory is hotpluggable, and as a result, we cannot prevent memblock from allocating hotpluggable memory. So, in 2), there could be some hotpluggable memory allocated by memblock. Now, we are trying to parse SRAT earlier, before memblock is ready. But I think we need more investigation on this topic. So in this v5, I dropped all the SRAT support, and v5 is just the same as v3, and it is based on 3.8-rc3. As we planned, we will support getting info from SRAT without users' participation at last. And we will post another patch-set to do so. And also, I think for now, we can add this boot option as the first step of supporting movable node. Since Linux cannot migrate the direct mapped pages, the only way for now is to limit the whole node containing only movable memory. Using SRAT is one way. But even if we can use SRAT, users still need an interface to enable/disable this functionality if they don't want to loose their NUMA performance. So I think, a user interface is always needed. For now, users can disable this functionality by not specifying the boot option. Later, we will post SRAT support, and add another option value "movablecore_map=acpi" to using SRAT. This patch: If system can create movable node which all memory of the node is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's pg_data_t. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid() to retry when the first allocation fails. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:14 -08:00
Wen Congyang	e13fe8695c	cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node When the node is offlined, there is no memory/cpu on the node. If a sleep task runs on a cpu of this node, it will be migrated to the cpu on the other node. So we can clear cpu-to-node mapping. [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:13 -08:00
Wen Congyang	c4c6052464	cpu_hotplug: clear apicid to node when the cpu is hotremoved When a cpu is hotpluged, we call acpi_map_cpu2node() in _acpi_map_lsapic() to store the cpu's node and apicid's node. But we don't clear the cpu's node in acpi_unmap_lsapic() when this cpu is hotremoved. If the node is also hotremoved, we will get the following messages: kernel BUG at include/linux/gfp.h:329! invalid opcode: 0000 [#1] SMP Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core sg acpi_memhotplug igb dca sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod Pid: 3126, comm: init Not tainted 3.6.0-rc3-tangchen-hostbridge+ #13 FUJITSU-SV PRIMEQUEST 1800E/SB RIP: 0010:[<ffffffff811bc3fd>] [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300 RSP: 0018:ffff88078a049cf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000246 RBP: ffff88078a049d38 R08: 00000000000040d0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000b5f R12: 00000000000052d0 R13: ffff8807c1417300 R14: 0000000000030038 R15: 0000000000000003 FS: 00007fa9b1b44700(0000) GS:ffff8807c3800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fa9b09acca0 CR3: 000000078b855000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process init (pid: 3126, threadinfo ffff88078a048000, task ffff8807bb6f2650) Call Trace: new_slab+0x30/0x1b0 __slab_alloc+0x358/0x4c0 kmem_cache_alloc_node_trace+0xb4/0x1e0 alloc_fair_sched_group+0xd0/0x1b0 sched_create_group+0x3e/0x110 sched_autogroup_create_attach+0x4d/0x180 sys_setsid+0xd4/0xf0 system_call_fastpath+0x16/0x1b Code: 89 c4 e9 73 fe ff ff 31 c0 89 de 48 c7 c7 45 de 9e 81 44 89 45 c8 e8 22 05 4b 00 85 db 44 8b 45 c8 0f 89 4f ff ff ff 0f 0b eb fe <0f> 0b 90 eb fd 0f 0b eb fe 89 de 48 c7 c7 45 de 9e 81 31 c0 44 RIP [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300 RSP <ffff88078a049cf8> ---[ end trace adf84c90f3fea3e5 ]--- The reason is that the cpu's node is not NUMA_NO_NODE, we will call alloc_pages_exact_node() to alloc memory on the node, but the node is offlined. If the node is onlined, we still need cpu's node. For example: a task on the cpu is sleeped when the cpu is hotremoved. We will choose another cpu to run this task when it is waked up. If we know the cpu's node, we will choose the cpu on the same node first. So we should clear cpu-to-node mapping when the node is offlined. This patch only clears apicid-to-node mapping when the cpu is hotremoved. [akpm@linux-foundation.org: fix section error] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:13 -08:00
Tang Chen	0197518cd3	memory-hotplug: remove memmap of sparse-vmemmap Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Tang Chen	bbcab8789d	memory-hotplug: remove page table of x86_64 architecture Search a page table about the removed memory, and clear page table for x86_64 architecture. [akpm@linux-foundation.org: make kernel_physical_mapping_remove() static] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Wen Congyang	ae9aae9eda	memory-hotplug: common APIs to support page tables hot-remove When memory is removed, the corresponding pagetables should alse be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture direct mapping pagetable removing. All pages of virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So this patch uses the following way to check whether a page can be freed or not. 1) When removing memory, the page structs of the removed memory are filled with 0FD. 2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. For direct mapping pages, update direct_pages_count[level] when we freed their pagetables. And do not free the pages again because they were freed when offlining. For vmemmap pages, free the pages and their pagetables. For larger pages, do not split them into smaller ones because there is no way to know if the larger page has been split. As a result, there is no way to decide when to split. We deal the larger pages in the following way: 1) For direct mapped pages, all the pages were freed when they were offlined. And since menmory offline is done section by section, all the memory ranges being removed are aligned to PAGE_SIZE. So only need to deal with unaligned pages when freeing vmemmap pages. 2) For vmemmap pages being used to store page_struct, if part of the larger page is still in use, just fill the unused part with 0xFD. And when the whole page is fulfilled with 0xFD, then free the larger page. [akpm@linux-foundation.org: fix typo in comment] [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables] [tangchen@cn.fujitsu.com: do not free direct mapping pages twice] [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one] [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages] [akpm@linux-foundation.org: use pmd_page_vaddr()] [akpm@linux-foundation.org: fix used-uninitialised bug] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu	46723bfa54	memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Wen Congyang	24d335ca36	memory-hotplug: introduce new arch_remove_memory() for removing page table For removing memory, we need to remove page tables. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Linus Torvalds	c47f39e3b7	Merge branch 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading update from Peter Anvin: "This patchset lets us update the CPU microcode very, very early in initialization if the BIOS fails to do so (never happens, right?) This is handy for dealing with things like the Atom erratum where we have to run without PSE because microcode loading happens too late. As I mentioned in the x86/mm push request it depends on that infrastructure but it is otherwise a standalone feature." * 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Make early microcode loading a configuration feature x86/mm/init.c: Copy ucode from initrd image to kernel memory x86/head64.c: Early update ucode in 64-bit x86/head_32.S: Early update ucode in 32-bit x86/microcode_intel_early.c: Early update ucode on Intel's CPU x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() x86/microcode_intel_lib.c: Early update ucode on Intel's CPU x86/microcode_core_early.c: Define interfaces for early loading ucode x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP x86/common.c: Make have_cpuid_p() a global function x86/microcode_intel.h: Define functions and macros for early loading ucode x86, doc: Documentation for early microcode loading	2013-02-22 19:22:52 -08:00
Konrad Rzeszutek Wilk	0cc9129d75	x86-64, xen, mmu: Provide an early version of write_cr3. With commit `8170e6bed4` ("x86, 64bit: Use a #PF handler to materialize early mappings on demand") we started hitting an early bootup crash where the Xen hypervisor would inform us that: (XEN) d7:v0: unhandled page fault (ec=0000) (XEN) Pagetable walk from ffffea000005b2d0: (XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S (XEN) Domain 7 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]---- .. that Xen was unable to context switch back to dom0. Looking at the calling stack we find: [<ffffffff8103feba>] xen_get_user_pgd+0x5a <-- [<ffffffff8103feba>] xen_get_user_pgd+0x5a [<ffffffff81042d27>] xen_write_cr3+0x77 [<ffffffff81ad2d21>] init_mem_mapping+0x1f9 [<ffffffff81ac293f>] setup_arch+0x742 [<ffffffff81666d71>] printk+0x48 We are trying to figure out whether we need to up-date the user PGD as well. Please keep in mind that under 64-bit PV guests we have a limited amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel and user-space. As such the Linux pvops'fied version of write_cr3 checks if it has to update the user-space cr3 as well. That clearly is not needed during early bootup. The recent changes (see above git commit) streamline the x86 page table allocation to be much simpler (And also incidentally the #PF handler ends up in spirit being similar to how the Xen toolstack sets up the initial page-tables). The fix is to have an early-bootup version of cr3 that just loads the kernel %cr3. The later version - which also handles user-page modifications will be used after the initial page tables have been setup. [ hpa: removed a redundant #ifdef and made the new function __init. Also note that x86-32 already has such an early xen_write_cr3. ] Tested-by: "H. Peter Anvin" <hpa@zytor.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-22 17:41:22 -08:00
Linus Torvalds	ac630dd98a	x86-64: don't set the early IDT to point directly to 'early_idt_handler' The code requires the use of the proper per-exception-vector stub functions (set up as the early_idt_handlers[] array - note the 's') that make sure to set up the error vector number. This is true regardless of whether CONFIG_EARLY_PRINTK is set or not. Why? The stack offset for the comparison of __KERNEL_CS won't be right otherwise, nor will the new check (from commit `8170e6bed4`: "x86, 64bit: Use a #PF handler to materialize early mappings on demand") for the page fault exception vector. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-22 13:09:51 -08:00
Linus Torvalds	2ef14f465b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The really big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...	2013-02-21 18:06:55 -08:00
Linus Torvalds	cb715a8366	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Peter Anvin: "This is a corrected attempt at the x86/cpu branch, this time with the fixes in that makes it not break on KVM (current or past), or any other virtualizer which traps on this configuration. Again, the biggest change here is enabling the WC+ memory type on AMD processors, if the BIOS doesn't." * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs x86, cpu, amd: Fix WC+ workaround for older virtual hosts x86, AMD: Enable WC+ memory type on family 10 processors x86, AMD: Clean up init_amd() x86/process: Change %8s to %s for pr_warn() in release_thread() x86/cpu/hotplug: Remove CONFIG_EXPERIMENTAL dependency	2013-02-21 18:03:39 -08:00
Linus Torvalds	9afa3195b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Assorted tiny fixes queued in trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits) DocBook: update EXPORT_SYMBOL entry to point at export.h Documentation: update top level 00-INDEX file with new additions ARM: at91/ide: remove unsused at91-ide Kconfig entry percpu_counter.h: comment code for better readability x86, efi: fix comment typo in head_32.S IB: cxgb3: delay freeing mem untill entirely done with it net: mvneta: remove unneeded version.h include time: x86: report_lost_ticks doesn't exist any more pcmcia: avoid static analysis complaint about use-after-free fs/jfs: Fix typo in comment : 'how may' -> 'how many' of: add missing documentation for of_platform_populate() btrfs: remove unnecessary cur_trans set before goto loop in join_transaction sound: soc: Fix typo in sound/codecs treewide: Fix typo in various drivers btrfs: fix comment typos Update ibmvscsi module name in Kconfig. powerpc: fix typo (utilties -> utilities) of: fix spelling mistake in comment h8300: Fix home page URL in h8300/README xtensa: Fix home page URL in Kconfig ...	2013-02-21 17:40:58 -08:00
Linus Torvalds	21eaab6d19	tty/serial patches for 3.9-rc1 Here's the big tty/serial driver patches for 3.9-rc1. More tty port rework and fixes from Jiri here, as well as lots of individual serial driver updates and fixes. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmZYQACgkQMUfUDdst+ylJDgCg0B0nMevUUdM4hLvxunbbiyXM HUEAoIOedqriNNPvX4Bwy0hjeOEaWx0g =vi6x -----END PGP SIGNATURE----- Merge tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial patches from Greg Kroah-Hartman: "Here's the big tty/serial driver patches for 3.9-rc1. More tty port rework and fixes from Jiri here, as well as lots of individual serial driver updates and fixes. All of these have been in the linux-next tree for a while." * tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits) tty: mxser: improve error handling in mxser_probe() and mxser_module_init() serial: imx: fix uninitialized variable warning serial: tegra: assume CONFIG_OF TTY: do not update atime/mtime on read/write lguest: select CONFIG_TTY to build properly. ARM defconfigs: add missing inclusions of linux/platform_device.h fb/exynos: include platform_device.h ARM: sa1100/assabet: include platform_device.h directly serial: imx: Fix recursive locking bug pps: Fix build breakage from decoupling pps from tty tty: Remove ancient hardpps() pps: Additional cleanups in uart_handle_dcd_change pps: Move timestamp read into PPS code proper pps: Don't crash the machine when exiting will do pps: Fix a use-after free bug when unregistering a source. pps: Use pps_lookup_dev to reduce ldisc coupling pps: Add pps_lookup_dev() function tty: serial: uartlite: Support uartlite on big and little endian systems tty: serial: uartlite: Fix sparse and checkpatch warnings serial/arc-uart: Miscll DT related updates (Grant's review comments) ... Fix up trivial conflicts, mostly just due to the TTY config option clashing with the EXPERIMENTAL removal.	2013-02-21 13:41:04 -08:00
Linus Torvalds	06991c28f3	Driver core patches for 3.9-rc1 Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL If you need me to provide a merged tree to handle these resolutions, please let me know. Other than those patches, there's not much here, some minor fixes and updates. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW =yWAQ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...	2013-02-21 12:05:51 -08:00
Linus Torvalds	a0b1c42951	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking update from David Miller: 1) Checkpoint/restarted TCP sockets now can properly propagate the TCP timestamp offset. From Andrey Vagin. 2) VMWARE VM VSOCK layer, from Andy King. 3) Much improved support for virtual functions and SR-IOV in bnx2x, from Ariel ELior. 4) All protocols on ipv4 and ipv6 are now network namespace aware, and all the compatability checks for initial-namespace-only protocols is removed. Thanks to Tom Parkin for helping deal with the last major holdout, L2TP. 5) IPV6 support in netpoll and network namespace support in pktgen, from Cong Wang. 6) Multiple Registration Protocol (MRP) and Multiple VLAN Registration Protocol (MVRP) support, from David Ward. 7) Compute packet lengths more accurately in the packet scheduler, from Eric Dumazet. 8) Use per-task page fragment allocator in skb_append_datato_frags(), also from Eric Dumazet. 9) Add support for connection tracking labels in netfilter, from Florian Westphal. 10) Fix default multicast group joining on ipv6, and add anti-spoofing checks to 6to4 and 6rd. From Hannes Frederic Sowa. 11) Make ipv4/ipv6 fragmentation memory limits more reasonable in modern times, rearrange inet frag datastructures for better cacheline locality, and move more operations outside of locking. From Jesper Dangaard Brouer. 12) Instead of strict master <--> slave relationships, allow arbitrary scenerios with "upper device lists". From Jiri Pirko. 13) Improve rate limiting accuracy in TBF and act_police, also from Jiri Pirko. 14) Add a BPF filter netfilter match target, from Willem de Bruijn. 15) Orphan and delete a bunch of pre-historic networking drivers from Paul Gortmaker. 16) Add TSO support for GRE tunnels, from Pravin B SHelar. Although this still needs some minor bug fixing before it's %100 correct in all cases. 17) Handle unresolved IPSEC states like ARP, with a resolution packet queue. From Steffen Klassert. 18) Remove TCP Appropriate Byte Count support (ABC), from Stephen Hemminger. This was long overdue. 19) Support SO_REUSEPORT, from Tom Herbert. 20) Allow locking a socket BPF filter, so that it cannot change after a process drops capabilities. 21) Add VLAN filtering to bridge, from Vlad Yasevich. 22) Bring ipv6 on-par with ipv4 and do not cache neighbour entries in the ipv6 routes, from YOSHIFUJI Hideaki. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1538 commits) ipv6: fix race condition regarding dst->expires and dst->from. net: fix a wrong assignment in skb_split() ip_gre: remove an extra dst_release() ppp: set qdisc_tx_busylock to avoid LOCKDEP splat atl1c: restore buffer state net: fix a build failure when !CONFIG_PROC_FS net: ipv4: fix waring -Wunused-variable net: proc: fix build failed when procfs is not configured Revert "xen: netback: remove redundant xenvif_put" net: move procfs code to net/core/net-procfs.c qmi_wwan, cdc-ether: add ADU960S bonding: set sysfs device_type to 'bond' bonding: fix bond_release_all inconsistencies b44: use netdev_alloc_skb_ip_align() xen: netback: remove redundant xenvif_put net: fec: Do a sanity check on the gpio number ip_gre: propogate target device GSO capability to the tunnel device ip_gre: allow CSUM capable devices to handle packets bonding: Fix initialize after use for 3ad machine state spinlock bonding: Fix race condition between bond_enslave() and bond_3ad_update_lacp_rate() ...	2013-02-20 18:58:50 -08:00
Marcelo Tosatti	6b73a96065	Revert "KVM: MMU: lazily drop large spte" This reverts commit `caf6900f2d`. It is causing migration failures, reference https://bugzilla.kernel.org/show_bug.cgi?id=54061. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-20 18:52:02 -03:00
Matt Fleming	fb834c7acc	x86, efi: Make "noefi" really disable EFI runtime serivces commit `1de63d60cd` ("efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameter") attempted to make "noefi" true to its documentation and disable EFI runtime services to prevent the bricking bug described in commit `e0094244e4` ("samsung-laptop: Disable on EFI hardware"). However, it's not possible to clear EFI_RUNTIME_SERVICES from an early param function because EFI_RUNTIME_SERVICES is set in efi_init() after parse_early_param(). This resulted in "noefi" effectively becoming a no-op and no longer providing users with a way to disable EFI, which is bad for those users that have buggy machines. Reported-by: Walt Nelson Jr <walt0924@gmail.com> Cc: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1361392572-25657-1-git-send-email-matt@console-pimps.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-20 13:18:36 -08:00
Linus Torvalds	8793422fd9	ACPI and power management updates for 3.9-rc1 - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J. Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRIsArAAoJEKhOf7ml8uNsD6MP/j7C4NA+GTq6RdwoJt+Yki0K 9Ep8I4pEuRFoN/oskv24EyQhpGJIk6UxWcJ/DWFBc+1VhmKORta7k2Idv/wlJA77 s7AcDveA9xcDh+TVfbh87TeuiMSXiSdDZbiaQO+wMizWJAF3F84AnjiAqqqyQcSK bA5/Siz/vWlt9PyYDaQtHTVE4lpvPuVcQdYewsdaH2PsmUjvIg/TUzg28CTrdyvv eHOdBK9R0/OLQLhzRbL0VOGJ//wEl+HJRO0QEhTKPgdQ1e/VH/4Zu5WSzF8P/x4C s2f8U4IKQqulDuDHXtpMpelFm7hRWgsOqZLkcyXLs+0dvSM9CTPO6P0ZaImxUctk 5daHWEsXUnCErDQawt1mcZP8l6qnxofMQIfLXyPVzvlSnHyToTmrtXa1v2u4AuL/ hOo4MYWsFNUmRdtGFFGlExGgEDZ4G5NwiYjRBl/6XJ3v4nhnnMbuzxP8scpoe5m1 8tjroJHZFUUs/mFU/H+oRbHzSzXPmp1sddNaTg4OpVmTn3DDh6ljnFhiItd1Ndw0 5ldVbSe6ETq5RoK0TbzvQOeVpa9F3JfqbrXLQPqfd2iz/No41LQYG1uShRYuXKuA wfEcc+c9VMd3FILu05pGwBnU8VS9VbxTYMz7xDxg6b29Ywnb7u+Q1ycCk2gFYtkS E2oZDuyewTJxaskzYsNr =wijn -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. * tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle cpufreq / intel_pstate: Add kernel command line option disable intel_pstate. cpufreq / intel_pstate: Change to disallow module build tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E ACPI / hotplug: Fix concurrency issues and memory leaks ...	2013-02-20 11:26:56 -08:00
Ian Campbell	c81611c4e9	xen: event channel arrays are xen_ulong_t and not unsigned long On ARM we want these to be the same size on 32- and 64-bit. This is an ABI change on ARM. X86 does not change. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Keir (Xen.org) <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: linux-arm-kernel@lists.infradead.org Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-20 08:45:07 -05:00
Ingo Molnar	ff1fb5f6b4	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull two fixes from Steven Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-20 11:26:21 +01:00
Mathias Krause	27cf929845	x86/apic: Fix parsing of the 'lapic' cmdline option Including " lapic " in the kernel cmdline on an x86-64 kernel makes it panic while parsing early params -- e.g. with no user visible output. Fix this bug by ensuring arg is non-NULL before passing it to strncmp(). Reported-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1361303227-13174-1-git-send-email-minipli@googlemail.com Cc: stable@vger.kernel.org # v3.8 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-20 11:24:36 +01:00
Stephane Eranian	69943182bb	perf/x86: Add Intel IvyBridge event scheduling constraints Intel IvyBridge processor has different constraints compared to SandyBridge. Therefore it needs its own contraint table. This patch adds the constraint table. Without this patch, the events listed in the patch may not be scheduled correctly and bogus counts may be collected. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: namhyung.kim@lge.com Link: http://lkml.kernel.org/r/1361355312-3323-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-20 11:22:46 +01:00
Linus Torvalds	1eaec8212e	Merge branch 'for-3.9-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue [delayed_]work_pending() cleanups from Tejun Heo: "This is part of on-going cleanups to remove / minimize usages of workqueue interfaces which are deprecated and/or misleading. This round drops a number of usages of [delayed_]work_pending(), which are dangerous as they lack any form of synchronization and thus often lead to buggy / unnecessary code. There are a couple legitimate use cases in kernel. Hopefully, they can be converted and [delayed_]work_pending() can be removed completely. Even if not, removing most of misuses should make it more difficult to find examples of misuses and thus slow down growth of them. These changes are independent from other workqueue changes." * 'for-3.9-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: wimax/i2400m: fix i2400m->wake_tx_skb handling kprobes: fix wait_for_kprobe_optimizer() ipw2x00: simplify scan_event handling video/exynos: don't use [delayed_]work_pending() tty/max3100: don't use [delayed_]work_pending() x86/mce: don't use [delayed_]work_pending() rfkill: don't use [delayed_]work_pending() wl1251: don't use [delayed_]work_pending() thinkpad_acpi: don't use [delayed_]work_pending() mwifiex: don't use [delayed_]work_pending() sja1000: don't use [delayed_]work_pending()	2013-02-19 21:58:52 -08:00
Linus Torvalds	1a13c0b181	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 UV3 support update from Ingo Molnar: "Support for the SGI Ultraviolet System 3 (UV3) platform - the upcoming third major iteration and upscaling of the SGI UV supercomputing platform." * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, uv, uv3: Trim MMR register definitions after code changes for SGI UV3 x86, uv, uv3: Check current gru hub support for SGI UV3 x86, uv, uv3: Update Time Support for SGI UV3 x86, uv, uv3: Update x2apic Support for SGI UV3 x86, uv, uv3: Update Hub Info for SGI UV3 x86, uv, uv3: Update ACPI Check to include SGI UV3 x86, uv, uv3: Update MMR register definitions for SGI Ultraviolet System 3 (UV3)	2013-02-19 20:12:57 -08:00
Linus Torvalds	f98982ce80	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: - Support for the Technologic Systems TS-5500 platform, by Vivien Didelot - Improved NUMA support on AMD systems: Add support for federated systems where multiple memory controllers can exist and see each other over multiple PCI domains. This basically means that AMD node ids can be more than 8 now and the code handling this is taught to incorporate PCI domain into those IDs. - Support for the Goldfish virtual Android emulator, by Jun Nakajima, Intel, Google, et al. - Misc fixlets. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add TS-5500 platform support x86/srat: Simplify memory affinity init error handling x86/apb/timer: Remove unnecessary "if" goldfish: platform device for x86 amd64_edac: Fix type usage in NB IDs and memory ranges amd64_edac: Fix PCI function lookup x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id x86, AMD, NB: Add multi-domain support	2013-02-19 20:11:07 -08:00
Linus Torvalds	29d5052329	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/hyperv changes from Ingo Molnar: "The biggest change is support for Windows 8's improved hypervisor interrupt model on the Linux Hyper-V guest subsystem code side. Smallish fixes otherwise." * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, hyperv: HYPERV depends on X86_LOCAL_APIC X86: Handle Hyper-V vmbus interrupts as special hypervisor interrupts X86: Add a check to catch Xen emulation of Hyper-V x86: Hyper-V: register clocksource only if its advertised	2013-02-19 20:10:21 -08:00
Linus Torvalds	026f149ca3	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/debug changes from Ingo Molnar: "Two init annotations and a built-in memtest speedup" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/memtest: Shorten time for tests x86: Convert a few mistaken __cpuinit annotations to __init x86/EFI: Properly init-annotate BGRT code	2013-02-19 20:09:48 -08:00
Linus Torvalds	11743a1db9	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanup patches from Ingo Molnar: "Misc smaller cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: ptrace.c only needs export.h and not the full module.h x86, apb_timer: remove unused variable percpu_timer um: don't compare a pointer to 0 arch/x86/platform/uv: use ARRAY_SIZE where possible	2013-02-19 19:13:49 -08:00
Linus Torvalds	121027a7a6	Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull two x86 kernel build changes from Ingo Molnar: "The first change modifies how 'make oldconfig' works on cross-bitness situations on x86. It was felt the new behavior of preserving the bitness of the .config is more logical. This is a leftover of the merge. The second change eliminates a Perl warning. (There's another, more complete fix resulting of this warning fix, which second fix in flight to you via the kbuild tree, which will remove the timeconst.pl script altogether.)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timeconst.pl: Eliminate Perl warning x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT	2013-02-19 19:12:03 -08:00
Linus Torvalds	5abcd76f5d	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 bootup changes from Ingo Molnar: "Deal with bootloaders which fail to initialize unknown fields in boot_params to zero, by sanitizing boot params passed in. This unbreaks versions of kexec-utils. Other bootloaders do not appear to show sensitivity to this change, but it's a possibility for breakage nevertheless." * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, boot: Sanitize boot_params if not zeroed on creation	2013-02-19 19:11:10 -08:00
Linus Torvalds	a57ed93600	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "The biggest change (by line count) is the unification of the XOR code and then the introduction of an additional SSE based XOR assembly method. The other bigger change is the head_32.S rework/cleanup by Borislav Petkov. Last but not least there's the usual laundry list of small but dangerous (and hopefully perfectly tested) changes to subtle low level x86 code, plus cleanups." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, head_32: Give the 6 label a real name x86, head_32: Remove second CPUID detection from default_entry x86: Detect CPUID support early at boot x86, head_32: Remove i386 pieces x86: Require MOVBE feature in cpuid when we use it x86: Enable ARCH_USE_BUILTIN_BSWAP x86/xor: Add alternative SSE implementation only prefetching once per 64-byte line x86/xor: Unify SSE-base xor-block routines x86: Fix a typo x86/mm: Fix the argument passed to sync_global_pgds() x86/mm: Convert update_mmu_cache() and update_mmu_cache_pmd() to functions ix86: Tighten asmlinkage_protect() constraints	2013-02-19 19:09:42 -08:00
Linus Torvalds	5800700f66	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Main changes: - Multiple MSI support added to the APIC, PCI and AHCI code - acked by all relevant maintainers, by Alexander Gordeev. The advantage is that multiple AHCI ports can have multiple MSI irqs assigned, and can thus spread to multiple CPUs. [ Drivers can make use of this new facility via the pci_enable_msi_block_auto() method ] - x86 IOAPIC code from interrupt remapping cleanups from Joerg Roedel: These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. - Various smaller improvements, fixes and cleanups." * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess x86, kvm: Fix intialization warnings in kvm.c x86, irq: Move irq_remapped out of x86 core code x86, io_apic: Introduce eoi_ioapic_pin call-back x86, msi: Introduce x86_msi.compose_msi_msg call-back x86, irq: Introduce setup_remapped_irq() x86, irq: Move irq_remapped() check into free_remapped_irq x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq() x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core x86, irq: Add data structure to keep AMD specific irq remapping information x86, irq: Move irq_remapping_enabled declaration to iommu code x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin x86, io_apic: Move irq_remapping_enabled checks out of check_timer() x86, io_apic: Convert setup_ioapic_entry to function pointer x86, io_apic: Introduce set_affinity function pointer x86, msi: Use IRQ remapping specific setup_msi_irqs routine x86, hpet: Introduce x86_msi_ops.setup_hpet_msi x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging x86, io_apic: Introduce x86_io_apic_ops.disable() x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume ...	2013-02-19 19:07:27 -08:00
Linus Torvalds	266d7ad7f4	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Ingo Molnar: "Main changes: - ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to keep the hardware clock updated. - posix-timers: Fix clock_adjtime to always return timex data on success. This is changing the ABI, but no breakage was expected and found - caution is warranted nevertheless. - platform persistent clock improvements/cleanups. - clockevents: refactor timer broadcast handling to be more generic and less duplicated with matching architecture code (mostly ARM motivated.) - various fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet() posix-cpu-timers: Fix nanosleep task_struct leak clockevents: Fix generic broadcast for FEAT_C3STOP time, Fix setting of hardware clock in NTP code hrtimer: Prevent hrtimer_enqueue_reprogram race clockevents: Add generic timer broadcast function clockevents: Add generic timer broadcast receiver timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK x86/time/rtc: Don't print extended CMOS year when reading RTC x86: Select HAS_PERSISTENT_CLOCK on x86 timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option rtc: Skip the suspend/resume handling if persistent clock exist timekeeping: Add persistent_clock_exist flag posix-timers: Fix clock_adjtime to always return timex data on success Round the calculated scale factor in set_cyc2ns_scale() NTP: Add a CONFIG_RTC_SYSTOHC configuration MAINTAINERS: Update John Stultz's email time: create __getnstimeofday for WARNless calls	2013-02-19 19:05:45 -08:00
Stefan Bader	76eaca031f	xen: Send spinlock IPI to all waiters There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 22:03:29 -05:00
Stefano Stabellini	3216dceb31	xen: introduce xen_remap, use it instead of ioremap ioremap can't be used to map ring pages on ARM because it uses device memory caching attributes (MT_DEVICE*). Introduce a Xen specific abstraction to map ring pages, called xen_remap, that is defined as ioremap on x86 (no behavioral changes). On ARM it explicitly calls __arm_ioremap with the right caching attributes: MT_MEMORY. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 22:02:34 -05:00
Konrad Rzeszutek Wilk	dacd45f4e7	xen/smp: Move the common CPU init code a bit to prep for PVH patch. The PV and PVH code CPU init code share some functionality. The PVH code ("xen/pvh: Extend vcpu_guest_context, p2m, event, and XenBus") sets some of these up, but not all. To make it easier to read, this patch removes the PV specific out of the generic way. No functional change - just code movement. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v2: Fixed compile errors noticed by Fengguang Wu build system] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 21:59:47 -05:00
Linus Torvalds	d652e1eb8e	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Main changes: - scheduler side full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker. - Initial sched.h split-up changes, by Clark Williams - select_idle_sibling() performance improvement by Mike Galbraith: " 1 tbench pair (worst case) in a 10 core + SMT package: pre 15.22 MB/sec 1 procs post 252.01 MB/sec 1 procs " - sched_rr_get_interval() ABI fix/change. We think this detail is not used by apps (so it's not an ABI in practice), but lets keep it under observation. - misc RT scheduling cleanups, optimizations" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h> cputime: Remove irqsave from seqlock readers sched, powerpc: Fix sched.h split-up build failure cputime: Restore CPU_ACCOUNTING config defaults for PPC64 sched/rt: Move rt specific bits into new header file sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice sched: Move sched.h sysctl bits into separate header sched: Fix signedness bug in yield_to() sched: Fix select_idle_sibling() bouncing cow syndrome sched/rt: Further simplify pick_rt_task() sched/rt: Do not account zero delta_exec in update_curr_rt() cputime: Safely read cputime of full dynticks CPUs kvm: Prepare to add generic guest entry/exit callbacks cputime: Use accessors to read task cputime stats cputime: Allow dynamic switch between tick/virtual based cputime accounting cputime: Generic on-demand virtual cputime accounting cputime: Move default nsecs_to_cputime() to jiffies based cputime file cputime: Librarize per nsecs resolution cputime definitions cputime: Avoid multiplication overflow on utime scaling context_tracking: Export context state for generic vtime ... Fix up conflict in kernel/context_tracking.c due to comment additions.	2013-02-19 18:19:48 -08:00
Linus Torvalds	8f55cea410	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "There are lots of improvements, the biggest changes are: Main kernel side changes: - Improve uprobes performance by adding 'pre-filtering' support, by Oleg Nesterov. - Make some POWER7 events available in sysfs, equivalent to what was done on x86, from Sukadev Bhattiprolu. - tracing updates by Steve Rostedt - mostly misc fixes and smaller improvements. - Use perf/event tracing to report PCI Express advanced errors, by Tony Luck. - Enable northbridge performance counters on AMD family 15h, by Jacob Shin. - This tracing commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties (PowerTop in particular) seem to agree that it's safe to do now with the introduction of libtraceevent, but the devil is in the details ... Main tooling side changes: - Add 'event group view', from Namyung Kim: To use it, 'perf record' should group events when recording. And then perf report parses the saved group relation from file header and prints them together if --group option is provided. You can use the 'perf evlist' command to see event group information: $ perf record -e '{ref-cycles,cycles}' noploop 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ] $ perf evlist --group {ref-cycles,cycles} With this example, default perf report will show you each event separately. You can use --group option to enable event group view: $ perf report --group ... # group: {ref-cycles,cycles} # ======== # Samples: 7K of event 'anon group { ref-cycles, cycles }' # Event count (approx.): 6876107743 # # Overhead Command Shared Object Symbol # ................ ....... ................. .......................... 99.84% 99.76% noploop noploop [.] main 0.07% 0.00% noploop ld-2.15.so [.] strcmp 0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del 0.03% 0.03% noploop [kernel.kallsyms] [k] sched_clock_cpu 0.02% 0.00% noploop [kernel.kallsyms] [k] account_user_time 0.01% 0.00% noploop [kernel.kallsyms] [k] __alloc_pages_nodemask 0.00% 0.00% noploop [kernel.kallsyms] [k] native_write_msr_safe 0.00% 0.11% noploop [kernel.kallsyms] [k] _raw_spin_lock 0.00% 0.06% noploop [kernel.kallsyms] [k] find_get_page 0.00% 0.02% noploop [kernel.kallsyms] [k] rcu_check_callbacks 0.00% 0.02% noploop [kernel.kallsyms] [k] __current_kernel_time As you can see the Overhead column now contains both of ref-cycles and cycles and header line shows group information also - 'anon group { ref-cycles, cycles }'. The output is sorted by period of group leader first. - Initial GTK+ annotate browser, from Namhyung Kim. - Add option for runtime switching perf data file in perf report, just press 's' and a menu with the valid files found in the current directory will be presented, from Feng Tang. - Add support to display whole group data for raw columns, from Jiri Olsa. - Add per processor socket count aggregation in perf stat, from Stephane Eranian. - Add interval printing in 'perf stat', from Stephane Eranian. - 'perf test' improvements - Add support for wildcards in tracepoint system name, from Jiri Olsa. - Add anonymous huge page recognition, from Joshua Zhu. - perf build-id cache now can show DSOs present in a perf.data file that are not in the cache, to integrate with build-id servers being put in place by organizations such as Fedora. - perf top now shares more of the evsel config/creation routines with 'record', paving the way for further integration like 'top' snapshots, etc. - perf top now supports DWARF callchains. - Fix mmap limitations on 32-bit, fix from David Miller. - 'perf bench numa mem' NUMA performance measurement suite - ... and lots of fixes, performance improvements, cleanups and other improvements I failed to list - see the shortlog and git log for details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits) perf/x86/amd: Enable northbridge performance counters on AMD family 15h perf/hwbp: Fix cleanup in case of kzalloc failure perf tools: Fix build with bison 2.3 and older. perf tools: Limit unwind support to x86 archs perf annotate: Make it to be able to skip unannotatable symbols perf gtk/annotate: Fail early if it can't annotate perf gtk/annotate: Show source lines with gray color perf gtk/annotate: Support multiple event annotation perf ui/gtk: Implement basic GTK2 annotation browser perf annotate: Fix warning message on a missing vmlinux perf buildid-cache: Add --update option uprobes/perf: Avoid uprobe_apply() whenever possible uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE uprobes/perf: Teach trace_uprobe/perf code to pre-filter uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's uprobes: Introduce uprobe_apply() perf: Introduce hw_perf_event->tp_target and ->tp_list uprobes/perf: Always increment trace_uprobe->nhit uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe uprobes/tracing: Introduce is_trace_uprobe_enabled() ...	2013-02-19 17:49:41 -08:00
Linus Torvalds	b7133a9a10	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core changes from Ingo Molnar: "The biggest changes are the IRQ-work and printk changes from Frederic Weisbecker, which prepare the code for 'full dynticks' (the ability to stop or slow down the periodic tick arbitrarily, not just in idle time as today): - Don't stop tick with irq works pending. This fix is generally useful and concerns archs that can't raise self IPIs. - Flush irq works before CPU offlining. - Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped. - Implement klogd wake up using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and make it working even in dynticks mode. - Cleanups and fixes." * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Export enable/disable_percpu_irq() arch Kconfig: Remove references to IRQ_PER_CPU irq_work: Remove return value from the irq_work_queue() function genirq: Avoid deadlock in spurious handling printk: Wake up klogd using irq_work irq_work: Make self-IPIs optable irq_work: Warn if there's still work on cpu_down irq_work: Flush work on CPU_DYING irq_work: Don't stop the tick with pending works nohz: Add API to check tick state irq_work: Remove CONFIG_HAVE_IRQ_WORK irq_work: Fix racy check on work pending flag irq_work: Fix racy IRQ_WORK_BUSY flag setting	2013-02-19 17:47:58 -08:00
Borislav Petkov	2e32b71906	x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs The "x86, AMD: Enable WC+ memory type on family 10 processors" patch currently in -tip added a workaround for AMD F10h CPUs which #GPs my guest when booted in kvm. This is because it accesses MSR_AMD64_BU_CFG2 which is not currently ignored by kvm. Do that because this MSR is only baremetal-relevant anyway. While at it, move the ignored MSRs at the beginning of kvm_set_msr_common so that we exit then and there. Acked-by: Gleb Natapov <gleb@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andre Przywara <andre@andrep.de> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1361298793-31834-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-19 10:44:07 -08:00
Borislav Petkov	52d3d06e70	x86, cpu, amd: Fix WC+ workaround for older virtual hosts The WC+ workaround for F10h introduces a new MSR and kvm host #GPs on accesses to unknown MSRs if paravirt is not compiled in. Use the exception-handling MSR accessors so as not to break 3.8 and later guests booting on older hosts. Remove a redundant family check while at it. Cc: Gleb Natapov <gleb@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1361298793-31834-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-19 10:44:00 -08:00
David S. Miller	6338a53a2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net into net Pull in 'net' to take in the bug fixes that didn't make it into 3.8-final. Also, deal with the semantic conflict of the change made to net/ipv6/xfrm6_policy.c A missing rt6->n neighbour release was added to 'net', but in 'net-next' we no longer cache the neighbour entries in the ipv6 routes so that change is not appropriate there. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-18 23:34:21 -05:00
Marcelo Tosatti	ed55705dd5	x86: pvclock kvm: align allocation size to page size To match whats mapped via vsyscalls to userspace. Reported-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-18 23:05:19 -03:00
Rafael J. Wysocki	10baf04e95	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (35 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E cpuidle: remove vestage definition of cpuidle_state_usage.driver_data x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag x86 idle: remove mwait_idle() and "idle=mwait" cmdline param ... Conflicts: arch/x86/kernel/process.c (with PM / tracing commit `43720bd`) drivers/acpi/processor_idle.c (with ACPICA commit `4f84291`)	2013-02-18 22:34:11 +01:00
Alexander Holler	20bf062c65	x86/memtest: Shorten time for tests By just reversing the order memtest is using the test patterns, an additional round to zero the memory is not necessary. This might save up to a second or even more for setups which are doing tests on every boot. Signed-off-by: Alexander Holler <holler@ahsoftware.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1361029097-8308-1-git-send-email-holler@ahsoftware.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-18 09:28:42 +01:00
Len Brown	ca62cf59ce	Merge branch 'misc' into release Conflicts: arch/x86/kernel/process.c Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-18 00:25:53 -05:00
Len Brown	2e7d0f60d8	Merge branches 'idle-remove-statedata', 'pm_idle' and 'idle-hsw-turbostat' into release	2013-02-18 00:25:16 -05:00
Len Brown	a476bda30b	x86 idle: rename global pm_idle to static x86_idle (pm_idle)() is being removed from linux/pm.h because Linux does not have such a cross-architecture concept. x86 uses an idle function pointer in its architecture specific code as a backup to cpuidle. So we re-name x86 use of pm_idle to x86_idle, and make it static to x86. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2013-02-17 23:34:58 -05:00
Len Brown	dd8af07626	APM idle: register apm_cpu_idle via cpuidle Update APM to register its local idle routine with cpuidle. This allows us to stop exporting pm_idle to modules on x86. The Kconfig sub-option, APM_CPU_IDLE, now depends on on CPU_IDLE. Compile-tested only. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Jiri Kosina <jkosina@suse.cz>	2013-02-17 23:34:46 -05:00
Jacob Shin	e259514eef	perf/x86/amd: Enable northbridge performance counters on AMD family 15h On AMD family 15h processors, there are 4 new performance counters (in addition to 6 core performance counters) that can be used for counting northbridge events (i.e. DRAM accesses). Their bit fields are almost identical to the core performance counters. However, unlike the core performance counters, these MSRs are shared between multiple cores (that share the same northbridge). We will reuse the same code path as existing family 10h northbridge event constraints handler logic to enforce this sharing. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jacob Shin <jacob.shin@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-7-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-16 09:37:27 +01:00
Linus Torvalds	f741656d64	Revert the PVonHVM kexec. The patch introduces a regression with older hypervisor stacks, such as Xen 4.1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRHZ7eAAoJEFjIrFwIi8fJZ+sH/ieMkzdBB6aqbFMcNr7mkfBo i3swjO2JQI7REYIHfKEVoR3IgHfqKEuABdeEQrceE0XqDepFh84YiKGI2QpPRWEA 903vUV4DXVdcBrypbL45tSFZ1Jxsrzx+F7WfV/f9WHyeiwOyaZTGVQH0VuOzpcum RvPTT7MmC7g8MJDi66SDYBaX/pBQzifQ81nMWWjXNw0w4CwWX7le1cScZEP42MR6 jTEHzYMLDojdO+2aQM5pt/0CGI5tzBHtX5nNRl6tovlPI3ckknYYx6a7RfxkfZzF IkMIuGS32yLfsswPPIiMs47/Qgiq3BN6eSTJXMZKUwQokL9yEs8LodcnRDYfgyQ= =fqcJ -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes: - A simple bug-fix for redundant NULL check. - CVE-2013-0228/XSA-42: x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS and two reverts: - Revert the PVonHVM kexec. The patch introduces a regression with older hypervisor stacks, such as Xen 4.1." * tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: Revert "xen PVonHVM: use E820_Reserved area for shared_info" Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info" xen: remove redundant NULL check before unregister_and_remove_pcpu(). x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.	2013-02-15 12:12:55 -08:00
H. Peter Anvin	0da3e7f526	Merge branch 'x86/mm2' into x86/mm x86/mm2 is testing out fine, but has developed conflicts with x86/mm due to patches in adjacent code. Merge them so we can drop x86/mm2 and have a unified branch. Resolved Conflicts: arch/x86/kernel/setup.c	2013-02-15 09:25:08 -08:00
Rafael J. Wysocki	7113fe74c1	Merge branch 'pm-assorted' * pm-assorted: suspend: enable freeze timeout configuration through sys ACPI: enable ACPI SCI during suspend PM: Introduce suspend state PM_SUSPEND_FREEZE PM / Runtime: Add new helper function: pm_runtime_active() PM / tracing: remove deprecated power trace API PM: don't use [delayed_]work_pending() PM / Domains: don't use [delayed_]work_pending()	2013-02-15 13:58:54 +01:00
Rafael J. Wysocki	e8f71df723	Merge branch 'acpi-cleanup' * acpi-cleanup: (21 commits) ACPI / hotplug: Fix concurrency issues and memory leaks ACPI: Remove the use of CONFIG_ACPI_CONTAINER_MODULE ACPI / scan: Full transition to D3cold in acpi_device_unregister() ACPI / scan: Make acpi_bus_hot_remove_device() acquire the scan lock ACPI: Drop the container.h header file ACPI / Documentation: refer to correct file for acpi_platform_device_ids[] table ACPI / scan: Make container driver use struct acpi_scan_handler ACPI / scan: Remove useless #ifndef from acpi_eject_store() ACPI: Unbind ACPI drv when probe failed ACPI: sysfs eject support for ACPI scan handlers ACPI / scan: Follow priorities of IDs when matching scan handlers ACPI / PCI: pci_slot: replace printk(KERN_xxx) with pr_xxx() ACPI / dock: Fix acpi_bus_get_device() check in drivers/acpi/dock.c ACPI / scan: Clean up acpi_bus_get_parent() ACPI / platform: Use struct acpi_scan_handler for creating devices ACPI / PCI: Make PCI IRQ link driver use struct acpi_scan_handler ACPI / PCI: Make PCI root driver use struct acpi_scan_handler ACPI / scan: Introduce struct acpi_scan_handler ACPI / scan: Make scanning of fixed devices follow the general scheme ACPI: Drop device start operation that is not used ...	2013-02-15 13:58:30 +01:00
Satoru Takeuchi	36dfbbf136	timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet() vread_hpet() uses "0xf0" as the offset of the hpet counter. To clarify the meaning of this code, it should use symbolic name, HPET_COUNTER, instead. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-15 12:13:18 +01:00
Konrad Rzeszutek Wilk	e9daff24a2	Revert "xen PVonHVM: use E820_Reserved area for shared_info" This reverts commit `9d02b43dee`. We are doing this b/c on 32-bit PVonHVM with older hypervisors (Xen 4.1) it ends up bothing up the start_info. This is bad b/c we use it for the time keeping, and the timekeeping code loops forever - as the version field never changes. Olaf says to revert it, so lets do that. Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-14 21:29:31 -05:00
Konrad Rzeszutek Wilk	5eb65be2d9	Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info" This reverts commit `a7be94ac8d`. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-14 21:29:27 -05:00
Randy Dunlap	cb8081cb6b	lguest: select CONFIG_TTY to build properly. Fix kconfig warning for LGUEST_GUEST config by selecting TTY: warning: (KVMTOOL_TEST_ENABLE && LGUEST_GUEST) selects VIRTIO_CONSOLE which has unmet direct dependencies (VIRTIO && TTY) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joe Millenbach <jmillenbach@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-02-14 16:01:55 -08:00
H. Peter Anvin	95c9608478	x86, mm: Move reserving low memory later in initialization Move the reservation of low memory, except for the 4K which actually does belong to the BIOS, later in the initialization; in particular, after we have already reserved the trampoline. The current code locates the trampoline as high as possible, so by deferring the allocation we will still be able to reserve as much memory as is possible. This allows us to run with reservelow=640k without getting a crash on system startup. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-0y9dqmmsousf69wutxwl3kkf@git.kernel.org	2013-02-14 15:21:25 -08:00
Paul Gortmaker	19348e749e	x86: ptrace.c only needs export.h and not the full module.h Commit `cb57a2b4cf` ("x86-32: Export kernel_stack_pointer() for modules") added an include of the module.h header in conjunction with adding an EXPORT_SYMBOL_GPL of kernel_stack_pointer. But module.h should be avoided for simple exports, since it in turn includes the world. Swap the module.h for export.h instead. Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1360872842-28417-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-14 12:56:12 -08:00
Al Viro	235b80226b	x86: convert to ksignal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-14 09:21:17 -05:00
Al Viro	d64008a8f3	burying unused conditionals __ARCH_WANT_SYS_RT_SIGACTION, __ARCH_WANT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} - can be assumed always set.	2013-02-14 09:21:15 -05:00
Satoru Takeuchi	6b59e366e0	x86, efi: remove duplicate code in setup_arch() by using, efi_is_native() The check, "IS_ENABLED(CONFIG_X86_64) != efi_enabled(EFI_64BIT)", in setup_arch() can be replaced by efi_is_enabled(). This change remove duplicate code and improve readability. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Olof Johansson <olof@lixom.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-02-14 10:36:18 +00:00
Jan Kiszka	cbd29cb6e3	KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr We already pass vmcs12 as argument. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-02-14 10:35:16 +02:00
Satoru Takeuchi	1de63d60cd	efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameter There was a serious problem in samsung-laptop that its platform driver is designed to run under BIOS and running under EFI can cause the machine to become bricked or can cause Machine Check Exceptions. Discussion about this problem: https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 https://bugzilla.kernel.org/show_bug.cgi?id=47121 The patches to fix this problem: efi: Make 'efi_enabled' a function to query EFI facilities `83e6818974` samsung-laptop: Disable on EFI hardware `e0094244e4` Unfortunately this problem comes back again if users specify "noefi" option. This parameter clears EFI_BOOT and that driver continues to run even if running under EFI. Refer to the document, this parameter should clear EFI_RUNTIME_SERVICES instead. Documentation/kernel-parameters.txt: =============================================================================== ... noefi [X86] Disable EFI runtime services support. ... =============================================================================== Documentation/x86/x86_64/uefi.txt: =============================================================================== ... - If some or all EFI runtime services don't work, you can try following kernel command line parameters to turn off some or all EFI runtime services. noefi turn off all EFI runtime services ... =============================================================================== Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Link: http://lkml.kernel.org/r/511C2C04.2070108@jp.fujitsu.com Cc: Matt Fleming <matt.fleming@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-13 17:24:11 -08:00
Len Brown	1ed51011af	tools/power turbostat: display SMI count by default The SMI counter is popular -- so display it by default rather than requiring an option. What the heck, we've blown the 80 column budget on many systems already... Note that the value displayed is the delta during the measurement interval. The absolute value of the counter can still be seen with the generic 32-bit MSR option, ie. -m 0x34 Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-13 18:22:12 -05:00
Jan Beulich	13d2b4d11d	x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. This fixes CVE-2013-0228 / XSA-42 Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user in 32bit PV guest can use to crash the > guest with the panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: `b77f8388` ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [<c08476df>] ? panic+0x6e/0x122 [<c084b63c>] ? oops_end+0xbc/0xd0 [<c084b260>] ? do_general_protection+0x0/0x210 [<c084a9b7>] ? error_code+0x73/ ------------- Petr says: " I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer. " Jan took a look at the preliminary patch and came up a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that the we can only relay on the registers that IRET restores. The two that are guaranteed are the %cs and %ss as they are always fixed GDT selectors. Also they are inaccessible from user mode - so they cannot be altered. This is the approach taken in this patch. Another alternative option suggested by Jan would be to relay on the subtle realization that using the %ebp or %esp relative references uses the %ss segment. In which case we could switch from using %eax to %ebp and would not need the %ss over-rides. That would also require one extra instruction to compensate for the one place where the register is used as scaled index. However Andrew pointed out that is too subtle and if further work was to be done in this code-path it could escape folks attention and lead to accidents. Reviewed-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Petr Matousek <pmatouse@redhat.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-13 15:40:30 -05:00
Gleb Natapov	f583c29b79	x86 emulator: fix parity calculation for AAD instruction Reported-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-02-13 18:01:00 +02:00
Mel Gorman	0ee364eb31	x86/mm: Check if PUD is large when validating a kernel address A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110 [...] Call Trace: [<ffffffff811b8aaa>] read_kcore+0x17a/0x370 [<ffffffff811ad847>] proc_reg_read+0x77/0xc0 [<ffffffff81151687>] vfs_read+0xc7/0x130 [<ffffffff811517f3>] sys_read+0x53/0xa0 [<ffffffff81449692>] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.coM> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-13 10:02:55 +01:00
K. Y. Srinivasan	bc2b0331e0	X86: Handle Hyper-V vmbus interrupts as special hypervisor interrupts Starting with win8, vmbus interrupts can be delivered on any VCPU in the guest and furthermore can be concurrently active on multiple VCPUs. Support this interrupt delivery model by setting up a separate IDT entry for Hyper-V vmbus. interrupts. I would like to thank Jan Beulich <JBeulich@suse.com> and Thomas Gleixner <tglx@linutronix.de>, for their help. In this version of the patch, based on the feedback, I have merged the IDT vector for Xen and Hyper-V and made the necessary adjustments. Furhermore, based on Jan's feedback I have added the necessary compilation switches. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1359940959-32168-3-git-send-email-kys@microsoft.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 16:27:15 -08:00
K. Y. Srinivasan	db34bbb767	X86: Add a check to catch Xen emulation of Hyper-V Xen emulates Hyper-V to host enlightened Windows. Looks like this emulation may be turned on by default even for Linux guests. Check and fail Hyper-V detection if we are on Xen. [ hpa: the problem here is that Xen doesn't emulate Hyper-V well enough, and if the Xen support isn't compiled in, we end up stubling over the Hyper-V emulation and try to activate it -- and it fails. ] Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1359940959-32168-2-git-send-email-kys@microsoft.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 16:27:03 -08:00
Olaf Hering	32068f6527	x86: Hyper-V: register clocksource only if its advertised Enable hyperv_clocksource only if its advertised as a feature. XenServer 6 returns the signature which is checked in ms_hyperv_platform(), but it does not offer all features. Currently the clocksource is enabled unconditionally in ms_hyperv_init_platform(), and the result is a hanging guest. Hyper-V spec Bit 1 indicates the availability of Partition Reference Counter. Register the clocksource only if this bit is set. The guest in question prints this in dmesg: [ 0.000000] Hypervisor detected: Microsoft HyperV [ 0.000000] HyperV: features 0x70, hints 0x0 This bug can be reproduced easily be setting 'viridian=1' in a HVM domU .cfg file. A workaround without this patch is to boot the HVM guest with 'clocksource=jiffies'. Signed-off-by: Olaf Hering <olaf@aepfle.de> Link: http://lkml.kernel.org/r/1359940959-32168-1-git-send-email-kys@microsoft.com Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 16:25:48 -08:00
Borislav Petkov	5e2a044daf	x86, head_32: Give the 6 label a real name Jumping here we are about to enable paging so rename the label accordingly. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1360592538-10643-5-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:48:42 -08:00
Borislav Petkov	c3a22a26d0	x86, head_32: Remove second CPUID detection from default_entry We do that once earlier now and cache it into new_cpu_data.cpuid_level so no need for the EFLAGS.ID toggling dance anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1360592538-10643-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:48:42 -08:00
Borislav Petkov	9efb58de91	x86: Detect CPUID support early at boot We detect CPUID function support on each CPU and save it for later use, obviating the need to play the toggle EFLAGS.ID game every time. C code is looking at ->cpuid_level anyway. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1360592538-10643-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:48:41 -08:00
Borislav Petkov	166df91daf	x86, head_32: Remove i386 pieces Remove code fragments detecting a 386 CPU since we don't support those anymore. Also, do not do alignment checks because they're done only at CPL3. Also, no need to preserve EFLAGS. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1360592538-10643-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:48:40 -08:00
H. Peter Anvin	8ecba5af94	Linux 3.8-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+ Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M= =jjBc -----END PGP SIGNATURE----- Merge tag 'v3.8-rc7' into x86/asm Merge in the updates to head_32.S from the previous urgent branch, as upcoming patches will make further changes. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:47:45 -08:00
H. Peter Anvin	ff52c3b02b	x86, doc: Clarify the use of asm("%edx") in uaccess.h Put in a comment that explains that the use of asm("%edx") in uaccess.h doesn't actually necessarily mean %edx alone. Cc: Jamie Lokier <jamie@shareable.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/511ACDFB.1050707@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 15:37:02 -08:00
Steven Rostedt	f431b634f2	tracing/syscalls: Allow archs to ignore tracing compat syscalls The tracing of ia32 compat system calls has been a bit of a pain as they use different system call numbers than the 64bit equivalents. I wrote a simple 'lls' program that lists files. I compiled it as a i686 ELF binary and ran it under a x86_64 box. This is the result: echo 0 > /debug/tracing/tracing_on echo 1 > /debug/tracing/events/syscalls/enable echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on grep lls /debug/tracing/trace [.. skipping calls before TS_COMPAT is set ...] lls-1127 [005] d... 936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420) lls-1127 [005] d... 936.409190: sys_recvfrom -> 0x8a77000 lls-1127 [005] d... 936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22) lls-1127 [005] d... 936.409215: sys_lgetxattr -> 0xf76ff000 lls-1127 [005] d... 936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4) lls-1127 [005] d... 936.409228: sys_dup2 -> 0xfffffffffffffffe lls-1127 [005] d... 936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000) lls-1127 [005] d... 936.409242: sys_newfstat -> 0x3 lls-1127 [005] d... 936.409243: sys_removexattr(pathname: 3, name: ffcd0060) lls-1127 [005] d... 936.409244: sys_removexattr -> 0x0 lls-1127 [005] d... 936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2) lls-1127 [005] d... 936.409248: sys_lgetxattr -> 0xf76e5000 lls-1127 [005] d... 936.409248: sys_newlstat(filename: 3, statbuf: 19614) lls-1127 [005] d... 936.409249: sys_newlstat -> 0x0 lls-1127 [005] d... 936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000) lls-1127 [005] d... 936.409279: sys_newfstat -> 0x3 lls-1127 [005] d... 936.409279: sys_close(fd: 3) lls-1127 [005] d... 936.421550: sys_close -> 0x200 lls-1127 [005] d... 936.421558: sys_removexattr(pathname: 3, name: ffcd00d0) lls-1127 [005] d... 936.421560: sys_removexattr -> 0x0 lls-1127 [005] d... 936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802) lls-1127 [005] d... 936.421574: sys_lgetxattr -> 0x4d564000 lls-1127 [005] d... 936.421575: sys_capget(header: 4d70f000, dataptr: 1000) lls-1127 [005] d... 936.421580: sys_capget -> 0x0 lls-1127 [005] d... 936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812) lls-1127 [005] d... 936.421589: sys_lgetxattr -> 0x4d710000 lls-1127 [005] d... 936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32) lls-1127 [005] d... 936.426141: sys_lgetxattr -> 0x4d713000 lls-1127 [005] d... 936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0) lls-1127 [005] d... 936.426146: sys_newlstat -> 0x0 lls-1127 [005] d... 936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22) Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are obviously incorrect, and confusing. Other efforts have been made to fix this: https://lkml.org/lkml/2012/3/26/367 But the real solution is to rewrite the syscall internals and come up with a fixed solution. One that doesn't require all the kluge that the current solution has. Thus for now, instead of outputting incorrect data, simply ignore them. With this patch the changes now have: #> grep lls /debug/tracing/trace #> Compat system calls simply are not traced. If users need compat syscalls, then they should just use the raw syscall tracepoints. For an architecture to make their compat syscalls ignored, it must define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also define an arch_trace_is_compat_syscall() function that will return true if the current task should ignore tracing the syscall. I want to stress that this change does not affect actual syscalls in any way, shape or form. It is only used within the tracing system and doesn't interfere with the syscall logic at all. The changes are consolidated nicely into trace_syscalls.c and asm/ftrace.h. I had to make one small modification to asm/thread_info.h and that was to remove the include of asm/ftrace.h. As asm/ftrace.h required the current_thread_info() it was causing include hell. That include was added back in 2008 when the function graph tracer was added: commit `caf4b323` "tracing, x86: add low level support for ftrace return tracing" It does not need to be included there. Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-02-12 17:46:28 -05:00
H. Peter Anvin	3578baaed4	x86, mm: Redesign get_user with a __builtin_choose_expr hack Instead of using a bitfield, use an odd little trick using typeof, __builtin_choose_expr, and sizeof. __builtin_choose_expr is explicitly defined to not convert its type (its argument is required to be a constant expression) so this should be well-defined. The code is still not 100% preturbation-free versus the baseline before 64-bit get_user(), but the differences seem to be very small, mostly related to padding and to gcc deciding when to spill registers. Cc: Jamie Lokier <jamie@shareable.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/511A8922.6050908@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-12 12:46:40 -08:00
H. Peter Anvin	16640165c9	x86: Be consistent with data size in getuser.S Consistently use the data register by name and use a sized assembly instruction in getuser.S. There is never any reason to macroize it, and being inconsistent in the same file is just annoying. No actual code change. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-02-11 23:14:48 -08:00
H. Peter Anvin	b390784dc1	x86, mm: Use a bitfield to mask nuisance get_user() warnings Even though it is never executed, gcc wants to warn for casting from a large integer to a pointer. Furthermore, using a variable with __typeof__() doesn't work because __typeof__ retains storage specifiers (const, restrict, volatile). However, we can declare a bitfield using sizeof(), which is legal because sizeof() is a constant expression. This quiets the warning, although the code generated isn't 100% identical from the baseline before `96477b4` x86-32: Add support for 64bit get_user(): [x86-mb is baseline, x86-mm is this commit] text data bss filename 113716147 15858380 35037184 tip.x86-mb/o.i386-allconfig/vmlinux 113716145 15858380 35037184 tip.x86-mm/o.i386-allconfig/vmlinux 12989837 `3597944` 12255232 tip.x86-mb/o.i386-modconfig/vmlinux 12989831 `3597944` 12255232 tip.x86-mm/o.i386-modconfig/vmlinux 1462784 237608 1401988 tip.x86-mb/o.i386-noconfig/vmlinux 1462837 237608 1401964 tip.x86-mm/o.i386-noconfig/vmlinux 7938994 553688 7639040 tip.x86-mb/o.i386-pae/vmlinux 7943136 557784 7639040 tip.x86-mm/o.i386-pae/vmlinux 7186126 510572 6574080 tip.x86-mb/o.i386/vmlinux 7186124 510572 6574080 tip.x86-mm/o.i386/vmlinux 103747269 33578856 65888256 tip.x86-mb/o.x86_64-allconfig/vmlinux 103746949 33578856 65888256 tip.x86-mm/o.x86_64-allconfig/vmlinux 12116695 11035832 20160512 tip.x86-mb/o.x86_64-modconfig/vmlinux 12116567 11035832 20160512 tip.x86-mm/o.x86_64-modconfig/vmlinux 1700790 380524 511808 tip.x86-mb/o.x86_64-noconfig/vmlinux 1700790 380524 511808 tip.x86-mm/o.x86_64-noconfig/vmlinux 12413612 1133376 1101824 tip.x86-mb/o.x86_64/vmlinux 12413484 1133376 1101824 tip.x86-mm/o.x86_64/vmlinux Cc: Jamie Lokier <jamie@shareable.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130209110031.GA17833@n2100.arm.linux.org.uk Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2013-02-11 17:26:51 -08:00
Mike Travis	d924f947a4	x86, uv, uv3: Trim MMR register definitions after code changes for SGI UV3 This patch trims the MMR register definitions after the updates for the SGI UV3 system have been applied. Note that because these definitions are automatically generated from the RTL we cannot control the length of the names. Therefore there are lines that exceed 80 characters. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194509.173026880@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:18:25 -08:00
Mike Travis	0af6352045	x86, uv, uv3: Update Time Support for SGI UV3 This patch updates time support for the SGI UV3 hub. Since the UV2 and UV3 time support is identical, "is_uvx_hub" is used instead of having both "is_uv2_hub" and "is_uv3_hub". Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.893907185@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:18:12 -08:00
Mike Travis	b15cc4a12b	x86, uv, uv3: Update x2apic Support for SGI UV3 This patch adds support for the SGI UV3 hub to the common x2apic functions. The primary changes are to account for the similarities between UV2 and UV3 which are encompassed within the "UVX" nomenclature. One significant difference within UV3 is the handling of the MMIOH regions which are redirected to the target blade (with the device) in a different manner. It also now has two MMIOH regions for both small and large BARs. This aids in limiting the amount of physical address space removed from real memory that's used for I/O in the max config of 64TB. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.752924185@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Steffen Persvold <sp@numascale.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:18:03 -08:00
Mike Travis	6edbd4714e	x86, uv, uv3: Update Hub Info for SGI UV3 This patch updates the UV HUB info for UV3. The "is_uv3_hub" and "is_uvx_hub" (UV2 or UV3) functions are added as well as the addresses and sizes of the MMR regions for UV3. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.610723192@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:17:50 -08:00
Mike Travis	526018bc5e	x86, uv, uv3: Update ACPI Check to include SGI UV3 Add UV3 to exclusion list. Instead of adding every new series of SGI UV systems, just check oem_id to have a prefix of "SGI". Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.457937455@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:17:44 -08:00
Mike Travis	60fe7be34d	x86, uv, uv3: Update MMR register definitions for SGI Ultraviolet System 3 (UV3) This patch updates the MMR register definitions for the SGI UV3 system. Note that because these definitions are automatically generated from the RTL we cannot control the length of the names. Therefore there are lines that exceed 80 characters. All the new MMR definitions are added in this patch. The patches that follow then update the references. The last patch is a "trim" patch which reduces the size of the MMR definitions file by about a third. This keeps "bi-sectability" in place as the intermediate patches would not compile correctly if the trimmed MMR defines were done first. Signed-off-by: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130211194508.326204556@gulag1.americas.sgi.com Acked-by: Russ Anderson <rja@sgi.com> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-11 17:17:33 -08:00
Rafael J. Wysocki	17b1639b30	Merge branch 'acpi-lpss' * acpi-lpss: ACPI / platform: create LPSS clocks if Lynxpoint devices are found during scan clk: x86: add support for Lynxpoint LPSS clocks x86: add support for Intel Low Power Subsystem ACPI / platform: fix comment about the platform device name ACPI: add support for CSRT table	2013-02-11 13:21:27 +01:00
Rafael J. Wysocki	48694bdb38	Merge branch 'acpica' * acpica: (56 commits) ACPICA: Update version to 20130117 ACPICA: Update predefined info table for _MLS method ACPICA: Remove some extraneous newlines in ACPI_ERROR type calls ACPICA: iASL/Disassembler: Add option to ignore NOOP opcodes/operators ACPICA: AcpiGetSleepTypeData: Allow \_Sx to return either 1 or 2 integers ACPICA: Update ACPICA copyrights to 2013 ACPICA: Update predefined info table ACPICA: Cleanup table handler naming conflicts. ACPICA: Source restructuring: split large files into 8 new files. ACPICA: Cleanup PM_TIMER_FREQUENCY definition. ACPICA: Cleanup ACPI_DEBUG_PRINT macros to fix potential build breakages. ACPICA: Update version to `20121220`. ACPICA: Interpreter: Fix Store() when implicit conversion is not possible. ACPICA: Resources: Split interrupt share/wake bits into two fields. ACPICA: Resources: Support for ACPI 5 wake bit in ExtendedInterrupt descriptor. ACPICA: Interpreter: Add warning if 64-bit constant appears in 32-bit table. ACPICA: Update ACPICA initialization messages. ACPICA: Namespace: Eliminate dot...dot output during initialization. ACPICA: Resource manager: Add support for ACPI 5 wake bit in IRQ descriptor. ACPICA: Fix possible memory leak in dispatcher error path. ...	2013-02-11 13:20:33 +01:00
Stoney Wang	cb214ede76	x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems When a HP ProLiant DL980 G7 Server boots a regular kernel, there will be intermittent lost interrupts which could result in a hang or (in extreme cases) data loss. The reason is that this system only supports x2apic physical mode, while the kernel boots with a logical-cluster default setting. This bug can be worked around by specifying the "x2apic_phys" or "nox2apic" boot option, but we want to handle this system without requiring manual workarounds. The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table. As all apicids are smaller than 255, BIOS need to pass the control to the OS with xapic mode, according to x2apic-spec, chapter 2.9. Current code handle x2apic when BIOS pass with xapic mode enabled: When user specifies x2apic_phys, or FADT indicates PHYSICAL: 1. During madt oem check, apic driver is set with xapic logical or xapic phys driver at first. 2. enable_IR_x2apic() will enable x2apic_mode. 3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe() will install the correct x2apic phys driver and use x2apic phys mode. Otherwise it will skip the driver will let x2apic_cluster_probe to take over to install x2apic cluster driver (wrong one) even though FADT indicates PHYSICAL, because x2apic_phys_probe does not check FADT PHYSICAL. Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the problem. Signed-off-by: Stoney Wang <song-bo.wang@hp.com> [ updated the changelog and simplified the code ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-11 11:13:00 +01:00
Shuah Khan	136867f517	x86/kvm: Fix compile warning in kvm_register_steal_time() Fix the following compile warning in kvm_register_steal_time(): CC arch/x86/kernel/kvm.o arch/x86/kernel/kvm.c: In function ‘kvm_register_steal_time’: arch/x86/kernel/kvm.c:302:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Wformat] Introduced via: `5dfd486c47` x86, kvm: Fix kvm's use of __pa() on percpu areas `d765653445` x86, mm: Create slow_virt_to_phys() `f3c4fbb68e` x86, mm: Use new pagetable helpers in try_preserve_large_page() `4cbeb51b86` x86, mm: Pagetable level size/shift/mask helpers `a25b931684` x86, mm: Make DEBUG_VIRTUAL work earlier in boot Signed-off-by: Shuah Khan <shuah.khan@hp.com> Acked-by: Gleb Natapov <gleb@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: shuahkhan@gmail.com Cc: avi@redhat.com Cc: gleb@redhat.com Cc: mst@redhat.com Link: http://lkml.kernel.org/r/1360119442.8356.8.camel@lorien2 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-11 11:04:31 +01:00
Takuya Yoshikawa	7a905b1485	KVM: Remove user_alloc from struct kvm_memory_slot This field was needed to differentiate memory slots created by the new API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent, KVM_SET_MEMORY_REGION, whose support was dropped long before: commit `b74a07beed` KVM: Remove kernel-allocated memory regions Although we also have private memory slots to which KVM allocates memory with vm_mmap(), !user_alloc slots in other words, the slot id should be enough for differentiating them. Note: corresponding function parameters will be removed later. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-02-11 11:52:00 +02:00
Yang Zhang	257090f702	KVM: VMX: disable apicv by default Without Posted Interrupt, current code is broken. Just disable by default until Posted Interrupt is ready. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-02-11 10:51:13 +02:00
Len Brown	27be457000	x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag Remove 32-bit x86 a cmdline param "no-hlt", and the cpuinfo_x86.hlt_works_ok that it sets. If a user wants to avoid HLT, then "idle=poll" is much more useful, as it avoids invocation of HLT in idle, while "no-hlt" failed to do so. Indeed, hlt_works_ok was consulted in only 3 places. First, in /proc/cpuinfo where "hlt_bug yes" would be printed if and only if the user booted the system with "no-hlt" -- as there was no other code to set that flag. Second, check_hlt() would not invoke halt() if "no-hlt" were on the cmdline. Third, it was consulted in stop_this_cpu(), which is invoked by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() -- all cases where the machine is being shutdown/reset. The flag was not consulted in the more frequently invoked play_dead()/hlt_play_dead() used in processor offline and suspend. Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations indicating that it would be removed in 2012. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2013-02-10 03:32:22 -05:00
Len Brown	69fb3676df	x86 idle: remove mwait_idle() and "idle=mwait" cmdline param mwait_idle() is a C1-only idle loop intended to be more efficient than HLT, starting on Pentium-4 HT-enabled processors. But mwait_idle() has been replaced by the more general mwait_idle_with_hints(), which handles both C1 and deeper C-states. ACPI processor_idle and intel_idle use only mwait_idle_with_hints(), and no longer use mwait_idle(). Here we simplify the x86 native idle code by removing mwait_idle(), and the "idle=mwait" bootparam used to invoke it. Since Linux 3.0 there has been a boot-time warning when "idle=mwait" was invoked saying it would be removed in 2012. This removal was also noted in the (now removed:-) feature-removal-schedule.txt. After this change, kernels configured with (CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware that supports MWAIT will simply use HLT. If MWAIT is desired on those systems, cpuidle and the cpuidle drivers above can be enabled. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2013-02-10 03:03:41 -05:00
Len Brown	6a377ddc4e	xen idle: make xen-specific macro xen-specific This macro is only invoked by Xen, so make its definition specific to Xen. > set_pm_idle_to_default() < xen_set_default_idle() Signed-off-by: Len Brown <len.brown@intel.com> Cc: xen-devel@lists.xensource.com	2013-02-10 01:06:34 -05:00
Len Brown	e022e7eb90	intel_idle: remove assumption of one C-state per MWAIT flag Remove the assumption that cstate_tables are indexed by MWAIT flag values. Each entry identifies itself via its own flags value. This change is needed to support multiple states that share the same MWAIT flags. Note that this can have an effect on what state is described by 'N' on cmdline intel_idle.max_cstate=N on some systems. intel_idle.max_cstate=0 still disables the driver intel_idle.max_cstate=1 still results in just C1(E) However, "place holders" in the sparse C-state name-space (eg. Atom) have been removed. Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-08 19:29:16 -05:00
Len Brown	137ecc779c	intel_idle: remove use and definition of MWAIT_MAX_NUM_CSTATES Cosmetic only. Replace use of MWAIT_MAX_NUM_CSTATES with CPUIDLE_STATE_MAX. They are both 8, so this patch has no functional change. The reason to change is that intel_idle will soon be able to export more than the 8 "major" states supported by MWAIT. When we hit that limit, it is important to know where the limit comes from. Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-08 19:28:10 -05:00
Len Brown	6792041834	tools/power turbostat: decode MSR_IA32_POWER_CTL When verbose is enabled, print the C1E-Enable bit in MSR_IA32_POWER_CTL. also delete some redundant tests on the verbose variable. Signed-off-by: Len Brown <len.brown@intel.com>	2013-02-08 19:26:16 -05:00
David S. Miller	fd5023111c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Synchronize with 'net' in order to sort out some l2tp, wireless, and ipv6 GRE fixes that will be built on top of in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-08 18:02:14 -05:00
Oleg Nesterov	74e59dfc6b	uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain() Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is what consumer->handler() needs but uprobe_get_swbp_addr() is not exported. This also simplifies the code and makes it more consistent across the supported architectures. handle_swbp() becomes the only caller of uprobe_get_swbp_addr(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>	2013-02-08 17:47:11 +01:00
Oleg Nesterov	cf31ec3f7f	uprobes/x86: Change __skip_sstep() to actually skip the whole insn __skip_sstep() doesn't update regs->ip. Currently this is correct but only "by accident" and it doesn't skip the whole insn. Change it to advance ->ip by the length of the detected 0x66*0x90 sequence. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>	2013-02-08 17:47:11 +01:00
Ville Syrjälä	96477b4cd7	x86-32: Add support for 64bit get_user() Implement __get_user_8() for x86-32. It will return the 64-bit result in edx:eax register pair, and ecx is used to pass in the address and return the error value. For consistency, change the register assignment for all other __get_user_x() variants, so that address is passed in ecx/rcx, the error value is returned in ecx/rcx, and eax/rax contains the actual value. [ hpa: I modified the patch so that it does NOT change the calling conventions for the existing callsites, this also means that the code is completely unchanged for 64 bits. Instead, continue to use eax for address input/error output and use the ecx:edx register pair for the output. ] This is a partial refresh of a patch [1] by Jamie Lokier from 2004. Only the minimal changes to implement 64bit get_user() were picked from the original patch. [1] http://article.gmane.org/gmane.linux.kernel/198823 Originally-by: Jamie Lokier <jamie@shareable.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://lkml.kernel.org/r/1355312043-11467-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-07 15:07:28 -08:00
Kees Cook	e575a86fdc	x86: Do not leak kernel page mapping locations Without this patch, it is trivial to determine kernel page mappings by examining the error code reported to dmesg[1]. Instead, declare the entire kernel memory space as a violation of a present page. Additionally, since show_unhandled_signals is enabled by default, switch branch hinting to the more realistic expectation, and unobfuscate the setting of the PF_PROT bit to improve readability. [1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/ Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-07 19:57:44 +01:00
Xiao Guangrong	24db2734ad	KVM: MMU: cleanup __direct_map Use link_shadow_page to link the sp to the spte in __direct_map Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-06 22:42:09 -02:00
Xiao Guangrong	f761620377	KVM: MMU: remove pt_access in mmu_set_spte It is only used in debug code, so drop it Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-06 22:42:08 -02:00
Xiao Guangrong	55dd98c3a8	KVM: MMU: cleanup mapping-level Use min() to cleanup mapping_level Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-06 22:42:08 -02:00
Xiao Guangrong	caf6900f2d	KVM: MMU: lazily drop large spte Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them not present, the page fault caused by read access can be avoided The idea is from Avi: \| As I mentioned before, write-protecting a large spte is a good idea, \| since it moves some work from protect-time to fault-time, so it reduces \| jitter. This removes the need for the return value. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-06 22:28:01 -02:00
Gleb Natapov	5037878e22	KVM: VMX: cleanup vmx_set_cr0(). When calculating hw_cr0 teh current code masks bits that should be always on and re-adds them back immediately after. Cleanup the code by masking only those bits that should be dropped from hw_cr0. This allow us to get rid of some defines. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-06 22:00:02 -02:00
H. Peter Anvin	bb9b1a834f	Retract MCE-specific UAPI exports which are unused and shouldn't be used. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQ7XZeAAoJEBLB8Bhh3lVKzAAQAKJPmNM4jK3o7QixkySFOKAN GKDmciwBCSLDvhDU5bXSKEzhxaWq+/Wp8aTy0u28TUtIq+Dnya8+9xAMZqe56lNv slBNk1N2IIJBqZoul1FCY7MyBnss19nh2h/jN9PChcY9dGpgkRnKc3rkjjy4UJ10 YibUm/wLuzNGCskt6nGO4uVSDXyv5W6PCnpOb6IW5KFywHE0ojsVwZQM5Vnakjrd vsyicHHaIXSnRtyDWGZSX4JxY7HDK7+v03OHNvO3FtFzNjTJ8r1qU7LVbC1yBPP6 uFN+piBUYn6q6Gsy67H1FcufGTz0IzFwyx/akW1MkgdN4FggrLiFf0a10Fh0Dx3s jd2tuCv72rfOQUZAYg9kobHk1NecCZt0Wt3NWul3WjZHZC35nFgznnUiznAPhoCI /sBKE2VQt0CejigNTIEpaToc/kHuQy3RzmMphknROEehXdm79TqdirNNHKrDkYug U9SV4nsOaRFcSZDPXK52xFIEb9Uc90zSWqNblhBRXt7sBC34EqZk95jYDgOrZxHQ v8iNieYgOezXzIyyUAHC7dlIUbkHlBsPG+8eEoWFcvOBI3UbGvOv3abMO18yBZqr 28An+v8pMsT9Q5sUwHFwvblD/vDaj/NIO0OP+y40Z96KnXz8Cm4mpkzBvBdVVxUs cmaDId47QRqdvIW2mBZZ =2Zr4 -----END PGP SIGNATURE----- Merge tag 'ras_for_3.8' into x86/urgent Retract MCE-specific UAPI exports which are unused and shouldn't be used. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-06 14:18:53 -08:00
Jacob Shin	0fbdad078a	perf/x86: Allow for architecture specific RDPMC indexes Similar to config_base and event_base, allow architecture specific RDPMC ECX values. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-6-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-06 19:45:24 +01:00
Jacob Shin	4c1fd17a1c	perf/x86: Move MSR address offset calculation to architecture specific files Move counter index to MSR address offset calculation to architecture specific files. This prepares the way for perf_event_amd to enable counter addresses that are not contiguous -- for example AMD Family 15h processors have 6 core performance counters starting at 0xc0010200 and 4 northbridge performance counters starting at 0xc0010240. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-5-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-06 19:45:24 +01:00
Jacob Shin	9f19010af8	perf/x86/amd: Use proper naming scheme for AMD bit field definitions Update these AMD bit field names to be consistent with naming convention followed by the rest of the file. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-4-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-06 19:45:23 +01:00
Robert Richter	4dd4c2ae55	perf/x86/amd: Generalize northbridge constraints code for family 15h Generalize northbridge constraints code for family 10h so that later we can reuse the same code path with other AMD processor families that have the same northbridge event constraints. Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-3-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-06 19:45:23 +01:00
Robert Richter	2c53c3dd0b	perf/x86/amd: Rework northbridge event constraints handler Code simplification. No functional changes. Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Robert Richter <rric@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360171589-6381-2-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-06 19:45:22 +01:00
Gleb Natapov	b0da5bec30	KVM: VMX: add missing exit names to VMX_EXIT_REASONS array Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-05 23:29:46 -02:00
Dongxiao Xu	c08800a56c	KVM: VMX: disable SMEP feature when guest is in non-paging mode SMEP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-05 23:28:07 -02:00
Gleb Natapov	834be0d83f	Revert "KVM: MMU: split kvm_mmu_free_page" This reverts commit `bd4c86eaa6`. There is not user for kvm_mmu_isolate_page() any more. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-05 22:47:39 -02:00
Ingo Molnar	b2c77a57e4	This implements the cputime accounting on full dynticks CPUs. Typical cputime stats infrastructure relies on the timer tick and its periodic polling on the CPU to account the amount of time spent by the CPUs and the tasks per high level domains such as userspace, kernelspace, guest, ... Now we are preparing to implement full dynticks capability on Linux for Real Time and HPC users who want full CPU isolation. This feature requires a cputime accounting that doesn't depend on the timer tick. To implement it, this new cputime infrastructure plugs into kernel/user/guest boundaries to take snapshots of cputime and flush these to the stats when needed. This performs pretty much like CONFIG_VIRT_CPU_ACCOUNTING except that context location and cputime snaphots are synchronized between write and read side such that the latter can safely retrieve the pending tickless cputime of a task and add it to its latest cputime snapshot to return the correct result to the user. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRBsKnAAoJEIUkVEdQjox3lMgP/2R6DU2f8PyGIao3hne4M3Pu L3q+mAG53b24Dy014KeW7gd8yv45fE7wp/rs8CGLte9VzbLkRCDSFQPgBuXVagRj tV5nfAuqD0wHTnA+HhBE3l3C2RKAPGIu79rBpnIR/QIPPl8Z3Dby8YgmxEQKDf8G j7MEBu2LthSuqEi2ZXemnO5r0oEnQAzAp4TTi/M38k0Fmt59nOGyjLnI+xHYCBMa 1pnz7j3jjR9NJExGu8iVvbo+jupuQngP8qmkLXHvYnj/TEJNwzO1hHVoSwOpjYpS 9ycl+T8IKQLbAkBywLtq3Mzde43xt/t8wYyGZ0oAV+Z7MIpz/9YIfDJwqQeqoNbD dAdbNjKMbsxCgmrnyqSagfMQg/r3CPZ4vf40TMCaN4gNUJC4Ie+E4kPRKRh59+PB Ukthmqujn0f40LAa+HXTUuzafd3b0s/ewH+8FuQ6LAG9b5+WnoN8JTJ5u6+ydokO ZleeOowuRZZEg+abQ8Sm2GRm/BzN29gi/npb//I+ZDXWv/+3yccgsiPjCRzCAAaO g1RmYryFSRUwHQbGNNypVWVuOLWvrBQ4jqbGO7BBuBByZMSHryKxR6mb+inH3qLE xIDM9SdSJisc292OzoFKwVZki4MaXaadJXJduVvqYlZQvXXs7eAa4wo3euhtVITD NLQO5OZXE4oIQmDFb0FV =1Tzp -----END PGP SIGNATURE----- Merge tag 'full-dynticks-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core Pull full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker: "This implements the cputime accounting on full dynticks CPUs. Typical cputime stats infrastructure relies on the timer tick and its periodic polling on the CPU to account the amount of time spent by the CPUs and the tasks per high level domains such as userspace, kernelspace, guest, ... Now we are preparing to implement full dynticks capability on Linux for Real Time and HPC users who want full CPU isolation. This feature requires a cputime accounting that doesn't depend on the timer tick. To implement it, this new cputime infrastructure plugs into kernel/user/guest boundaries to take snapshots of cputime and flush these to the stats when needed. This performs pretty much like CONFIG_VIRT_CPU_ACCOUNTING except that context location and cputime snaphots are synchronized between write and read side such that the latter can safely retrieve the pending tickless cputime of a task and add it to its latest cputime snapshot to return the correct result to the user." Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-05 13:10:33 +01:00
Gleb Natapov	eb3fce87cc	KVM: MMU: drop superfluous is_present_gpte() check. Gust page walker puts only present ptes into ptes[] array. No need to check it again. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-04 23:24:28 -02:00
Gleb Natapov	116eb3d30e	KVM: MMU: drop superfluous min() call. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-04 23:24:28 -02:00
Gleb Natapov	2c9afa52ef	KVM: MMU: set base_role.nxe during mmu initialization. Move base_role.nxe initialisation to where all other roles are initialized. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-04 23:24:28 -02:00
Gleb Natapov	9bb4f6b15e	KVM: MMU: drop unneeded checks. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-04 23:24:28 -02:00
Gleb Natapov	feb3eb704a	KVM: MMU: make spte_is_locklessly_modifiable() more clear spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE are present on spte. Make it more explicit. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-02-04 23:24:28 -02:00
Linus Torvalds	3f4e5aacf7	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three small fixlets" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/cacheinfo: Shut up annoying warning x86, doc: Boot protocol 2.12 is in 3.8 x86-64: Replace left over sti/cli in ia32 audit exit code	2013-02-05 07:59:44 +11:00
Linus Torvalds	51c1abb95f	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Three fixlets and two small (and low risk) hw-enablement changes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix event group context move x86/perf: Add IvyBridge EP support perf/x86: Fix P6 driver section warning arch/x86/tools/insn_sanity.c: Identify source of messages perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support	2013-02-05 07:57:09 +11:00
Borislav Petkov	f76e39c531	x86/intel/cacheinfo: Shut up annoying warning I've been getting the following warning when doing randbuilds since forever. Now it finally pissed me off just the perfect amount so that I can fix it. arch/x86/kernel/cpu/intel_cacheinfo.c:489:27: warning: ‘cache_disable_0’ defined but not used [-Wunused-variable] arch/x86/kernel/cpu/intel_cacheinfo.c:491:27: warning: ‘cache_disable_1’ defined but not used [-Wunused-variable] arch/x86/kernel/cpu/intel_cacheinfo.c:524:27: warning: ‘subcaches’ defined but not used [-Wunused-variable] It happens because in randconfigs where CONFIG_SYSFS is not set, the whole sysfs-interface to L3 cache index disabling is remaining unused and gcc correctly warns about it. Make it optional, depending on CONFIG_SYSFS too, as is the case with other sysfs-related machinery in this file. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/1359969195-27362-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-04 11:29:52 +01:00
Thomas Gleixner	90889a635a	Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux into timers/core Trivial conflict in arch/x86/Kconfig Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-04 11:03:03 +01:00
Al Viro	5b3eb3ade4	x86: switch to generic old sigaction Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:27 -05:00
Al Viro	29fd448084	x86: switch to generic compat rt_sigaction() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	d7c43e4afb	x86: switch to generic compat sched_rr_get_interval() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	15ce1f7154	x86,um: switch to generic old sigsuspend() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	7b83d1a297	x86: switch to generic compat rt_sigqueueinfo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:25 -05:00
Al Viro	f45adb0499	x86: switch to generic compat rt_sigpending() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:25 -05:00
Al Viro	49cb25e929	x86: get rid of pt_regs argument in vm86/vm86old Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	3fe26fa34d	x86: get rid of pt_regs argument in sigreturn variants Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	b3af11afe0	x86: get rid of pt_regs argument of iopl(2) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	ea93a6e2e7	amd64: get rid of useless RESTORE_TOP_OF_STACK in stub_execve() we are not going to return via SYSRET anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:23 -05:00
Al Viro	574c4866e3	consolidate kernel-side struct sigaction declarations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 15:09:22 -05:00
Al Viro	92a3ce4a1e	consolidate declarations of k_sigaction Only alpha and sparc are unusual - they have ka_restorer in it. And nobody needs that exposed to userland. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 15:09:22 -05:00
Al Viro	eaca6eae3e	sanitize rt_sigaction() situation a bit Switch from __ARCH_WANT_SYS_RT_SIGACTION to opposite (!CONFIG_ODD_RT_SIGACTION); the only two architectures that need it are alpha and sparc. The reason for use of CONFIG_... instead of __ARCH_... is that it's needed only kernel-side and doing it that way avoids a mess with include order on many architectures. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 15:09:18 -05:00
H. Peter Anvin	68d00bbebb	Merge remote-tracking branch 'origin/x86/mm' into x86/mm2 Explicitly merging these two branches due to nontrivial conflicts and to allow further work. Resolved Conflicts: arch/x86/kernel/head32.c arch/x86/kernel/head64.c arch/x86/mm/init_64.c arch/x86/realmode/init.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-02-01 02:28:36 -08:00
H. Peter Anvin	b5831174f9	Linux 3.8-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRCxWdAAoJEHm+PkMAQRiG3bAH/28D2NbRLIDo6QIzk3seCCh3 A6vqDWbX671JRa0RO38DrWhqv6L/EisvjNZMVxBNaN635+3+yC7wsKziXYxA4+AL Ef9JISiwXykRWwrC8Q34YBWfWgeFTmaau71IRv45x2OTwiijd6cvjXhLDrOhHEGL rdAGJxIdBOfuASw1zpn9PxzfAtg1j/FGP03B4yy+JFhYzuDp3pvnDIysAnXefLml UheuBELdNf8Jx7NtW80PTa+TQtruWPC8otoiJevV4TrAWmAOctJiM1izL+VOZqqD I/v5aGf9mCN+cxLq06imwgEHWmP1I7yDdjjTHrr4wwIMws1vg8PNsJqG3AsPcwM= =dR8S -----END PGP SIGNATURE----- Merge tag 'v3.8-rc6' into x86/urgent Linux 3.8-rc6 Merged in order to add a documentation update versus new code in upstream. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 20:22:57 -08:00
H. Peter Anvin	07f4207a30	x86-32, mm: Remove reference to alloc_remap() We have removed the remap allocator for x86-32, and x86-64 never had it (and doesn't need it). Remove residual reference to it. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/CAE9FiQVn6_QZi3fNQ-JHYiR-7jeDJ5hT0SyT_%2BzVvfOj=PzF3w@mail.gmail.com	2013-01-31 14:12:30 -08:00
H. Peter Anvin	bb112aec5e	x86-32, mm: Remove reference to resume_map_numa_kva() Remove reference to removed function resume_map_numa_kva(). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com	2013-01-31 14:12:30 -08:00
Dave Hansen	f03574f2d5	x86-32, mm: Rip out x86_32 NUMA remapping code This code was an optimization for 32-bit NUMA systems. It has probably been the cause of a number of subtle bugs over the years, although the conditions to excite them would have been hard to trigger. Essentially, we remap part of the kernel linear mapping area, and then sometimes part of that area gets freed back in to the bootmem allocator. If those pages get used by kernel data structures (say mem_map[] or a dentry), there's no big deal. But, if anyone ever tried to use the linear mapping for these pages _and_ cared about their physical address, bad things happen. For instance, say you passed __GFP_ZERO to the page allocator and then happened to get handed one of these pages, it zero the remapped page, but it would make a pte to the _old_ page. There are probably a hundred other ways that it could screw with things. We don't need to hang on to performance optimizations for these old boxes any more. All my 32-bit NUMA systems are long dead and buried, and I probably had access to more than most people. This code is causing real things to break today: https://lkml.org/lkml/2013/1/9/376 I looked in to actually fixing this, but it requires surgery to way too much brittle code, as well as stuff like per_cpu_ptr_to_phys(). [ hpa: Cc: this for -stable, since it is a memory corruption issue. However, an alternative is to simply mark NUMA as depends BROKEN rather than EXPERIMENTAL in the X86_32 subclause... ] Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>	2013-01-31 14:12:30 -08:00
Boris Ostrovsky	f0322bd341	x86, AMD: Enable WC+ memory type on family 10 processors In some cases BIOS may not enable WC+ memory type on family 10 processors, instead converting what would be WC+ memory to CD type. On guests using nested pages this could result in performance degradation. This patch enables WC+. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Link: http://lkml.kernel.org/r/1359495169-23278-1-git-send-email-ostr@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:35:38 -08:00
Boris Ostrovsky	6bf08a8dcd	x86, AMD: Clean up init_amd() Clean up multiple declarations of variable used for rd/wrmsr. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Link: http://lkml.kernel.org/r/1359495136-23244-1-git-send-email-ostr@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:35:32 -08:00
Fenghua Yu	da76f64e7e	x86/Kconfig: Make early microcode loading a configuration feature MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new configurations to enable or disable the feature. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-13-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:20:42 -08:00
Fenghua Yu	cd745be89e	x86/mm/init.c: Copy ucode from initrd image to kernel memory Before initrd image is freed, copy valid ucode patches from initrd image to kernel memory. The saved ucode will be used to update AP in resume or hotplug. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-12-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:20:26 -08:00
Fenghua Yu	feddc9de8b	x86/head64.c: Early update ucode in 64-bit This updates ucode on BSP in 64-bit mode. Paging and virtual address are working now. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-11-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:20:24 -08:00
Fenghua Yu	63b553c68d	x86/head_32.S: Early update ucode in 32-bit This updates ucode in 32-bit kernel on BSP and AP. At this point, there is no paging and no virtual address yet. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-10-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:20 -08:00
Fenghua Yu	ec400ddeff	x86/microcode_intel_early.c: Early update ucode on Intel's CPU Implementation of early update ucode on Intel's CPU. load_ucode_intel_bsp() scans ucode in initrd image file which is a cpio format ucode followed by ordinary initrd image file. The binary ucode file is stored in kernel/x86/microcode/GenuineIntel.bin in the cpio data. All ucode patches with the same model as BSP are saved in memory. A matching ucode patch is updated on BSP. load_ucode_intel_ap() reads saved ucoded patches and updates ucode on AP. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-9-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:18 -08:00
Fenghua Yu	086fc8f803	x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() This function is called in __native_flush_tlb_global() and after apply_microcode_early(). Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-8-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:16 -08:00
Fenghua Yu	e666dfa273	x86/microcode_intel_lib.c: Early update ucode on Intel's CPU Define interfaces microcode_sanity_check() and get_matching_microcode(). They are called both in early boot time and in microcode Intel driver. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-7-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:14 -08:00
Fenghua Yu	a8ebf6d1d6	x86/microcode_core_early.c: Define interfaces for early loading ucode Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on BSP and AP in early boot time. These are generic interfaces. Internally they call vendor specific implementations. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-6-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:12 -08:00
Fenghua Yu	e6ebf5deaa	x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP In 64 bit, load ucode on AP in cpu_init(). In 32 bit, show ucode loading info on AP in cpu_init(). Microcode has been loaded earlier before paging. Now it is safe to show the loading microcode info on this AP. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-5-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:19:06 -08:00
Fenghua Yu	d288e1cf8e	x86/common.c: Make have_cpuid_p() a global function Remove static declaration in have_cpuid_p() to make it a global function. The function will be called in early loading microcode. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:18:58 -08:00
Fenghua Yu	9cd4d78e21	x86/microcode_intel.h: Define functions and macros for early loading ucode Define some functions and macros that will be used in early loading ucode. Some of them are moved from microcode_intel.c driver in order to be called in early boot phase before module can be called. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-31 13:18:50 -08:00
Thomas Gleixner	a9037430c6	Merge branch 'timers/for-arm' into timers/core	2013-01-31 22:17:10 +01:00
Sukadev Bhattiprolu	2663960c15	perf: Make EVENT_ATTR global Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is available to all architectures. Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass in the variable name as a parameter. Changelog[v2] - [Jiri Olsa] No need to define PMU_EVENT_PTR() Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anton Blanchard <anton@au1.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/20130123062422.GC13720@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2013-01-31 13:07:50 -03:00
Lee, Chun-Yi	deb94101c4	x86, efi: Allow slash in file path of initrd When initrd file didn't put at the same place with stub kernel, we need give the file path of initrd, but need use backslash to separate directory and file. It's not friendly to unix/linux user, and not so intuitive for bootloader forward paramters to efi stub kernel by chainloading. This patch add support to handle_ramdisks for allow slash in file path of initrd, it convert slash to backlash when parsing path. In additional, this patch also separates print code of efi_char16_t from efi_printk, and print out the path/filename of initrd when failed to open initrd file. It's good for debug and discover typo. Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-01-31 14:44:44 +00:00
Borislav Petkov	1e9209edc7	x86/numa: Use __pa_nodebug() instead ... and fix the following warning: arch/x86/mm/numa.c: In function ‘setup_node_data’: arch/x86/mm/numa.c:222:3: warning: passing argument 1 of ‘__phys_addr_nodebug’ makes integer from pointer without a cast Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1359245901-8512-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-01-31 11:40:55 +01:00
Jan Beulich	40a1ef95da	x86-64: Replace left over sti/cli in ia32 audit exit code For some reason they didn't get replaced so far by their paravirt equivalents, resulting in code to be run with interrupts disabled that doesn't expect so (causing, in the observed case, a BUG_ON() to trigger) when syscall auditing is enabled. David (Cc-ed) came up with an identical fix, so likely this can be taken to count as an ack from him. Reported-by: Peter Moody <pmoody@google.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/5108E01902000078000BA9C5@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Tested-by: Peter Moody <pmoody@google.com>	2013-01-31 10:36:01 +01:00
Linus Torvalds	04c2eee5b9	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI fixes from Peter Anvin: "This is a collection of fixes for the EFI support. The controversial bit here is a set of patches which bumps the boot protocol version as part of fixing some serious problems with the EFI handover protocol, used when booting under EFI using a bootloader as opposed to directly from EFI. These changes should also make it a lot saner to support cross-mode 32/64-bit EFI booting in the future. Getting these changes into 3.8 means we avoid presenting an inconsistent ABI to bootloaders. Other changes are display detection and fixing efivarfs." * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: remove attribute check from setup_efi_pci x86, build: Dynamically find entry points in compressed startup code x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode x86, efi: Fix 32-bit EFI handover protocol entry point x86, efi: Fix display detection in EFI boot stub x86, boot: Define the 2.12 bzImage boot protocol x86/boot: Fix minor fd leakage in tools/relocs.c x86, efi: Set runtime_version to the EFI spec revision x86, efi: fix 32-bit warnings in setup_efi_pci() efivarfs: Delete dentry from dcache in efivarfs_file_write() efivarfs: Never return ENOENT from firmware efi, x86: Pass a proper identity mapping in efi_call_phys_prelog efivarfs: Drop link count of the right inode	2013-01-31 17:10:36 +11:00
Linus Torvalds	bdb0ae6a76	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of miscellaneous fixes, the most important one is the fix for the Samsung laptop bricking issue (auto-blacklisting the samsung-laptop driver); the efi_enabled() changes you see below are prerequisites for that fix. The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI debugging, and requiring CAP_SYS_RAWIO for MSR references, just as with I/O port references." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: samsung-laptop: Disable on EFI hardware efi: Make 'efi_enabled' a function to query EFI facilities smp: Fix SMP function call empty cpu mask race x86/msr: Add capabilities check x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES x86/olpc: Fix olpc-xo1-sci.c build errors arch/x86/platform/uv: Fix incorrect tlb flush all issue x86-64: Fix unwind annotations in recent NMI changes x86-32: Start out cr0 clean, disable paging before modifying cr3/4	2013-01-31 17:08:43 +11:00
Eric Dumazet	3b58908a92	x86: bpf_jit_comp: add pkt_type support Supporting access to skb->pkt_type is a bit tricky if we want to have a generic code, allowing pkt_type to be moved in struct sk_buff pkt_type is a bit field, so compiler cannot really help us to find its offset. Let's use a helper for this : It will throw a one time message if pkt_type no longer starts at a byte boundary or is no longer a 3bit field. Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 22:38:34 -05:00
H. Peter Anvin	becbd66080	Various urgent EFI fixes and some warning cleanups for v3.8 * EFI boot stub fix for Macbook Pro's from Maarten Lankhorst * Fix an oops in efivarfs from Lingzhu Xiang * 32-bit warning cleanups from Jan Beulich * Patch to Boot on >512GB RAM systems from Nathan Zimmer * Set efi.runtime_version correctly * efivarfs updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRCBrMAAoJEC84WcCNIz1VTdcP/2u3ZqohOKJAwwMkyzB3nkrQ 1mhxKGFDitAAvGQQCOq3oIMgBZHOevKznH3hZtX+hxBxwu7AuNL+qw6Baz8GYZpz guFvAZjm2JX2ko1PgtNvPUFZ1krw7TObLW2YstTWhSDoOlRK5kqmA+idaJf1aHDe /cwV6Mr6u5N/egyBBcQI1ydKLA6ogmx1zfDsS9b2Vzavw168RGqfrpH3ybcokYND /E2NtcRVZagBw35eZHEDNKcoPt5z+skCA4nJyA6bLbxMsq51ZKaK0PKKaA8vd70s 6Pc7d6zkQG/ZmaxrRfsdQUAYfJRJq/cpeTgS4YurkZB0r0gdxk6I86vYlg+xXi0X eqLAkUJJJasVY/1NK/c2vsJ03W9wDYkd2IJpUcl7rWz7Aa/RurY32QmT3SnLop7m Tzj3CgXAu/RH8FyMNMWpI85tOis7OcMUfrjmnxquQdCZpLXSsh7Rf5EgBRiv9xhH txDOX3y21Jnv2A5efAVWm5EbyI204Wq2nVDzSu0xTMXWkzdBg+/OeyYfzV0Sdguf 3/MzYTn7mVXh/EZtnvsTyNjgvVxzpXW6mAf+ne9iJaC8MUJVIeSjB7xzSfuHXUBU aUc9OnbkHRJCdVSeKqZbLwO3X5mTXqmDMfIcRle3BPewvZ9pOEv8VrGgsNxh9ixW JaCpiTdxJDFtz6cLVsNa =QrJx -----END PGP SIGNATURE----- Merge tag 'efi-for-3.8' into x86/efi Various urgent EFI fixes and some warning cleanups for v3.8 * EFI boot stub fix for Macbook Pro's from Maarten Lankhorst * Fix an oops in efivarfs from Lingzhu Xiang * 32-bit warning cleanups from Jan Beulich * Patch to Boot on >512GB RAM systems from Nathan Zimmer * Set efi.runtime_version correctly * efivarfs updates Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-30 14:43:05 -08:00
Matt Fleming	83e6818974	efi: Make 'efi_enabled' a function to query EFI facilities Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-30 11:51:59 -08:00
Yinghai Lu	8b78c21d72	x86, 64bit, mm: hibernate use generic mapping_init We should set mappings only for usable memory ranges under max_pfn Otherwise causes same problem that is fixed by x86, mm: Only direct map addresses that are marked as E820_RAM Make it only map range in pfn_mapped array. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-34-git-send-email-yinghai@kernel.org Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: linux-pm@vger.kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:59 -08:00
Yinghai Lu	72212675d1	x86, 64bit, mm: Mark data/bss/brk to nx HPA said, we should not have RW and +x set at the time. for kernel layout: [ 0.000000] Kernel Layout: [ 0.000000] .text: [0x01000000-0x021434f8] [ 0.000000] .rodata: [0x02200000-0x02a13fff] [ 0.000000] .data: [0x02c00000-0x02dc763f] [ 0.000000] .init: [0x02dc9000-0x0312cfff] [ 0.000000] .bss: [0x0313b000-0x03dd6fff] [ 0.000000] .brk: [0x03dd7000-0x03dfffff] before the patch, we have ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff82200000 18M ro PSE GLB x pmd 0xffffffff82200000-0xffffffff82c00000 10M ro PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82dc9000 1828K RW GLB x pte 0xffffffff82dc9000-0xffffffff82e00000 220K RW GLB NX pte 0xffffffff82e00000-0xffffffff83000000 2M RW PSE GLB NX pmd 0xffffffff83000000-0xffffffff8313a000 1256K RW GLB NX pte 0xffffffff8313a000-0xffffffff83200000 792K RW GLB x pte 0xffffffff83200000-0xffffffff83e00000 12M RW PSE GLB x pmd 0xffffffff83e00000-0xffffffffa0000000 450M pmd after patch,, we get ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff82200000 18M ro PSE GLB x pmd 0xffffffff82200000-0xffffffff82c00000 10M ro PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82e00000 2M RW GLB NX pte 0xffffffff82e00000-0xffffffff83000000 2M RW PSE GLB NX pmd 0xffffffff83000000-0xffffffff83200000 2M RW GLB NX pte 0xffffffff83200000-0xffffffff83e00000 12M RW PSE GLB NX pmd 0xffffffff83e00000-0xffffffffa0000000 450M pmd so data, bss, brk get NX ... Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:58 -08:00
Yinghai Lu	6c902b656c	x86: Merge early kernel reserve for 32bit and 64bit They are the same, and we could move them out from head32/64.c to setup.c. We are using memblock, and it could handle overlapping properly, so we don't need to reserve some at first to hold the location, and just need to make sure we reserve them before we are using memblock to find free mem to use. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-32-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:58 -08:00
Yinghai Lu	0212f91596	x86: Add Crash kernel low reservation During kdump kernel's booting stage, it need to find low ram for swiotlb buffer when system does not support intel iommu/dmar remapping. kexed-tools is appending memmap=exactmap and range from /proc/iomem with "Crash kernel", and that range is above 4G for 64bit after boot protocol 2.12. We need to add another range in /proc/iomem like "Crash kernel low", so kexec-tools could find that info and append to kdump kernel command line. Try to reserve some under 4G if the normal "Crash kernel" is above 4G. User could specify the size with crashkernel_low=XX[KMG]. -v2: fix warning that is found by Fengguang's test robot. -v3: move out get_mem_size change to another patch, to solve compiling warning that is found by Borislav Petkov <bp@alien8.de> -v4: user must specify crashkernel_low if system does not support intel or amd iommu. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org Cc: Eric Biederman <ebiederm@xmission.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:58 -08:00
Yinghai Lu	7d41a8a4a2	x86, kdump: Remove crashkernel range find limit for 64bit Now kexeced kernel/ramdisk could be above 4g, so remove 896 limit for 64bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-30-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:58 -08:00
Yinghai Lu	595ad9af85	memblock: Add memblock_mem_size() Use it to get mem size under the limit_pfn. to replace local version in x86 reserved_initrd. -v2: remove not needed cast that is pointed out by HPA. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:57 -08:00
Yinghai Lu	d1af6d045f	x86, boot: Not need to check setup_header version for setup_data That is for bootloaders. setup_data is in setup_header, and bootloader is copying that from bzImage. So for old bootloader should keep that as 0 already. old kexec-tools till now for elf image set setup_data to 0, so it is ok. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-28-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:57 -08:00
Yinghai Lu	8ee2f2dfdb	x86, boot: Update comments about entries for 64bit image Now 64bit entry is fixed on 0x200, can not be changed anymore. Update the comments to reflect that. Also put info about it in boot.txt -v2: fix some grammar error Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-27-git-send-email-yinghai@kernel.org Cc: Rob Landley <rob@landley.net> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:57 -08:00
Yinghai Lu	ee92d81502	x86, boot: Support loading bzImage, boot_params and ramdisk above 4G xloadflags bit 1 indicates that we can load the kernel and all data structures above 4G; it is set if kernel is relocatable and 64bit. bootloader will check if xloadflags bit 1 is set to decide if it could load ramdisk and kernel high above 4G. bootloader will fill value to ext_ramdisk_image/size for high 32bits when it load ramdisk above 4G. kernel use get_ramdisk_image/size to use ext_ramdisk_image/size to get right positon for ramdisk. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Rob Landley <rob@landley.net> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 19:32:33 -08:00
Yinghai Lu	0e691cf824	x86, kexec, 64bit: Only set ident mapping for ram. We should set mappings only for usable memory ranges under max_pfn Otherwise causes same problem that is fixed by x86, mm: Only direct map addresses that are marked as E820_RAM This patch exposes pfn_mapped array, and only sets ident mapping for ranges in that array. This patch relies on new kernel_ident_mapping_init that could handle existing pgd/pud between different calls. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:35 -08:00
Yinghai Lu	9ebdc79f7a	x86, kexec: Replace ident_mapping_init and init_level4_page Now ident_mapping_init is checking if pgd/pud is present for every 2M, so several 2Ms are in same PUD, it will keep checking if pud is there with same pud. init_level4_page just does not check existing pgd/pud. We could use generic mapping_init with different settings in info to replace those two local grown version functions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-24-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:29 -08:00
Yinghai Lu	084d128398	x86, kexec: Set ident mapping for kernel that is above max_pfn When first kernel is booted with memmap= or mem= to limit max_pfn. kexec can load second kernel above that max_pfn. We need to set ident mapping for whole image in this case instead of just for first 2M. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-23-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:26 -08:00
Yinghai Lu	577af55d80	x86, kexec: Remove 1024G limitation for kexec buffer on 64bit Now 64bit kernel supports more than 1T ram and kexec tools could find buffer above 1T, remove that obsolete limitation. and use MAXMEM instead. Tested on system with more than 1024G ram. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-22-git-send-email-yinghai@kernel.org Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:23 -08:00
Yinghai Lu	d3c433bf9a	x86, boot: Move lldt/ltr out of 64bit code section commit `08da5a2ca` x86_64: Early segment setup for VT sets up LDT and TR into a valid state in order to speed up boot decompression under VT. Those code are put in code64, and it is using GDT that is only loaded from code32 path. That breaks booting with 64bit bootloader that does not go through code32 path and jump to startup_64 directly, and it has different GDT. Move those lines into code32 after their GDT is loaded. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-21-git-send-email-yinghai@kernel.org Cc: Zachary Amsden <zamsden@gmail.com> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:19 -08:00
Yinghai Lu	187a8a73ce	x86, boot: Move verify_cpu.S and no_longmode down We need to move some code to 32bit section in following patch: x86, boot: Move lldt/ltr out of 64bit code section but that will push startup_64 down from 0x200. According to hpa, we can not change startup_64 position and that is an ABI. We could move function verify_cpu and no_longmode down, because verify_cpu is used via function call and no_longmode will not return, then we don't need to add extra code for jumping back. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-20-git-send-email-yinghai@kernel.org Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:15 -08:00
Yinghai Lu	3db07e70f0	x86, boot: Pass cmd_line_ptr with unsigned long instead boot/compressed/misc.c is used for bzImage in 64bit and 32bit, and cmd_line_ptr could point to buffer that is above 4g, cmd_line_ptr should be 64bit otherwise high 32bit will be capped out. So need to change data type to unsigned long, that will be 64bit get correct address of command line buffer. And it is still ok with 32bit bzImage, because unsigned long on 32bit kernel is still 32bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-19-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:09 -08:00
Yinghai Lu	16a4baa642	x86, boot: Move checking of cmd_line_ptr out of common path cmdline.c::__cmdline_find_option... are shared between 16-bit setup code and 32/64 bit decompressor code. for 32/64 only path via kexec, we should not check if ptr is less 1M. as those cmdline could be put above 1M, or even 4G. Move out accessible checking out of __cmdline_find_option() So decompressor in misc.c can parse cmdline correctly. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-18-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:26:01 -08:00
Yinghai Lu	f1da834cd9	x86, boot: Add get_cmd_line_ptr() Add an accessor function for the command line address. Later we will add support for holding a 64-bit address via ext_cmd_line_ptr. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:25:45 -08:00
Yinghai Lu	a8a51a88d5	x86: Add get_ramdisk_image/size() There are several places to find ramdisk information early for reserving and relocating. Use accessor functions to make code more readable and consistent. Later will add ext_ramdisk_image/size in those functions to support loading ramdisk above 4g. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-16-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:21:05 -08:00

... 2 3 4 5 6 ...

17180 Commits