Commit Graph

268 Commits

Author SHA1 Message Date
Dou Liyang
11277aabcb x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()
The following commit:

  2e63ad4bd5 ("x86/apic: Do not init irq remapping if ioapic is disabled")

... added a check for skipped IO-APIC setup to enable_IR_x2apic(), but this
check is also duplicated in try_to_enable_IR() - and it will never succeed in
calling irq_remapping_enable().

Remove the whole irq_remapping_enable() complication: if the IO-APIC is
disabled we cannot enable IRQ remapping.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: nicstange@gmail.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1487841401-1543-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 10:09:09 +01:00
Dou Liyang
bb3f0a5263 x86/apic: Fix a warning message in logical CPU IDs allocation
The current warning message in allocate_logical_cpuid() is somewhat confusing:

  Only 1 processors supported.Processor 2/0x2 and the rest are ignored.

As it might imply that there's only one CPU in the system - while what we ran
into here is a kernel limitation.

Fix the warning message to clarify all that:

  APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2 and the rest are ignored.

( Also update the error return from -1 to -EINVAL, which is the more
  canonical return value. )

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: nicstange@gmail.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1488261052-25753-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 10:09:08 +01:00
Linus Torvalds
f89db789de Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two documentation updates, plus a debugging annotation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/crash: Update the stale comment in reserve_crashkernel()
  x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
  Documentation, x86, resctrl: Recommend locking for resctrlfs
2017-02-28 11:46:00 -08:00
Dou Liyang
543113d2f4 x86/apic: Fix a typo in a comment line
s/bringin
 /bringing

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1486442688-24690-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07 10:40:27 +01:00
Frederic Weisbecker
914122c389 x86/apic: Implement set_state_oneshot_stopped() callback
When clock_event_device::set_state_oneshot_stopped() is not implemented,
hrtimer_cancel() can't stop the clock when there is no more timer in
the queue. So the ghost of the freshly cancelled hrtimer haunts us back
later with an extra interrupt:

          <idle>-0     [002] d..2  2248.557659: hrtimer_cancel: hrtimer=ffff88021fa92d80
          <idle>-0     [002] d.h1  2249.303659: local_timer_entry: vector=239

So let's implement this missing callback for the lapic clock. This
consist in calling its set_state_shutdown() callback. There don't seem
to be a lighter way to stop the clock. Simply writing 0 to APIC_TMICT
won't be enough to stop the clock and avoid the extra interrupt, as
opposed to what is specified in the specs. We must also mask the
timer interrupt in the device.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1483029949-6925-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09 11:48:42 +01:00
Dou Liyang
12bf98b91f x86/apic: Fix typos in comments
s/ID/IDs/
 s/inr_logical_cpuidi/nr_logical_cpuids/
 s/generic_processor_info()/__generic_processor_info()/

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1483610083-24314-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-06 08:40:33 +01:00
Daniel Bristot de Oliveira
c4158ff536 x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
This patch adds the __irq_entry annotation to the default x86
platform IRQ handlers. ftrace's function_graph tracer uses the
__irq_entry annotation to notify the entry and return of IRQ
handlers.

For example, before the patch:
  354549.667252 |   3)  d..1              |  default_idle_call() {
  354549.667252 |   3)  d..1              |    arch_cpu_idle() {
  354549.667253 |   3)  d..1              |      default_idle() {
  354549.696886 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
  354549.696886 |   3)  d..1              |          irq_enter() {
  354549.696886 |   3)  d..1              |            rcu_irq_enter() {

After the patch:
  366416.254476 |   3)  d..1              |    arch_cpu_idle() {
  366416.254476 |   3)  d..1              |      default_idle() {
  366416.261566 |   3)  d..1  ==========> |
  366416.261566 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
  366416.261566 |   3)  d..1              |          irq_enter() {
  366416.261566 |   3)  d..1              |            rcu_irq_enter() {

KASAN also uses this annotation. The smp_apic_timer_interrupt()
was already annotated.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-05 08:58:49 +01:00
Thomas Gleixner
9d85eb9119 x86/smpboot: Make logical package management more robust
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg, These are obviously wrong.

 - Virtual machines have their own interesting of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is not longer
required.

Fixes: d49597fd3b ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: M. Vefa Bicakci <m.v.b@runbox.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Cc: Charles (Chas) Williams <ciwillia@brocade.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-13 10:22:39 +01:00
Linus Torvalds
212f30008a Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 idle updates from Ingo Molnar:
 "There were two bigger changes in this development cycle:

   - remove idle notifiers:

       32 files changed, 74 insertions(+), 803 deletions(-)

     These notifiers were of questionable value and the main usecase,
     the i7300 driver, was essentially unmaintained and can be removed,
     plus modern power management concepts don't need the callback - so
     use this golden opportunity and get rid of this opaque and fragile
     callback from a latency sensitive code path.

     (Len Brown, Thomas Gleixner)

   - improve the AMD Erratum 400 workaround that used high overhead MSR
     polling in the idle loop (Borisla Petkov, Thomas Gleixner)"

* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove empty idle.h header
  x86/amd: Simplify AMD E400 aware idle routine
  x86/amd: Check for the C1E bug post ACPI subsystem init
  x86/bugs: Separate AMD E400 erratum and C1E bug
  x86/cpufeature: Provide helper to set bugs bits
  x86/idle: Remove enter_idle(), exit_idle()
  x86: Remove x86_test_and_clear_bit_percpu()
  x86/idle: Remove is_idle flag
  x86/idle: Remove idle_notifier
  i7300_idle: Remove this driver
2016-12-12 14:55:04 -08:00
Thomas Gleixner
34bc3560c6 x86: Remove empty idle.h header
One include less is always a good thing(tm). Good riddance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:22 +01:00
Borislav Petkov
07c94a3812 x86/amd: Simplify AMD E400 aware idle routine
Reorganize the E400 detection now that we have everything in place:
switch the CPUs to broadcast mode after the LAPIC has been initialized
and remove the facilities that were used previously on the idle path.

Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine
because alternatives have been applied when the actual detection happens,
so the static switching does not take effect and the test will stay
false. Use boot_cpu_has_bug() instead which is definitely an improvement
over the RDMSR and the cpumask handling.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:21 +01:00
Wanpeng Li
8ca225520e x86/apic: Prevent tracing on apic_msr_write_eoi()
The following RCU lockdep warning led to adding irq_enter()/irq_exit() into
smp_reschedule_interrupt():

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 no locks held by swapper/1/0.
 
  do_trace_write_msr
  native_write_msr
  native_apic_msr_eoi_write
  smp_reschedule_interrupt
  reschedule_interrupt

As Peterz pointed out:

| So now we're making a very frequent interrupt slower because of debug 
| code.
|
| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only sent to tickle the
| return to user path (which does the actual preemption).
| 
| Having to do the whole irq_enter/irq_exit dance just for this unlikely
| debug case totally blows.

Use the wrmsr_notrace() variant in native_apic_msr_write_eoi, annotate the
kvm variant with notrace and add a native_apic_eoi callback to the apic
structure so KVM guests are covered as well.

This allows to revert the irq_enter/irq_exit dance in
smp_reschedule_interrupt().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Mike Galbraith <efault@gmx.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1478488420-5982-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 22:03:14 +01:00
Thomas Gleixner
df610d6788 x86/apic: Prevent pointless warning messages
Markus reported that he sees new warnings:

  APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
  APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.

This comes from the recent persistant cpuid - nodeid changes. The code
which emits the warning has been called prior to these changes only for
enabled processors. Now it's called for disabled processors as well to get
the possible cpu accounting correct. So if the kernel is compiled for the
number of actual available/enabled CPUs and the BIOS reports disabled CPUs
as well then the above warnings are printed.

That's a pointless exercise as it only makes sense if there are more CPUs
enabled than the kernel supports.

Nake the warning conditional on enabled processors so we are back to the
state before these changes.

Fixes: 8f54969dc8 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") 
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: linux-acpi@vger.kernel.org
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-08 12:18:36 +02:00
Thomas Gleixner
eb6296dec1 x86/apic: Fix silent & fatal merge conflict in __generic_processor_info()
Fix up the silent merge conflict between commit c291b01515 in x86/urgent
and commit f7c28833c2 in x86/apic which both remove num_processors++
from the original location and then add it at two different locations. As a
result num_processors is incremented twice which can cut the number of
available cpus in half.

Remove the one which is added by commit c291b01515.

In hindsight I should have merged x86/urgent into x86/apic _before_ adding
the nodeid bits, but in hindsight we are always smarter.

Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: 1e1b37273c ("Merge branch 'x86/urgent' into x86/apic")
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1609261350090.5483@nanos
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-26 15:51:22 -04:00
Thomas Gleixner
1e1b37273c Merge branch 'x86/urgent' into x86/apic
Bring in the upstream modifications so we can fixup the silent merge
conflict which is introduced by this merge.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-26 15:47:03 -04:00
Gu Zheng
8f54969dc8 x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

cpuid <-> apicid mapping is established at local apic registeration time.
But non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21 21:18:38 +02:00
Gu Zheng
f7c28833c2 x86/acpi: Enable acpi to register all possible cpus at boot time
cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.

        ......

        return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registeration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these apis to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finished step 1.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21 21:18:38 +02:00
Denys Vlasenko
cff9ab2b29 x86/apic: Get rid of apic_version[] array
The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
can consume up to 128k.

The array has been there forever and was never used for anything useful
other than a version mismatch check which was introduced in 2009.

There is no reason to store the version in an array. The kernel is not
prepared to handle different APIC versions anyway, so the real important
part is to detect a version mismatch and warn about it, which can be done
with a single variable as well.

[ tglx: Massaged changelog ]

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Borislav Petkov <bp@alien8.de>
CC: Brian Gerst <brgerst@gmail.com>
CC: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-20 00:31:19 +02:00
Dou Liyang
c291b01515 x86/apic: Fix num_processors value in case of failure
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.

Only increase num_processors once we are sure we don't fail.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:11:03 +02:00
Wei Jiangang
5035da4199 x86/apic: Update comment about disabling processor focus
Fix references to discarded end_level_ioapic_irq().

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1471576957-12961-2-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 11:24:33 +02:00
Wanpeng Li
2e63ad4bd5 x86/apic: Do not init irq remapping if ioapic is disabled
native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping		  

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-24 09:45:40 +02:00
Baoquan He
6de421198c x86/apic, ACPI: Remove the repeated lapic address override entry parsing
The ACPI MADT has a 32-bit field providing lapic address at which
each processor can access its lapic information. MADT also contains
an optional entry to provide a 64-bit address to override the 32-bit
one. However the current code does the lapic address override entry
parsing twice. One is in early_acpi_boot_init() because AMD NUMA need
get boot_cpu_id earlier. The other is in acpi_boot_init() which parses
all MADT entries.

So in this patch we remove the repeated code in the 2nd part.

Meanwhile print lapic override entry information like other MADT entry,
this will be added to boot log.

This patch is not supposed to change any runtime behavior, other than
improving kernel messages.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:53:37 +02:00
Ingo Molnar
fdbdfefbab Merge branch 'linus' into timers/urgent, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:36:23 +02:00
Nicolai Stange
6731b0d611 x86/timers/apic: Inform TSC deadline clockevent device about recalibration
This patch eliminates a source of imprecise APIC timer interrupts,
which imprecision may result in double interrupts or even late
interrupts.

The TSC deadline clockevent devices' configuration and registration
happens before the TSC frequency calibration is refined in
tsc_refine_calibration_work().

This results in the TSC clocksource and the TSC deadline clockevent
devices being configured with slightly different frequencies: the former
gets the refined one and the latter are configured with the inaccurate
frequency detected earlier by means of the "Fast TSC calibration using PIT".

Within the APIC code, introduce the notifier function
lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
clockevent devices with the current tsc_khz.

Call it from the TSC code after TSC calibration refinement has happened.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
[ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:38:12 +02:00
Nicolai Stange
1a9e4c564a x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
I noticed the following bug/misbehavior on certain Intel systems: with a
single task running on a NOHZ CPU on an Intel Haswell, I recognized
that I did not only get the one expected local_timer APIC interrupt, but
two per second at minimum. (!)

Further tracing showed that the first one precedes the programmed deadline
by up to ~50us and hence, it did nothing except for reprogramming the TSC
deadline clockevent device to trigger shortly thereafter again.

The reason for this is imprecise calibration, the timeout we program into
the APIC results in 'too short' timer interrupts. The core (hr)timer code
notices this (because it has a precise ktime source and sees the short
interrupt) and fixes it up by programming an additional very short
interrupt period.

This is obviously suboptimal.

The reason for the imprecise calibration is twofold, and this patch
fixes the first reason:

In setup_APIC_timer(), the registered clockevent device's frequency
is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
it with 1000 afterwards:

  (tsc_khz / TSC_DIVISOR) * 1000

The multiplication with 1000 is done for converting from kHz to Hz and the
division by TSC_DIVISOR is carried out in order to make sure that the final
result fits into an u32.

However, with the order given in this calculation, the roundoff error
introduced by the division gets magnified by a factor of 1000 by the
following multiplication.

To fix it, reversing the order of the division and the multiplication a la:

  (tsc_khz * 1000) / TSC_DIVISOR

... reduces the roundoff error already.

Furthermore, if TSC_DIVISOR divides 1000, associativity holds:

  (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)

and thus, the roundoff error even vanishes and the whole operation can be
carried out within 32 bits.

The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
TSC_DIVISOR still allows for TSC frequencies up to
2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect
in the next years.

Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse
the order of the divison and the multiplication in the calculation of
the registered clockevent device's frequency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
[ Improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:37:38 +02:00
Masahiro Yamada
97f2645f35 tree-wide: replace config_enabled() with IS_ENABLED()
The use of config_enabled() against config options is ambiguous.  In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED().  Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

 - config_enabled(CONFIG_HWMON)
  [ drivers/net/wireless/ath/ath10k/thermal.c ]

 - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
  [ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will Drewry <wad@chromium.org>
Cc: Nikolay Martynov <mar.kolya@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Rafal Milecki <zajec5@gmail.com>
Cc: James Cowgill <James.Cowgill@imgtec.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tony Wu <tung7970@gmail.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: "Maciej W. Rozycki" <macro@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Linus Torvalds
aeb35d6b74 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header cleanups from Ingo Molnar:
 "This tree is a cleanup of the x86 tree reducing spurious uses of
  module.h - which should improve build performance a bit"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
  x86/apic: Remove duplicated include from probe_64.c
  x86/ce4100: Remove duplicated include from ce4100.c
  x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
  x86/platform: Delete extraneous MODULE_* tags fromm ts5500
  x86: Audit and remove any remaining unnecessary uses of module.h
  x86/kvm: Audit and remove any unnecessary uses of module.h
  x86/xen: Audit and remove any unnecessary uses of module.h
  x86/platform: Audit and remove any unnecessary uses of module.h
  x86/lib: Audit and remove any unnecessary uses of module.h
  x86/kernel: Audit and remove any unnecessary uses of module.h
  x86/mm: Audit and remove any unnecessary uses of module.h
  x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-08-01 14:23:42 -04:00
Linus Torvalds
08fd8c1768 xen: features and fixes for 4.8-rc0
- ACPI support for guests on ARM platforms.
 - Generic steal time support for arm and x86.
 - Support cases where kernel cpu is not Xen VCPU number (e.g., if
   in-guest kexec is used).
 - Use the system workqueue instead of a custom workqueue in various
   places.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXmLlrAAoJEFxbo/MsZsTRvRQH/1wOMF8BmlbZfR7H3qwDfjst
 ApNifCiZE08xDtWBlwUaBFAQxyflQS9BBiNZDVK0sysIdXeOdpWV7V0ZjRoLL+xr
 czsaaGXDcmXxJxApoMDVuT7FeP6rEk6LVAYRoHpVjJjMZGW3BbX1vZaMW4DXl2WM
 9YNaF2Lj+rpc1f8iG31nUxwkpmcXFog6ct4tu7HiyCFT3hDkHt/a4ghuBdQItCkd
 vqBa1pTpcGtQBhSmWzlylN/PV2+NKcRd+kGiwd09/O/rNzogTMCTTWeHKAtMpPYb
 Cu6oSqJtlK5o0vtr0qyLSWEGIoyjE2gE92s0wN3iCzFY1PldqdsxUO622nIj+6o=
 =G6q3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Features and fixes for 4.8-rc0:

   - ACPI support for guests on ARM platforms.
   - Generic steal time support for arm and x86.
   - Support cases where kernel cpu is not Xen VCPU number (e.g., if
     in-guest kexec is used).
   - Use the system workqueue instead of a custom workqueue in various
     places"

* tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
  xen: add static initialization of steal_clock op to xen_time_ops
  xen/pvhvm: run xen_vcpu_setup() for the boot CPU
  xen/evtchn: use xen_vcpu_id mapping
  xen/events: fifo: use xen_vcpu_id mapping
  xen/events: use xen_vcpu_id mapping in events_base
  x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
  x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
  xen: introduce xen_vcpu_id mapping
  x86/acpi: store ACPI ids from MADT for future usage
  x86/xen: update cpuid.h from Xen-4.7
  xen/evtchn: add IOCTL_EVTCHN_RESTRICT
  xen-blkback: really don't leak mode property
  xen-blkback: constify instance of "struct attribute_group"
  xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
  xen-blkback: prefer xenbus_scanf() over xenbus_gather()
  xen: support runqueue steal time on xen
  arm/xen: add support for vm_assist hypercall
  xen: update xen headers
  xen-pciback: drop superfluous variables
  xen-pciback: short-circuit read path used for merging write values
  ...
2016-07-27 11:35:37 -07:00
Vitaly Kuznetsov
3e9e57fad3 x86/acpi: store ACPI ids from MADT for future usage
Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:30:53 +01:00
Paul Gortmaker
186f43608a x86/kernel: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.  Build testing
revealed some implicit header usage that was fixed up accordingly.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:06:41 +02:00
Claudio Fontana
3c8fad9183 x86/apic: Fix misspelled APIC
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1465468318-19867-1-git-send-email-hw.claudio@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-10 14:48:24 +02:00
Borislav Petkov
93984fbd4e x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org
Cc: linux-pm@vger.kernel.org
Cc: oprofile-list@lists.sf.net
Link: http://lkml.kernel.org/r/1459801503-15600-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
59e21e3d00 x86/cpufeature: Replace cpu_has_tsc with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Link: http://lkml.kernel.org/r/1459801503-15600-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
62436a4d36 x86/cpufeature: Remove cpu_has_x2apic
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:08 +02:00
Ingo Molnar
00f5268501 Merge branch 'x86/cleanups' into x86/urgent
Pull in some merge window leftovers.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:44:57 +01:00
Thomas Gleixner
1f12e32f4c x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of CPU
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after boot

- makes sure that the package id is within the number of possible packages. If
  the apic id is outside we map it to a logical package id if there is enough
  space available.

Provide interfaces for drivers to query the mapping and do translations from
physcial to logical ids.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:18 +01:00
Adam Buchbinder
6a6256f9e0 x86: Fix misspellings in comments
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:44:58 +01:00
Hidehiro Kawai
b7c4948e98 x86/apic: Introduce apic_extnmi command line parameter
This patch introduces a command line parameter apic_extnmi:

 apic_extnmi=( bsp|all|none )

The default value is "bsp" and this is the current behavior: only the
Boot-Strapping Processor receives an external NMI.

"all" allows external NMIs to be broadcast to all CPUs. This would
raise the success rate of panic on NMI when BSP hangs in NMI context
or the external NMI is swallowed by other NMI handlers on the BSP.

If you specify "none", no CPUs receive external NMIs. This is useful for
the dump capture kernel so that it cannot be shot down by accidentally
pressing the external NMI button (on platforms which have it) while
saving a crash dump.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bandan Das <bsd@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
Juergen Gross
42baa2581c x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
Saving and restoring lapic vectors in lapic_suspend() and
lapic_resume() is not consistent: the thmr vector saving is
guarded by a different config option than the restore part. The
cmci vector isn't handled at all.

Those inconsistencies are not very critical, as the missing cmci
vector will be set via mce resume handling, the wrong config
option used for restoring the thmr vector can't be configured
differently than the one which should be used.

Nevertheless correct the thmr vector restore and add cmci vector
handling.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448276364-31334-1-git-send-email-jgross@suse.com
[ Minor code edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:18:33 +01:00
Denys Vlasenko
d786ad32c3 x86/apic: Deinline various functions
__x2apic_disable: 178 bytes, 3 calls
__x2apic_enable: 117 bytes, 3 calls
__smp_spurious_interrupt: 110 bytes, 2 calls
__smp_error_interrupt: 208 bytes, 2 calls

Reduces code size by about 850 bytes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1443559022-23793-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-30 21:15:53 +02:00
Shaohua Li
5d7c631d92 x86/apic: Serialize LVTT and TSC_DEADLINE writes
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.

The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.

xAPIC:
 "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
  2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
  3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
  4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."

x2APIC:
 "To allow for efficient access to the APIC registers in x2APIC mode,
  the serializing semantics of WRMSR are relaxed when writing to the
  APIC registers. Thus, system software should not use 'WRMSR to APIC
  registers in x2APIC mode' as a serializing instruction. Read and write
  accesses to the APIC registers will occur in program order. A WRMSR to
  an APIC register may complete before all preceding stores are globally
  visible; software can prevent this by inserting a serializing
  instruction, an SFENCE, or an MFENCE before the WRMSR."

The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.

Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.

[ tglx: Massaged the changelog ]

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org #v3.7+
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-14 18:29:59 +02:00
Linus Torvalds
43af9872f5 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This udpate contains:

   - rework the irq vector array to store a pointer to the irq
     descriptor instead of the irq number to avoid a lookup of the irq
     descriptor in the irq entry path

   - lguest interrupt handling cleanups

   - conversion of the local apic timer to the new clockevent callbacks

   - preparatory changes for the irq argument removal of interrupt flow
     handlers"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Do not dereference irq descriptor before checking it
  tools/lguest: Clean up include dir
  tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap
  x86/irq: Store irq descriptor in vector array
  genirq: Provide irq_desc_has_action
  x86/irq: Get rid of an indentation level
  x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED
  x86/irq: Replace numeric constant
  x86/irq: Protect smp_cleanup_move
  x86/lguest: Do not setup unused irq vectors
  x86/lguest: Clean up lguest_setup_irq
  x86/apic: Drop local_irq_save/restore in timer callbacks
  x86/apic: Migrate apic timer to new set_state interface
  x86/irq: Use access helper irq_data_get_affinity_mask()
  x86/irq: Use accessor irq_data_get_irq_handler_data()
  x86/irq: Use accessor irq_data_get_node()
2015-09-01 15:20:51 -07:00
Linus Torvalds
5778077d03 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The biggest changes in this cycle were:

   - Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
     primitives.  (Andy Lutomirski)

   - Add new, comprehensible entry and exit handlers written in C.
     (Andy Lutomirski)

   - vm86 mode cleanups and fixes.  (Brian Gerst)

   - 32-bit compat code cleanups.  (Brian Gerst)

  The amount of simplification in low level assembly code is already
  palpable:

     arch/x86/entry/entry_32.S                          | 130 +----
     arch/x86/entry/entry_64.S                          | 197 ++-----

  but more simplifications are planned.

  There's also the usual laudry mix of low level changes - see the
  changelog for details"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
  x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
  x86/asm/msr: Make wrmsrl() a function
  x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
  x86/asm: Add MONITORX/MWAITX instruction support
  x86/traps: Weaken context tracking entry assertions
  x86/asm/tsc: Add rdtscll() merge helper
  selftests/x86: Add syscall_nt selftest
  selftests/x86: Disable sigreturn_64
  x86/vdso: Emit a GNU hash
  x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
  x86/entry/32: Migrate to C exit path
  x86/entry/32: Remove 32-bit syscall audit optimizations
  x86/vm86: Rename vm86->v86flags and v86mask
  x86/vm86: Rename vm86->vm86_info to user_vm86
  x86/vm86: Clean up vm86.h includes
  x86/vm86: Move the vm86 IRQ definitions to vm86.h
  x86/vm86: Use the normal pt_regs area for vm86
  x86/vm86: Eliminate 'struct kernel_vm86_struct'
  x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
  x86/vm86: Move vm86 fields out of 'thread_struct'
  ...
2015-09-01 08:40:25 -07:00
Thomas Gleixner
a57e456a7b x86/apic: Fix fallout from x2apic cleanup
In the recent x2apic cleanup I got two things really wrong:
1) The safety check in __disable_x2apic which allows the function to
   be called unconditionally is backwards. The check is there to
   prevent access to the apic MSR in case that the machine has no
   apic. Though right now it returns if the machine has an apic and
   therefor the disabling of x2apic is never invoked.

2) x2apic_disable() sets x2apic_mode to 0 after registering the local
   apic. That's wrong, because register_lapic_address() checks x2apic
   mode and therefor takes the wrong code path.

This results in boot failures on machines with x2apic preenabled by
BIOS and can also lead to an fatal MSR access on machines without
apic.

The solutions are simple:
1) Correct the sanity check for apic availability
2) Clear x2apic_mode _before_ calling register_lapic_address()

Fixes: 659006bf3a 'x86/x2apic: Split enable and setup function'
Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
Cc: stable@vger.kernel.org # 4.0+
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
2015-08-22 17:01:48 +02:00
Thomas Gleixner
c948c26048 x86/apic: Drop local_irq_save/restore in timer callbacks
These callbacks are called with interrupts disabled from the core
code. Fixup the local caller to disable interrupts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
2015-07-30 00:51:47 +02:00
Viresh Kumar
b23d8e5278 x86/apic: Migrate apic timer to new set_state interface
Migrate apic driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything while switching to resume mode and so that
callback isn't implemented.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Bandan Das <bsd@redhat.com>
Link: http://lkml.kernel.org/r/1896ac5989d27f2ac37f4786af9bd537e1921b83.1437042675.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30 00:51:47 +02:00
Andy Lutomirski
4ea1636b04 x86/asm/tsc: Rename native_read_tsc() to rdtsc()
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:28 +02:00
Andy Lutomirski
87be28aaf1 x86/asm/tsc: Replace rdtscll() with native_read_tsc()
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:26 +02:00
Bandan Das
4399c03c67 x86/apic: Remove verify_local_APIC()
__verify_local_APIC() is detritus from the early APIC days.
Its return value isn't used anywhere and the information it
prints when debug is enabled is already part of APIC
initialization messages printed to syslog. Off with it!

Signed-off-by: Bandan Das <bsd@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/jpgy4mcsxsq.fsf@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 10:47:57 +02:00
Linus Torvalds
8329aa9fff Revert "x86/apic: Only disable CPU x2apic mode when necessary"
This reverts commit 5fcee53ce7.

It causes the suspend to fail on at least the Chromebook Pixel, possibly
other platforms too.

Joerg Roedel points out that the logic should probably have been

                if (max_physical_apicid > 255 ||
                    !(IS_ENABLED(CONFIG_HYPERVISOR_GUEST) &&
                      hypervisor_x2apic_available())) {

instead, but since the code is not in any fast-path, so we can just live
without that optimization and just revert to the original code.

Acked-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 10:26:18 -08:00