Commit Graph

616809 Commits

Author SHA1 Message Date
Nicolai Stange
6731b0d611 x86/timers/apic: Inform TSC deadline clockevent device about recalibration
This patch eliminates a source of imprecise APIC timer interrupts,
which imprecision may result in double interrupts or even late
interrupts.

The TSC deadline clockevent devices' configuration and registration
happens before the TSC frequency calibration is refined in
tsc_refine_calibration_work().

This results in the TSC clocksource and the TSC deadline clockevent
devices being configured with slightly different frequencies: the former
gets the refined one and the latter are configured with the inaccurate
frequency detected earlier by means of the "Fast TSC calibration using PIT".

Within the APIC code, introduce the notifier function
lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
clockevent devices with the current tsc_khz.

Call it from the TSC code after TSC calibration refinement has happened.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
[ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:38:12 +02:00
Nicolai Stange
1a9e4c564a x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
I noticed the following bug/misbehavior on certain Intel systems: with a
single task running on a NOHZ CPU on an Intel Haswell, I recognized
that I did not only get the one expected local_timer APIC interrupt, but
two per second at minimum. (!)

Further tracing showed that the first one precedes the programmed deadline
by up to ~50us and hence, it did nothing except for reprogramming the TSC
deadline clockevent device to trigger shortly thereafter again.

The reason for this is imprecise calibration, the timeout we program into
the APIC results in 'too short' timer interrupts. The core (hr)timer code
notices this (because it has a precise ktime source and sees the short
interrupt) and fixes it up by programming an additional very short
interrupt period.

This is obviously suboptimal.

The reason for the imprecise calibration is twofold, and this patch
fixes the first reason:

In setup_APIC_timer(), the registered clockevent device's frequency
is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
it with 1000 afterwards:

  (tsc_khz / TSC_DIVISOR) * 1000

The multiplication with 1000 is done for converting from kHz to Hz and the
division by TSC_DIVISOR is carried out in order to make sure that the final
result fits into an u32.

However, with the order given in this calculation, the roundoff error
introduced by the division gets magnified by a factor of 1000 by the
following multiplication.

To fix it, reversing the order of the division and the multiplication a la:

  (tsc_khz * 1000) / TSC_DIVISOR

... reduces the roundoff error already.

Furthermore, if TSC_DIVISOR divides 1000, associativity holds:

  (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)

and thus, the roundoff error even vanishes and the whole operation can be
carried out within 32 bits.

The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
TSC_DIVISOR still allows for TSC frequencies up to
2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect
in the next years.

Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse
the order of the divison and the multiplication in the calculation of
the registered clockevent device's frequency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
[ Improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:37:38 +02:00
Benjamin Herrenschmidt
97f6e0cc35 powerpc/32: Fix crash during static key init
We cannot do those initializations from apply_feature_fixups() as
this function runs in a very restricted environment on 32-bit where
the kernel isn't running at its linked address and the PTRRELOC()
macro must be used for any global accesss.

Instead, split them into a separtate steup_feature_keys() function
which is called in a more suitable spot on ppc32.

Fixes: 309b315b6e ("powerpc: Call jump_label_init() in apply_feature_fixups()")
Reported-and-tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 19:41:58 +10:00
Dave Gordon
774439e12b drm/i915/guc: re-optimise i915_guc_client layout
As we're tweaking the GuC-related code in debugfs, we can
drop the no-longer-used 'q_fail' and repack the structure.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-08-10 10:40:05 +01:00
Dave Gordon
c18468c4b2 drm/i915/guc: use for_each_engine_id() where appropriate
Now that host structures are indexed by host engine-id rather than
guc_id, we can usefully convert some for_each_engine() loops to use
for_each_engine_id() and avoid multiple dereferences of engine->id.

Also a few related tweaks to cache structure members locally wherever
they're used more than once or twice, hopefully eliminating memory
references.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-08-10 10:40:05 +01:00
Dave Gordon
e02757d91f drm/i915/guc: add engine mask to GuC client & pass to GuC
The Context Descriptor passed by the kernel to the GuC contains a field
specifying which engine(s) the context will use. Historically, this was
always set to "all of them", but if we had a separate client for each
engine, we could be more precise, and set only the bit for the engine
that the client was associated with. So this patch enables this usage,
in preparation for having multiple clients, though at this point there
is still only a single client used for all supported engines.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-08-10 10:40:05 +01:00
Dave Gordon
84b7f88235 drm/i915/guc: refactor guc_init_doorbell_hw()
We have essentially the same code in each of two different
loops, so we can refactor it into a little helper function.

This also reduces the amount of work done during startup,
as we now only reprogram h/w found to be in a state other
than that expected, and so avoid the overhead of setting
doorbell registers to the state they're already in.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-08-10 10:40:05 +01:00
Dave Gordon
8888cd0154 drm/i915/guc: doorbell reset should avoid used doorbells
guc_init_doorbell_hw() borrows the (currently single) GuC client to use
in reinitialising ALL the doorbell registers (as the hardware doesn't
reset them when the GuC is reset). As a prerequisite for accommodating
multiple clients, it should only reset doorbells that are supposed to be
disabled, avoiding those that are marked as in use by any client.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-08-10 10:40:05 +01:00
Benjamin Herrenschmidt
f9cc1d1f80 powerpc: Update obsolete comment in setup_32.c about early_init()
We don't identify the machine type anymore...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 19:38:02 +10:00
Chris Wilson
dbd6ef29a7 drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
The bottom-half we use for processing the breadcrumb interrupt is a
task, which is an RCU protected struct. When accessing this struct, we
need to be holding the RCU read lock to prevent it disappearing beneath
us. We can use the RCU annotation to mark our irq_seqno_bh pointer as
being under RCU guard and then use the RCU accessors to both provide
correct ordering of access through the pointer.

Most notably, this fixes the access from hard irq context to use the RCU
read lock, which both Daniel and Tvrtko complained about.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-3-git-send-email-chris@chris-wilson.co.uk
2016-08-10 10:37:49 +01:00
Benjamin Herrenschmidt
7d70c63c71 powerpc: Print the kernel load address at the end of prom_init()
This makes it easier to debug crashes that happen very early before
the kernel takes over Open Firmware by allowing us to relate the OF
reported crashing addresses to offsets within the kernel.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 19:37:43 +10:00
Chris Wilson
83348ba84e drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
In commit 2529d57050 ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanims (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully make both easier to understand by their segregation.

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050 (waiter"drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk
2016-08-10 10:37:35 +01:00
Chris Wilson
70cb472c6d drm/i915: Always mark the writer as also a read for busy ioctl
One of the few guarantees we want the busy ioctl to provide is that the
reported busy writer is included in the set of busy read engines. This
should be provided by the ordering of setting and retiring the active
trackers, but we can do better by explicitly setting the busy read
engine flag for the last writer.

v2: More comments inside __busy_write_id() to explain why both fields
are set.

Fixes: 3fdc13c7a3 ("drm/i915: Remove (struct_mutex) locking for busy-ioctl")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470762505-12799-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-10 10:37:21 +01:00
Cyril Bur
c7a318ba86 powerpc/ptrace: Fix coredump since ptrace TM changes
Commit 8d460f6156 ("powerpc/process: Add the function
flush_tmregs_to_thread") added flush_tmregs_to_thread() and included
the assumption that it would only be called for a task which is not
current.

Although this is correct for ptrace, when generating a core dump, some
of the routines which call flush_tmregs_to_thread() are called. This
leads to a WARNing such as:

  Not expecting ptrace on self: TM regs may be incorrect
  ------------[ cut here ]------------
  WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80
  CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1
  task: c000000fe631b600 task.stack: c000000fe63b0000
  NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780
  REGS: c000000fe63b3420 TRAP: 0700   Not tainted  (4.8.0-rc1-gcc6x-g61e8a0d)
  MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28004222  XER: 20000000
  ...
  NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80
  LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80
  Call Trace:
   flush_tmregs_to_thread+0x74/0x80 (unreliable)
   vsr_get+0x64/0x1a0
   elf_core_dump+0x604/0x1430
   do_coredump+0x5fc/0x1200
   get_signal+0x398/0x740
   do_signal+0x54/0x2b0
   do_notify_resume+0x98/0xb0
   ret_from_except_lite+0x70/0x74

So fix flush_tmregs_to_thread() to detect the case where it is called on
current, and a transaction is active, and in that case flush the TM regs
to the thread_struct.

This patch also moves flush_tmregs_to_thread() into ptrace.c as it is
only called from that file.

Fixes: 8d460f6156 ("powerpc/process: Add the function flush_tmregs_to_thread")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 16:34:20 +10:00
Christophe Leroy
1bc8b816cb powerpc/32: Fix csum_partial_copy_generic()
Commit 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination
address is odd and initial csum is not null

In that (rare) case the initial csum value has to be rotated one byte
as well as the resulting value is

This patch also fixes related comments

Fixes: 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 14:52:45 +10:00
Frederic Barrat
c6d2ee09c2 cxl: Set psl_fir_cntl to production environment value
Switch the setting of psl_fir_cntl from debug to production
environment recommended value. It mostly affects the PSL behavior when
an error is raised in psl_fir1/2.

Tested with cxlflash.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 14:02:45 +10:00
Konstantin Khlebnikov
51350ea0d7 mm, writeback: flush plugged IO in wakeup_flusher_threads()
I've found funny live-lock between raid10 barriers during resync and
memory controller hard limits. Inside mpage_readpages() task holds on to
its plug bio which blocks the barrier in raid10. Its memory cgroup have
no free memory thus the task goes into reclaimer but all reclaimable
pages are dirty and cannot be written because raid10 is rebuilding and
stuck on the barrier.

Common flush of such IO in schedule() never happens, because the caller
doesn't go to sleep.

Lock is 'live' because changing memory limit or killing tasks which
holds that stuck bio unblock whole progress.

That was what happened in 3.18.x but I see no difference in upstream
logic.  Theoretically this might happen even without memory cgroup.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-09 19:58:06 -06:00
Dave Gordon
aac440ff22 drm: avoid "possible bad bitmask?" warning
Recent versions of gcc say this:

include/drm/i915_drm.h:96:34: warning: result of ‘65535 << 20’
requires 37 bits to represent, but ‘int’ only has 32 bits
[-Wshift-overflow=]

Reported-by: David Binderman <linuxdev.baldrick@gmail.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470764110-23855-1-git-send-email-david.s.gordon@intel.com
2016-08-09 22:18:26 +02:00
Ingo Molnar
69766c40c3 perf/urgent fixes:
User visible fixes:
 
 - Fix the lookup for a kernel module in 'perf probe', fixing for instance, the
   erroneous return of "[raid10]" when looking for "[raid1]"  (Konstantin Khlebnikov)
 
 - Disable counters in a group before reading them in 'perf stat', to avoid skew (Mark Rutland)
 
 - Fix adding probes to function aliases in systems using kaslr (Masami Hiramatsu)
 
 - Trip libtraceevent trace_seq buffers, removing unnecessary memory usage that could
   bring a system using tracepoint events with 'perf top' to a crawl, as the trace_seq
   buffers start at a whooping 4 KB, which is very rarely used in perf's usecases,
   so realloc it to the really used space as a last measure after using libtraceevent
   functions to format the fields of tracepoint events (Arnaldo Carvalho de Melo)
 
 - Fix 'perf probe' location when using DWARF on ppc64le (Ravi Bangoria)
 
 Improvement:
 
 - Allow specifying signedness casts to a 'perf probe' variable, to shorten
   the number of steps to see signed values that otherwise would always appear
   as hex values (Naohiro Aota)
 
 Documentation fixes:
 
 - Add 'bpf-output' field to 'perf script' usage message (Brendan Gregg)
 
 Infrastructure fixes:
 
 - Sync kernel header files: cpufeatures.h, {disabled,required}-features.h,
   bpf.h and vmx.h, so that we get a clean build, without warnings about files
   being different from the kernel counterparts.
 
   A verification of the need or desirability of changes in tools/ based on what
   was done in the kernel changesets was made and documented in the respective
   file sync changesets (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXqfpSAAoJENZQFvNTUqpAkJUQALGxbMIZmCOE2O/lok7t6PDZ
 VUsq0vs3V1mPqdin1UnNsfMCvWQExeJZQ4P8EUTSLoMbpiifq2BY5696xs35LUtl
 +UTTtVgtN+/W5gLiQ78U8kEkG8Q/PiMeWKyLLKgBSAEtibC4pOLnQCu/g4DP3e3c
 oYQTaoaSq4eBQMQaDKF2Y6EVQFEdQs4PI1JUIsGn9zTfR5qtRiKwZxrNkAfmNAVO
 opDN42JR3HewwXiOKWuoAVHDi5QVsHgDUnPuYlFujbx306WV+EiypRpzA4Rbr0Cu
 AZrtkqQdSkKYVhEop3Az5kW9m3qZ6DRcZfJNVmD0Cax637gQNbOyIVVxo2KiS5JQ
 8kZknTuQoR8GTARUwlUlQVydwKaRXsox4M1o71FVAOuvEKzBFpIUF46c+Ljgj4O0
 zo1q9I2GnmxjakP0528oLrtT4UyndWTxjK0bwPcr+AwFGVfICT5OteUoTkLSbTAO
 WdqfIhtS1PL+yJmy9iQkaPtTWVtgHmJbQ8PtBVBZZjwvDbx2sDv84/83picwUM+u
 4g0WaqEzKMhRznDHXjc6EkSWRcP20AFqAotRNn8Uxhinx6OvLMs2C6TLzt1lHuw/
 Zsz6wJnM3qaOs9Oxe1U8J04Zm47C8y6iVx5zpJkgFaAf49mJ475me89NratFrAfq
 T1OsSy9dejfB4xrBvOjk
 =/B7T
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-20160809' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

User visible fixes:

- Fix the lookup for a kernel module in 'perf probe', fixing for instance, the
  erroneous return of "[raid10]" when looking for "[raid1]"  (Konstantin Khlebnikov)

- Disable counters in a group before reading them in 'perf stat', to avoid skew (Mark Rutland)

- Fix adding probes to function aliases in systems using kaslr (Masami Hiramatsu)

- Trip libtraceevent trace_seq buffers, removing unnecessary memory usage that could
  bring a system using tracepoint events with 'perf top' to a crawl, as the trace_seq
  buffers start at a whooping 4 KB, which is very rarely used in perf's usecases,
  so realloc it to the really used space as a last measure after using libtraceevent
  functions to format the fields of tracepoint events (Arnaldo Carvalho de Melo)

- Fix 'perf probe' location when using DWARF on ppc64le (Ravi Bangoria)

- Allow specifying signedness casts to a 'perf probe' variable, to shorten
  the number of steps to see signed values that otherwise would always appear
  as hex values (Naohiro Aota)

Documentation fixes:

- Add 'bpf-output' field to 'perf script' usage message (Brendan Gregg)

Infrastructure fixes:

- Sync kernel header files: cpufeatures.h, {disabled,required}-features.h,
  bpf.h and vmx.h, so that we get a clean build, without warnings about files
  being different from the kernel counterparts.

  A verification of the need or desirability of changes in tools/ based on what
  was done in the kernel changesets was made and documented in the respective
  file sync changesets (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-09 21:11:00 +02:00
Linus Torvalds
a0cba2179e Revert "printk: create pr_<level> functions"
This reverts commit 874f9c7da9.

Geert Uytterhoeven reports:
 "This change seems to have an (unintendent?) side-effect.

  Before, pr_*() calls without a trailing newline characters would be
  printed with a newline character appended, both on the console and in
  the output of the dmesg command.

  After this commit, no new line character is appended, and the output
  of the next pr_*() call of the same type may be appended, like in:

    - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
    - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
    + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"

Joe Perches says:
 "No, that is not intentional.

  The newline handling code inside vprintk_emit is a bit involved and
  for now I suggest a revert until this has all the same behavior as
  earlier"

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Requested-by: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09 10:48:18 -07:00
Linus Torvalds
84bd8d33a9 Luiz Capitulino noticed that the tick_stop tracepoint wasn't being parsed
properly by the tracing user space tools. This was due to the
 TRACE_DEFINE_ENUM() being set to a define, when it should have been set
 to the enum itself. The define was of the MASK that used the BIT to shift.
 The BIT was the enum and by adding that, everything gets converted nicely.
 The MASK is still kept just in case it gets converted to an enum in the
 future.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXqeA/AAoJEKKk/i67LK/8wLAH/0nD8L5pxtn+pZi3mQnWNwbn
 qEBtfKK8cvnt0IWH2HlKmRKLAiIJfp8UrkGoPiLT7Nb83PlKaw3UT868t7eDmknu
 b29SsZroMgvJ1MeeNH9Yzk7cK3/K213VO02P4ce8EWSELYCqFlxJDE3dhl52K9Lj
 clJSoZbIGTLlx4pk6zZnPksTt3Z9WZXcVJITwxEiz/Cr+CKAWZpLoPPUaqJOmA4j
 9oS6d++0DYZdz3cWCmYBBkphmc5IQkBNZWGMYLcAR+M+m5fsN+baWlP3Dhq4j7he
 WknHwu4WDFMk6a2Kh0Ggi6yUWVUIkLNY2Z4QUF2gNmJT1g/FH5lZka4/kIxjvKw=
 =rgBA
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix tick_stop tracepoint symbols for user export.

  Luiz Capitulino noticed that the tick_stop tracepoint wasn't being
  parsed properly by the tracing user space tools.

  This was due to the TRACE_DEFINE_ENUM() being set to a define, when it
  should have been set to the enum itself.  The define was of the MASK
  that used the BIT to shift.  The BIT was the enum and by adding that,
  everything gets converted nicely.  The MASK is still kept just in case
  it gets converted to an enum in the future"

* tag 'trace-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix tick_stop tracepoint symbols for user export
2016-08-09 10:34:09 -07:00
Linus Torvalds
b79f34d6ae Several fixes/improvements for the gcc plugin infrastructure:
- Fixes a problem with gcc plugins interfering with cc-option tests.
 - Aborts more gracefully when gcc plugin headers or compiler support is
   missing.
 - Improves the gcc plugin rule generation to be more dynamic, pass arguments,
   and build from subdirectories.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIbBAABCgAGBQJXqSsEAAoJEIly9N/cbcAmf+gP+LkBnio5aJrhQwsw2pJ44xC+
 XTYeypDJswjA2EZsCLJXRFkRr46/SDUXF9sE428o/i47udglOPDyvRtb4/yekBTC
 ao9INrXqmpfkwM2QAZRQH6bTDmxoJF4/u1Z+KgF0e2CX+23ZEUjNVuQwsGcBWGJ8
 ktvF3agyT/Of97scbiKmxmbyvGF4lqCdEUWr2Aq0kYd2XKRzKvDO5JvqB+Sz7uWQ
 ipy4GdcVgEJG6XyjirEyneqcH4Kp0XLf7pYV4V8cdBm1ORBx7igLVgm1mnPwIiAH
 xzlRLD/CHG8bLttvd/6iJ3RvTUYhoBFzfiijH4CTKEd/M7wWqzXhdQEGChvD1OCS
 DJIrZZZfWP+9IfPMdFBuq3eZ7O5wyroQf8n8jbF2Nql3qZfWy8CeNTwINxCivEHs
 PJnXMiug8yNuBBiXagKNlAVUb91ij/VzsQ0Zikh7wQB6odOi680p16xEUrvoNC5V
 H+zST1Ep+BI8O7WiYPH3YyoDZtIJcPGYLT4j0ZGY7U3UrPAgF5Wnu6pcCHDD6Azl
 zisde72+DWcUXTsIdFLvK1lvy4aC2lkgyqBwTvUQ8s4O0IDymywo+WacuCk1JURL
 5dkGsRT8Q31XOvJjp7rf4NZvB8blwHFkwTR4yB5qjiTVCl8x2d+xfq2tDxLw60MT
 d/R0RGahDEBddJwmN04=
 =Jwmy
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugin-infrastructure-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugin improvements from Kees Cook:
 "Several fixes/improvements for the gcc plugin infrastructure:

   - fix a problem with gcc plugins interfering with cc-option tests.

   - abort more gracefully when gcc plugin headers or compiler support
     is missing.

   - improve the gcc plugin rule generation to be more dynamic, pass
     arguments, and build from subdirectories"

* tag 'gcc-plugin-infrastructure-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: Add support for plugin subdirectories
  gcc-plugins: Automate make rule generation
  gcc-plugins: Add support for passing plugin arguments
  gcc-plugins: abort builds cleanly when not supported
  kbuild: no gcc-plugins during cc-option tests
2016-08-09 10:30:07 -07:00
Linus Torvalds
e1d009eab4 platform-drivers-x86 for 4.8-3
dell-wmi:
  - Ignore WMI event 0xe00e
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXqNDLAAoJEKbMaAwKp364NG8H/jmcdVk8QDGLVvTQJ624CsL6
 VWKEkIBj1HnxW7zukwXcPlypjccbHcGKG+Q98pwpKKBw2bHc3dVIpVhc1k2fVZdc
 ny5JNg8gJBwNUFTl+i4yzbOo9UAwSJApScc3l5c2x892/1v6x10IoVdjygD0uMOK
 P9mQ4z2w0D9kpeUpgjFwZrgdePlUKeykg4byGwGu/WyuEMisQE31QaJstDdyc/VE
 EqYpir2dQ4czAKgXOSG12xvTsvqdadyDhoh1ln7tmfMNkRjLAwHYidBdUBnjVcJ2
 tN1ppAc2gz1a/5yKxHEAz+jjqrx4t8CCbLIH83SFOH8GCmUH1XMhdLnzvp1NBbY=
 =UulL
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.8-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver update from Darren Hart:
 "dell-wmi: ignore battery remove/insert event"

* tag 'platform-drivers-x86-v4.8-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  dell-wmi: Ignore WMI event 0xe00e
2016-08-09 10:26:14 -07:00
Linus Torvalds
cb0d93aaf0 Merge tag 'drm-fixes-for-4.8-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "This contains a bunch of amdgpu fixes, and some i915 regression fixes.

  It also contains some fixes for an older regression with some EDID
  changes and some 6bpc panels.

  Then there are the lockdep, cirrus and rcar-du regression fixes from
  this window"

* tag 'drm-fixes-for-4.8-rc2' of git://people.freedesktop.org/~airlied/linux:
  drm/cirrus: Fix NULL pointer dereference when registering the fbdev
  drm/edid: Set 8 bpc color depth for displays with "DFP 1.x compliant TMDS".
  drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown"
  drm/edid: Add 6 bpc quirk for display AEO model 0.
  drm: Paper over locking inversion after registration rework
  drm: rcar-du: Link HDMI encoder with bridge
  drm/ttm: Wait for a BO to become idle before unbinding it from GTT
  drm/i915/fbdev: Check for the framebuffer before use
  drm/amdgpu: update golden setting of polaris10
  drm/amdgpu: update golden setting of stoney
  drm/amdgpu: update golden setting of polaris11
  drm/amdgpu: update golden setting of carrizo
  drm/amdgpu: update golden setting of iceland
  drm/amd/amdgpu: change pptable output format from ASCII to binary
  drm/amdgpu/ci: add mullins to default case for smc ucode
  drm/amdgpu/gmc7: add missing mullins case
  drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
  drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
2016-08-09 10:20:21 -07:00
Brian King
a3d1ddd932 ipr: Fix sync scsi scan
Commit b195d5e2bf ("ipr: Wait to do async scan until scsi host is
initialized") fixed async scan for ipr, but broke sync scan for ipr.

This fixes sync scan back up.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09 10:17:42 -07:00
Vladimir Davydov
c4159a75b6 mm: memcontrol: only mark charged pages with PageKmemcg
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true.  The latter is set iff there are kmem-enabled memory cgroups
(online or offline).  The root cgroup is not considered kmem-enabled.

As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.

  # no memory cgroups has been created yet, create one
  mkdir /sys/fs/cgroup/memory/test
  # run something allocating pages with __GFP_ACCOUNT, e.g.
  # a program using pipe
  dmesg | tail
  # remove the memory cgroup
  rmdir /sys/fs/cgroup/memory/test

we'll get bad page state bug complaining about page->_mapcount != -1:

  BUG: Bad page state in process swapper/0  pfn:1fd945c
  page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
  flags: 0x1000000000000000()

To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.

Fixes: 4949148ad4 ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09 10:14:10 -07:00
Marc Zyngier
7f1d642fbb drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property
Patch 19a469a587 ("drivers/perf: arm-pmu: Handle per-interrupt
affinity mask") added support for partitionned PPI setups, but
inadvertently broke setups using SPIs without the "interrupt-affinity"
property (which is the case for UP platforms).

This patch restore the broken functionnality by testing whether the
interrupt is percpu or not instead of relying on the using_spi flag
that really means "SPI *and* interrupt-affinity property".

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 19a469a587 ("drivers/perf: arm-pmu: Handle per-interrupt affinity mask")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-08-09 17:57:39 +01:00
Sudeep Holla
a026bb12cc drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock
arm_pmu_mutex is never held long and we don't want to sleep while the
lock is being held as it's executed in the context of hotplug notifiers.
So it can be converted to a simple spinlock instead.

Without this patch we get the following warning:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/2
no locks held by swapper/2/0.
irq event stamp: 381314
hardirqs last  enabled at (381313): _raw_spin_unlock_irqrestore+0x7c/0x88
hardirqs last disabled at (381314): cpu_die+0x28/0x48
softirqs last  enabled at (381294): _local_bh_enable+0x28/0x50
softirqs last disabled at (381293): irq_enter+0x58/0x78
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.7.0 #12
Call trace:
 dump_backtrace+0x0/0x220
 show_stack+0x24/0x30
 dump_stack+0xb4/0xf0
 ___might_sleep+0x1d8/0x1f0
 __might_sleep+0x5c/0x98
 mutex_lock_nested+0x54/0x400
 arm_perf_starting_cpu+0x34/0xb0
 cpuhp_invoke_callback+0x88/0x3d8
 notify_cpu_starting+0x78/0x98
 secondary_start_kernel+0x108/0x1a8

This patch converts the mutex to spinlock to eliminate the above
warnings. This constraints pmu->reset to be non-blocking call which is
the case with all the ARM PMU backends.

Cc: Stephen Boyd <sboyd@codeaurora.org>
Fixes: 37b502f121 ("arm/perf: Fix hotplug state machine conversion")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-08-09 17:33:38 +01:00
Lyude
9622c38f1c drm/dp_helper: Rate limit timeout errors from drm_dp_i2c_do_msg()
Timeouts can be errors, but timeouts are also usually normal behavior
and happen a lot. Since the kernel already lets us know when we're
suppressing messages due to rate limiting, rate limit timeout errors so
we don't make too much noise in the kernel log.

Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470443443-27252-8-git-send-email-cpaul@redhat.com
2016-08-09 18:23:44 +02:00
Lyude
29f21e0491 drm/dp_helper: Print first error received on failure in drm_dp_dpcd_access()
Since we always retry in drm_dp_dpcd_access() regardless of the error,
we're going to make a lot of noise if the aux->transfer function prints
it's own errors (as is the case with radeon). If we can print the error
code here, this reduces the need for drivers to do this. So instead of
having to print "dp_aux_ch timed out" over 32 times we can just print
once.

Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470443443-27252-2-git-send-email-cpaul@redhat.com
2016-08-09 18:23:44 +02:00
Lyude
27528c667a drm: Add ratelimited versions of the DRM_DEBUG* macros
There's a couple of places where this would be useful for drivers (such
as reporting DP aux transaction timeouts).

Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470443443-27252-7-git-send-email-cpaul@redhat.com
2016-08-09 18:23:43 +02:00
Chris Wilson
1426f7157e drm/i915: Correct typo for __i915_gem_active_get_rcu in a comment
I mistyped and added an extra _request_ to __i915_gem_active_get_rcu()
Also, the same happened to another comment for i915_gem_active_get_rcu()

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470758602-1338-1-git-send-email-chris@chris-wilson.co.uk
2016-08-09 17:17:56 +01:00
Ilya Dryomov
4eacd4cb3a ceph: initialize pathbase in the !dentry case in encode_caps_cb()
pathbase is the base inode; set it to 0 if we've got no path.

Coverity-id: 146348
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-09 17:26:56 +02:00
Ilya Dryomov
d8734849d8 rbd: nuke the 32-bit pool id check
ceph_file_layout::pool_id is now s64.  rbd_add_get_pool_id() and
ceph_pg_poolid_by_name() both return an int, so it's bogus anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-09 17:26:47 +02:00
Ravi Bangoria
99e608b595 perf probe ppc64le: Fix probe location when using DWARF
Powerpc has Global Entry Point and Local Entry Point for functions.  LEP
catches call from both the GEP and the LEP. Symbol table of ELF contains
GEP and Offset from which we can calculate LEP, but debuginfo does not
have LEP info.

Currently, perf prioritize symbol table over dwarf to probe on LEP for
ppc64le. But when user tries to probe with function parameter, we fall
back to using dwarf(i.e. GEP) and when function called via LEP, probe
will never hit.

For example:

  $ objdump -d vmlinux
    ...
    do_sys_open():
    c0000000002eb4a0:       e8 00 4c 3c     addis   r2,r12,232
    c0000000002eb4a4:       60 00 42 38     addi    r2,r2,96
    c0000000002eb4a8:       a6 02 08 7c     mflr    r0
    c0000000002eb4ac:       d0 ff 41 fb     std     r26,-48(r1)

  $ sudo ./perf probe do_sys_open
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060904

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string

For second case, perf probed on GEP. So when function will be called via
LEP, probe won't hit.

  $ sudo ./perf record -a -e probe:do_sys_open ls
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.195 MB perf.data ]

To resolve this issue, let's not prioritize symbol table, let perf
decide what it wants to use. Perf is already converting GEP to LEP when
it uses symbol table. When perf uses debuginfo, let it find LEP offset
form symbol table. This way we fall back to probe on LEP for all cases.

After patch:

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string

  $ sudo ./perf record -a -e probe:do_sys_open ls
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 12:14:29 -03:00
Ravi Bangoria
d820456dc7 perf probe: Add function to post process kernel trace events
Instead of inline code, introduce function to post process kernel
probe trace events.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 12:09:59 -03:00
Arnaldo Carvalho de Melo
840b49ba55 tools: Sync cpufeatures headers with the kernel
Due to:

  1e61f78baf ("x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated")

No changes to tools using those headers (tools/arch/x86/lib/mem{set,cpu}_64.S)
seems necessary.

Detected by the tools build header drift checker:

  $ make -C tools/perf O=/tmp/build/perf
  make: Entering directory '/home/acme/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
    GEN      /tmp/build/perf/common-cmds.h
  Warning: tools/arch/x86/include/asm/disabled-features.h differs from kernel
  Warning: tools/arch/x86/include/asm/required-features.h differs from kernel
  Warning: tools/arch/x86/include/asm/cpufeatures.h differs from kernel
    CC       /tmp/build/perf/util/probe-finder.o
    CC       /tmp/build/perf/builtin-help.o
  <SNIP>
  ^C$

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ja75m7zk8j0jkzmrv16i5ehw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 11:56:33 -03:00
Arnaldo Carvalho de Melo
791cceb89f toops: Sync tools/include/uapi/linux/bpf.h with the kernel
The way we're using kernel headers in tools/ now, with a copy that is
made to the same path prefixed by "tools/" plus checking if that copy
got stale, i.e. if the kernel counterpart changed, helps in keeping
track with new features that may be useful for tools to exploit.

For instance, looking at all the changes to bpf.h since it was last
copied to tools/include brings this to toolers' attention:

Need to investigate this one to check how to run a program via perf, setting up
a BPF event, that will take advantage of the way perf already calls clang/LLVM,
sets up the event and runs the workload in a single command line, helping in
debugging such semi cooperative programs:

  96ae522795 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")

This one needs further investigation about using the feature it improves
in 'perf trace' to do some tcpdumpin' mixed with syscalls, tracepoints,
probe points, callgraphs, etc:

  555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")

Add tracing just packets that are related to some container to that mix:

  4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
  4ed8ec521e ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")

Definetely needs to have example programs accessing task_struct from a bpf proggie
started from 'perf trace':

  606274c5ab ("bpf: introduce bpf_get_current_task() helper")

Core networking related, XDP:

  6ce96ca348 ("bpf: add XDP_TX xdp_action for direct forwarding")
  6a773a15a1 ("bpf: add XDP prog type for early driver filter")
  13c5c240f7 ("bpf: add bpf_get_hash_recalc helper")
  d2485c4242 ("bpf: add bpf_skb_change_type helper")
  6578171a7f ("bpf: add bpf_skb_change_proto helper")

Changes detected by the tools build system:

  $ make -C tools/perf O=/tmp/build/perf install-bin
  make: Entering directory '/home/acme/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
  Warning: tools/include/uapi/linux/bpf.h differs from kernel
    INSTALL  GTK UI
    CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  <SNIP>
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-difq4ts1xvww6eyfs9e7zlft@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 11:48:07 -03:00
Arnaldo Carvalho de Melo
bebfb73012 tools: Sync cpufeatures.h and vmx.h with the kernel
There were changes related to the deprecation of the "pcommit"
instruction:

  fd1d961dd6 ("x86/insn: remove pcommit")
  dfa169bbee ("Revert "KVM: x86: add pcommit support"")

No need to update anything in the tools, as "pcommit" wasn't being
listed on the VMX_EXIT_REASONS in the tools/perf/arch/x86/util/kvm-stat.c
file.

Just grab fresh copies of these files to silence the file cache
coherency detector:

  $ make -C tools/perf O=/tmp/build/perf install-bin
  make: Entering directory '/home/acme/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
  Warning: tools/arch/x86/include/asm/cpufeatures.h differs from kernel
  Warning: tools/arch/x86/include/uapi/asm/vmx.h differs from kernel
    INSTALL  GTK UI
  <SNIP>
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-07pmcc1ysydhyyxbmp1vt0l4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 11:21:57 -03:00
Naohiro Aota
19f00b0117 perf probe: Support signedness casting
The 'perf probe' tool detects a variable's type and use the detected
type to add a new probe. Then, kprobes prints its variable in
hexadecimal format if the variable is unsigned and prints in decimal if
it is signed.

We sometimes want to see unsigned variable in decimal format (i.e.
sector_t or size_t). In that case, we need to investigate the variable's
size manually to specify just signedness.

This patch add signedness casting support. By specifying "s" or "u" as a
type, perf-probe will investigate variable size as usual and use the
specified signedness.

E.g. without this:

  $ perf probe -a 'submit_bio bio->bi_iter.bi_sector'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector)
  You can now use it in all perf tools, such as:
          perf record -e probe:submit_bio -aR sleep 1
  $ cat trace_pipe|head
          dbench-9692  [003] d..1   971.096633: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d00
          dbench-9692  [003] d..1   971.096685: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x1a3d80
          dbench-9692  [003] d..1   971.096687: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d80
...
  // need to investigate the variable size
  $ perf probe -a 'submit_bio bio->bi_iter.bi_sector:s64'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s64)
  You can now use it in all perf tools, such as:
        perf record -e probe:submit_bio -aR sleep 1

  With this:

  // just use "s" to cast its signedness
  $ perf probe -v -a 'submit_bio bio->bi_iter.bi_sector:s'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)
  You can now use it in all perf tools, such as:
          perf record -e probe:submit_bio -aR sleep 1
  $ cat trace_pipe|head
          dbench-9689  [001] d..1  1212.391237: submit_bio: (submit_bio+0x0/0x140) bi_sector=128
          dbench-9689  [001] d..1  1212.391252: submit_bio: (submit_bio+0x0/0x140) bi_sector=131072
          dbench-9697  [006] d..1  1212.398611: submit_bio: (submit_bio+0x0/0x140) bi_sector=30208

  This commit also update perf-probe.txt to describe "types". Most parts
  are based on existing documentation: Documentation/trace/kprobetrace.txt

Committer note:

Testing using 'perf trace':

  # perf probe -a 'submit_bio bio->bi_iter.bi_sector'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector)

  You can now use it in all perf tools, such as:

	perf record -e probe:submit_bio -aR sleep 1

  # trace --no-syscalls --ev probe:submit_bio
      0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=0xc133c0)
   3181.861 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffb8)
   3181.881 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc0)
   3184.488 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc8)
<SNIP>
   4717.927 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7a88)
   4717.970 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7880)
  ^C[root@jouet ~]#

Now, using this new feature:

[root@jouet ~]# perf probe -a 'submit_bio bio->bi_iter.bi_sector:s'
Added new event:
  probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)

You can now use it in all perf tools, such as:

	perf record -e probe:submit_bio -aR sleep 1

  [root@jouet ~]# trace --no-syscalls --ev probe:submit_bio
     0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145704)
     0.017 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145712)
     0.019 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145720)
     2.567 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145728)
  5631.919 probe:submit_bio:(ffffffffac3aee00) bi_sector=0)
  5631.941 probe:submit_bio:(ffffffffac3aee00) bi_sector=8)
  5631.945 probe:submit_bio:(ffffffffac3aee00) bi_sector=16)
  5631.948 probe:submit_bio:(ffffffffac3aee00) bi_sector=24)
  ^C#

With callchains:

  # trace --no-syscalls --ev probe:submit_bio/max-stack=10/
     0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662544)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
     0.023 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662552)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
     0.027 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662560)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
     2.593 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662568)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       journal_submit_commit_record+0xa82001ac ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa82012e8 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
  ^C#

Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470710408-23515-1-git-send-email-naohiro.aota@hgst.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:52:22 -03:00
Steven Rostedt (Red Hat)
c87edb3611 tracing: Fix tick_stop tracepoint symbols for user export
The symbols used in the tick_stop tracepoint were not being converted
properly into integers in the trace_stop format file. Instead we had this:

print fmt: "success=%d dependency=%s", REC->success,
    __print_symbolic(REC->dependency, { 0, "NONE" },
     { (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" },
     { (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" },
     { (1 << TICK_DEP_BIT_SCHED), "SCHED" },
     { (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" })

User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other
symbols used to do the bit shifting. The reason is that the conversion was
done with using the TICK_DEP_MASK_* symbols which are just macros that
convert to the BIT shift itself (with the exception of NONE, which was
converted properly, because it doesn't use bits, and is defined as zero).

The TICK_DEP_BIT_* needs to be denoted by TRACE_DEFINE_ENUM() in order to
have this properly converted for user space tools to parse this event.

Cc: stable@vger.kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: e6e6cc22e0 ("nohz: Use enum code for tick stop failure tracing message")
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-09 09:51:23 -04:00
Mark Rutland
3df33eff2b perf stat: Avoid skew when reading events
When we don't have a tracee (i.e. we're attaching to a task or CPU),
counters can still be running after our workload finishes, and can still
be running as we read their values. As we read events one-by-one, there
can be arbitrary skew between values of events, even within a group.
This means that ratios within an event group are not reliable.

This skew can be seen if measuring a group of identical events, e.g:

  # perf stat -a -C0 -e '{cycles,cycles}' sleep 1

To avoid this, we must stop groups from counting before we read the
values of any constituent events. This patch adds and makes use of a new
disable_counters() helper, which disables group leaders (and thus each
group as a whole). This mirrors the use of enable_counters() for
starting event groups in the absence of a tracee.

Closing a group leader splits the group, and without a disabled group
leader the newly split events will begin counting. Thus to ensure counts
are reliable we must defer closing group leaders until all counts have
been read. To do so this patch removes the event closing logic from the
read_counters() helper, explicitly closes the events using
perf_evlist__close(), which also aids legibility.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1470747869-3567-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:48:32 -03:00
Konstantin Khlebnikov
cb3f3378cd perf probe: Fix module name matching
If module is "module" then dso->short_name is "[module]".  Substring
comparing is't enough: "raid10" matches to "[raid1]".  This patch also
checks terminating zero in module name.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: http://lkml.kernel.org/r/147039975648.715620.12985971832789032159.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:48:09 -03:00
Masami Hiramatsu
8e34189b34 perf probe: Adjust map->reloc offset when finding kernel symbol from map
Adjust map->reloc offset for the unmapped address when finding
alternative symbol address from map, because KASLR can relocate the
kernel symbol address.

The same adjustment has been done when finding appropriate kernel symbol
address from map which was introduced by commit f90acac757 ("perf
probe: Find given address from offline dwarf")

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu@linaro.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160806192948.e366f3fbc4b194de600f8326@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:47:43 -03:00
Arnaldo Carvalho de Melo
887fa86d6f perf hists: Trim libtraceevent trace_seq buffers
When we use libtraceevent to format trace event fields into printable
strings to use in hist entries it is important to trim it from the
default 4 KiB it starts with to what is really used, to reduce the
memory footprint, so use realloc(seq.buffer, seq.len + 1) when returning
the seq.buffer formatted with the fields contents.

Reported-and-Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-t3hl7uxmilrkigzmc90rlhk2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:46:56 -03:00
Brendan Gregg
bcdc09af3e perf script: Add 'bpf-output' field to usage message
This adds the 'bpf-output' field to the perf script usage message, and docs.

Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470192469-11910-4-git-send-email-bgregg@netflix.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:46:43 -03:00
James Hogan
97b1d23f7b metag: Drop show_mem() from mem_init()
The recent commit 599d0c954f ("mm, vmscan: move LRU lists to node"),
changed memory management code so that show_mem() is no longer safe to
call prior to setup_per_cpu_pageset(), as pgdat->per_cpu_nodestats will
still be NULL. This causes an oops on metag due to the call to
show_mem() from mem_init():

  node_page_state_snapshot(...) + 0x48
  pgdat_reclaimable(struct pglist_data * pgdat = 0x402517a0)
  show_free_areas(unsigned int filter = 0) + 0x2cc
  show_mem(unsigned int filter = 0) + 0x18
  mem_init()
  mm_init()
  start_kernel() + 0x204

This wasn't a problem before with zone_reclaimable() as zone_pcp_init()
was already setting zone->pageset to &boot_pageset, via setup_arch() and
paging_init(), which happens before mm_init():

  zone_pcp_init(...)
  free_area_init_core(...) + 0x138
  free_area_init_node(int nid = 0, ...) + 0x1a0
  free_area_init_nodes(...) + 0x440
  paging_init(unsigned long mem_end = 0x4fe00000) + 0x378
  setup_arch(char ** cmdline_p = 0x4024e038) + 0x2b8
  start_kernel() + 0x54

No other arches appear to call show_mem() during boot, and it doesn't
really add much value to the log, so lets just drop it from mem_init().

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-metag@vger.kernel.org
2016-08-09 13:41:30 +01:00
Cornelia Huck
3b2fbb3f06 virtio/s390: deprecate old transport
There only ever have been two host implementations of the old
s390-virtio (pre-ccw) transport: the experimental kuli userspace,
and qemu. As qemu switched its default to ccw with 2.4 (with most
users having used ccw well before that) and removed the old transport
entirely in 2.6, s390-virtio probably hasn't been in active use for
quite some time and is therefore likely to bitrot.

Let's start the slow march towards removing the code by deprecating
it.

Note that this also deprecates the early virtio console code, which
has been causing trouble in the guest without being wired up in any
relevant hypervisor code.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09 13:42:41 +03:00
Christian Borntraeger
2ab0d56aad virtio/s390: keep early_put_chars
In case the registration of the hvc tty never happens AND the kernel
thinks that hvc0 is the preferred console we should keep the early
printk function to avoid a kernel panic due to code being removed.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09 13:42:40 +03:00
Minfei Huang
347a529398 virtio_blk: Fix a slient kernel panic
We do a lot of memory allocation in function init_vq, and don't handle
the allocation failure properly. Then this function will return 0,
although initialization fails due to lacking memory. At that moment,
kernel will panic in guest machine, if virtio is used to drive disk.

To fix this bug, we should take care of allocation failure, and return
correct value to let caller know what happen.

Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09 13:42:39 +03:00