In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.
An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.
This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.
There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.
This test detected the problem and pointed out where the issue was.
This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.
Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On x86_32, when optimize for size is set, gcc may align the frame pointer
and make a copy of the the return address inside the stack frame.
The return address that is located in the stack frame may not be
the one used to return to the calling function. This will break the
function graph tracer.
The function graph tracer replaces the return address with a jump to a hook
function that can trace the exit of the function. If it only replaces
a copy, then the hook will not be called when the function returns.
Worse yet, when the parent function returns, the function graph tracer
will return back to the location of the child function which will
easily crash the kernel with weird results.
To see the problem, when i386 is compiled with -Os we get:
c106be03: 57 push %edi
c106be04: 8d 7c 24 08 lea 0x8(%esp),%edi
c106be08: 83 e4 e0 and $0xffffffe0,%esp
c106be0b: ff 77 fc pushl 0xfffffffc(%edi)
c106be0e: 55 push %ebp
c106be0f: 89 e5 mov %esp,%ebp
c106be11: 57 push %edi
c106be12: 56 push %esi
c106be13: 53 push %ebx
c106be14: 81 ec 8c 00 00 00 sub $0x8c,%esp
c106be1a: e8 f5 57 fb ff call c1021614 <mcount>
When it is compiled with -O2 instead we get:
c10896f0: 55 push %ebp
c10896f1: 89 e5 mov %esp,%ebp
c10896f3: 83 ec 28 sub $0x28,%esp
c10896f6: 89 5d f4 mov %ebx,0xfffffff4(%ebp)
c10896f9: 89 75 f8 mov %esi,0xfffffff8(%ebp)
c10896fc: 89 7d fc mov %edi,0xfffffffc(%ebp)
c10896ff: e8 d0 08 fa ff call c1029fd4 <mcount>
The compile with -Os will align the stack pointer then set up the
frame pointer (%ebp), and it copies the return address back into
the stack frame. The change to the return address in mcount is done
to the copy and not the real place holder of the return address.
Then compile with -O2 sets up the frame pointer first, this makes
the change to the return address by mcount affect where the function
will jump on exit.
Reported-by: Jake Edge <jake@lwn.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the output of the ring buffer benchmark/test prints to
the console. This test runs for ten seconds every ten seconds and
ouputs the result after every iteration. This needlessly fills up
the logs.
This patch makes the ring buffer benchmark/test print to the ftrace
buffer using trace_printk. To view the test results, you must examine
the debug/tracing/trace file.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it
will need to read from the ring buffer. But the read side of the
ring buffer still takes locks. This patch adds a check on the read
side that if it is in an NMI, then it will disable the ring buffer
and not take any locks.
Reads can still happen on a disabled ring buffer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The checking of whether the buffer is empty or not needs to be serialized
among the readers. Add the reader spin lock around it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ring buffer must have at least two pages allocated for the
reader page swap to work.
The page count check will miss the case of a zero size passed in.
Even though a zero size ring buffer would probably fail an allocation,
making the min size check for less than two instead of equal to one makes
the code a bit more robust.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The original version of the ring buffer had a hack to map the
page struct that held the pages of the buffer to also be the structure
that the ring buffer would keep the pages in a link list.
This overlap of the page struct was very dangerous and that hack was
removed a while ago.
But there was a check to make sure the buffer_page never became bigger
than the page struct, and would fail the compile if it did. The
check was only meaningful when we had the hack. Now that we have separate
allocated descriptors for the buffer pages, we can remove this check.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
A check if "write > BUF_PAGE_SIZE" is done right after a
if (write > BUF_PAGE_SIZE)
return ...;
Thus the check is actually testing the compiler and not the
kernel. This is useless, remove it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The index of the event is found by masking PAGE_MASK to it and
subtracting the header size. Currently the header size is calculate
by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro
BUF_PAGE_HDR_SIZE to define it.
If we want to change BUF_PAGE_SIZE to something less than filling
the rest of the page (this is done for debugging), then we break
the algorithm to find the index.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Module unload is protected by event_mutex, while setting filter is
protected by filter_mutex. This leads to the race:
echo 'bar == 0 || bar == 10' \ |
> sample/filter |
| insmod sample.ko
add_pred("bar == 0") |
-> n_preds == 1 |
add_pred("bar == 100") |
-> n_preds == 2 |
| rmmod sample.ko
| insmod sample.ko
add_pred("&&") |
-> n_preds == 1 (should be 3) |
Now event->filter->preds is corrupted. An then when filter_match_preds()
is called, the WARN_ON() in it will be triggered.
To avoid the race, we remove filter_mutex, and replace it with event_mutex.
[ Impact: prevent corruption of filters by module removing and loading ]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ring buffer is made up of three sets of pointers.
The head page pointer, which points to the next page for the reader to
get.
The commit pointer and commit index, which points to the page and index
of the last committed write respectively.
The tail pointer and tail index, which points to the page and the index
of the last reserved data respectively (non committed).
The commit pointer is only moved forward by the outer most writer.
If a nested writer comes in, it will not move the pointer forward.
The current implementation has a flaw. It assumes that the outer most
writer successfully reserved data. There's a small race window where
the outer most writer could find the tail pointer, but a nested
writer could come in (via interrupt) and move the tail forward, and
even the commit forward.
The outer writer would not realized the commit moved forward and the
accounting will break.
This patch changes the design to use counters in the per cpu buffers
to keep track of commits. The counters are incremented at the start
of the commit, and decremented at the end. If the end commit counter
is 1, then it moves the commit pointers. A loop is made to check for
races between checking and moving the commit pointers. Only the outer
commit should move the pointers anyway.
The test of knowing if a reserve is equal to the last commit update
is still needed to know for time keeping. The time code is much less
racey than the commit updates.
This change not only solves the mentioned race, but also makes the
code simpler.
[ Impact: fix commit race and simplify code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Fix the compiler error:
kernel/trace/ring_buffer.c: In function 'rb_move_tail':
kernel/trace/ring_buffer.c:1236: warning: unused variable 'event'
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With the addition of commit:
c7b0930857
ring-buffer: prevent adding write in discarded area
The ring buffer may now add discarded events when a write passes
the end of a buffer page. Before, a discarded event was only added
when the tracer deliberately created one. The ring buffer benchmark
test does not handle discarded events when it reads the buffer and
fails when it encounters one.
Also fix the increment for large data entries (luckily, the test did
not add any yet).
[ Impact: fix false failure of ring buffer self test ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This a very tight race where an interrupt could come in and not
have enough data to put into the end of a buffer page, and that
it would fail to write and need to go to the next page.
But if this happened when another writer was about to reserver
their data, and that writer has smaller data to reserve, then
it could succeed even though the interrupt moved the tail page.
To pervent that, if we fail to store data, and by subtracting the
amount we reserved we still have room for smaller data, we need
to fill that space with "discarded" data.
[ Impact: prevent race were buffer data may be lost ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
I forgot to update filter code accordingly in
"tracing/events: change the type of __str_loc_item to unsigned short"
(commt b0aae68cc5)
It can cause system crash:
# echo 1 > tracing/events/irq/irq_handler_entry/enable
# echo 'name == eth0' > tracing/events/irq/irq_handler_entry/filter
[ Impact: fix crash while filtering on __string() field ]
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B905.3090500@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Atomic allocation is not needed here.
[ Impact: clean up of memory alloction type ]
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B898.2050607@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
It's tracing_cpumask_new that should be kfree()ed.
This causes tracing_cpumask to be freed due to the typo:
# echo z > tracing_cpumask
bash: echo: write error: Invalid argument
And subsequent reads/writes to tracing_cpuamsk will access this
already-freed tracing_cpumask, thus may lead to crash.
[ Impact: fix leak and crash when writing invalid val to tracing_cpumask ]
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
when compiling linux-mips with kmemtrace enabled, there will be an
error:
include/linux/trace_seq.h:12: error: 'PAGE_SIZE' undeclared here (not in
a function)
I checked the source code and found trace_seq.h used PAGE_SIZE but not
included the relative header file, so, fix it via adding the header file
<asm/page.h>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Wu Zhangjin <wuzj@lemote.com>
LKML-Reference: <1244962350-28702-1-git-send-email-wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <200906122115.30787.rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits)
.gitignore: ignore *.lzma files
kbuild: add generic --set-str option to scripts/config
kbuild: simplify argument loop in scripts/config
kbuild: handle non-existing options in scripts/config
kallsyms: generalize text region handling
kallsyms: support kernel symbols in Blackfin on-chip memory
documentation: make version fix
kbuild: fix a compile warning
gitignore: Add GNU GLOBAL files to top .gitignore
kbuild: fix delay in setlocalversion on readonly source
README: fix misleading pointer to the defconf directory
vmlinux.lds.h update
kernel-doc: cleanup perl script
Improve vmlinux.lds.h support for arch specific linker scripts
kbuild: fix headers_exports with boolean expression
kbuild/headers_check: refine extern check
kbuild: fix "Argument list too long" error for "make headers_check",
ignore *.patch files
Remove bashisms from scripts
menu: fix embedded menu presentation
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mlx4: Add strong ordering to local inval and fast reg work requests
IB/ehca: Remove superfluous bitmasks from QP control block
RDMA/cxgb3: Limit fast register size based on T3 limitations
RDMA/cxgb3: Report correct port state and MTU
mlx4_core: Add module parameter for number of MTTs per segment
IB/mthca: Add module parameter for number of MTTs per segment
RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
infiniband: Remove void casts
IB/ehca: Increment version number
IB/ehca: Remove unnecessary memory operations for userspace queue pairs
IB/ehca: Fall back to vmalloc() for big allocations
IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
In addition to KT_DEAD which has limited support for diacriticals,
there is KT_DEAD2 that can support 256 criticals, so let's advertise
it in <linux/keyboard.h>.
This lets userland know abut the drivers/char/keyboard.c function
k_dead2, which supports more than the few trivial ones that k_dead
supports.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an option does not exist in .config, set it at the end of the file.
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (25 commits)
atmel-mci: add MCI2 register definitions
atmel-mci: Integrate AT91 specific definition in header file
tmio_mmc: allow compilation for ASIC3
mmc_block: do not DMA to stack
sdhci: Print ADMA status and pointer on debug
tmio_mmc: fix clock setup
tmio_mmc: map SD control registers after enabling the MFD cell
tmio_mmc: correct probe return value for num_resources != 3
tmio_mmc: don't use set_irq_type
tmio_mmc: add bus_shift support
MFD,mmc: tmio_mmc: make HCLK configurable
mmc_spi: don't use EINVAL for possible transmission errors
cb710: more cleanup for the DEBUG case.
sdhci: platform driver for SDHCI
mxcmmc: remove frequency workaround
cb710: handle DEBUG define in Makefile
cb710: add missing parenthesis
cb710: fix printk format string
mmc: Driver for CB710/720 memory card reader (MMC part)
pxamci: add regulator support.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: fix inverted wheel for bluetooth version of apple mighty mouse
HID: no more reinitializtion is needed in post_reset
HID: hidraw -- fix comment about accepted devices
HID: Multitouch support for the N-Trig touchscreen
HID: add new multitouch and digitizer contants
HID: autocentering support for Logitech Force 3D Pro
HID: fix hid-ff drivers so that devices work even without ff support
HID: force feedback support for SmartJoy PLUS PS2/USB adapter
HID: Wacom Graphire Bluetooth driver
HID: autocentering support for Logitech G25 Racing Wheel
The Blackfin arch has a discontiguous .text layout due to having on-chip
instruction memory and no virtual memory support. As such, we need to
add explicit checks for these additional .text regions.
Signed-off-by: Robin Getz <robin.getz@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (417 commits)
MAINTAINERS: EB110ATX is not ebsa110
MAINTAINERS: update Eric Miao's email address and status
fb: add support of LCD display controller on pxa168/910 (base layer)
[ARM] 5552/1: ep93xx get_uart_rate(): use EP93XX_SYSCON_PWRCNT and EP93XX_SYSCON_PWRCN
[ARM] pxa/sharpsl_pm: zaurus needs generic pxa suspend/resume routines
[ARM] 5544/1: Trust PrimeCell resource sizes
[ARM] pxa/sharpsl_pm: cleanup of gpio-related code.
[ARM] pxa/sharpsl_pm: drop set_irq_type calls
[ARM] pxa/sharpsl_pm: merge pxa-specific code into generic one
[ARM] pxa/sharpsl_pm: merge the two sharpsl_pm.c since it's now pxa specific
[ARM] sa1100: remove unused collie_pm.c
[ARM] pxa: fix the conflicting non-static declarations of global_gpios[]
[ARM] 5550/1: Add default configure file for w90p910 platform
[ARM] 5549/1: Add clock api for w90p910 platform.
[ARM] 5548/1: Add gpio api for w90p910 platform
[ARM] 5551/1: Add multi-function pin api for w90p910 platform.
[ARM] Make ARM_VIC_NR depend on ARM_VIC
[ARM] 5546/1: ARM PL022 SSP/SPI driver v3
ARM: OMAP4: SMP: Update defconfig for OMAP4430
ARM: OMAP4: SMP: Enable SMP support for OMAP4430
...
The Makefiles in the build directories use the internal make variable
MAKEFILE_LIST which is available from make 3.80 only. (The patch would be
valid back to 2.6.25)
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
gcc-4.4.1:
HOSTCC scripts/basic/fixdep
scripts/basic/fixdep.c: In function 'traps':
scripts/basic/fixdep.c:377: warning: dereferencing type-punned pointer will break strict-aliasing rules
scripts/basic/fixdep.c:379: warning: dereferencing type-punned pointer will break strict-aliasing rules
(Apparently -fno-strict-aliasing will fix this too)
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice. Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time. This is the same bug that was reported in ib_mthca by Yinghai
Lu <yhlu.kernel@gmail.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ignore GPATH, GRTAGS, GSYMS, and GTAGS generated by GNU GLOBAL.
Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Do not update index on read only media.
Idea published by Christian Kujau <lists@nerdbynature.de>.
Cc: Nico Schottelius <nico@ikn.schottelius.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Updated after review by Tim Abbott.
- Use HEAD_TEXT_SECTION
- Drop use of section-names.h and delete file
- Introduce EXIT_CALL
Deleting section-names.h required a few simple
updates of init.h
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@ksplice.com>
Without this, the default implementation is a no op which is completely
wrong with a VIVT cache, and usage of sg_copy_buffer() produces
unpredictable results.
Tested-by: Sebastian Andrzej Siewior <bigeasy@breakpoint.cc>
CC: stable@kernel.org
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>