We have remap_file_pages(2) emulation in -mm tree for few release cycles
and we plan to have it mainline in v3.20. This patchset removes rest of
VM_NONLINEAR infrastructure.
Patches 1-8 take care about generic code. They are pretty
straight-forward and can be applied without other of patches.
Rest patches removes pte_file()-related stuff from architecture-specific
code. It usually frees up one bit in non-present pte. I've tried to reuse
that bit for swap offset, where I was able to figure out how to do that.
For obvious reason I cannot test all that arch-specific code and would
like to see acks from maintainers.
In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial
kernel code. That's too much for functionality nobody uses.
Tested-by: Felipe Balbi <balbi@ti.com>
This patch (of 38):
We don't create non-linear mappings anymore. Let's drop code which
handles them on unmap/zap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.
Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.
Let's drop it and get rid of all these special-cased code.
The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.
I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on. I've checked Debian code search and source of all
packages in ALT Linux. No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.
There are few basic tests in LTP for the syscall. They work just fine
with emulation.
To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.
The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.
Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x
I believe we can live with that.
Test case:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define MB (1024UL * 1024)
#define SIZE (4096 * MB)
int main(int argc, char **argv)
{
unsigned long *p;
long i, pass;
for (pass = 0; pass < 10; pass++) {
p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
for (i = 0; i < SIZE / 4096; i++) {
if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
perror("remap_file_pages");
return -1;
}
}
for (i = SIZE / 4096 - 1; i >= 0; i--)
assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
munmap(p, SIZE);
}
return 0;
}
[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_COMPACTION=y, CONFIG_DEBUG_FS=n:
mm/vmstat.c:690: warning: 'frag_start' defined but not used
mm/vmstat.c:702: warning: 'frag_next' defined but not used
mm/vmstat.c:710: warning: 'frag_stop' defined but not used
mm/vmstat.c:715: warning: 'walk_zones_in_node' defined but not used
It's all a bit of a tangly mess and it's unclear why CONFIG_COMPACTION
figures in there at all. Move frag_start/frag_next/frag_stop and
migratetype_names[] into the existing CONFIG_PROC_FS block.
walk_zones_in_node() gets a special ifdef.
Also move the #include lines up to where #include lines live.
[axel.lin@ingics.com: fix build error when !CONFIG_PROC_FS]
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here, free memory is allocated using kmem_cache_zalloc. So, use
kmem_cache_free instead of kfree.
This is done using Coccinelle and semantic patch used
is as follows:
@@
expression x,E,c;
@@
x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
... when != x = E
when != &x
?-kfree(x)
+kmem_cache_free(c,x)
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
compound_head() is implemented with assumption that there would be race
condition when checking tail flag. This assumption is only true when we
try to access arbitrary positioned struct page.
The situation that virt_to_head_page() is called is different case. We
call virt_to_head_page() only in the range of allocated pages, so there
is no race condition on tail flag. In this case, we don't need to
handle race condition and we can reduce overhead slightly. This patch
implements compound_head_fast() which is similar with compound_head()
except tail flag race handling. And then, virt_to_head_page() uses this
optimized function to improve performance.
I saw 1.8% win in a fast-path loop over kmem_cache_alloc/free, (14.063
ns -> 13.810 ns) if target object is on tail page.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We had to insert a preempt enable/disable in the fastpath a while ago in
order to guarantee that tid and kmem_cache_cpu are retrieved on the same
cpu. It is the problem only for CONFIG_PREEMPT in which scheduler can
move the process to other cpu during retrieving data.
Now, I reach the solution to remove preempt enable/disable in the
fastpath. If tid is matched with kmem_cache_cpu's tid after tid and
kmem_cache_cpu are retrieved by separate this_cpu operation, it means
that they are retrieved on the same cpu. If not matched, we just have
to retry it.
With this guarantee, preemption enable/disable isn't need at all even if
CONFIG_PREEMPT, so this patch removes it.
I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free in
CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
Below is the result of Christoph's slab_test reported by Jesper Dangaard
Brouer.
* Before
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 49 cycles kfree -> 62 cycles
10000 times kmalloc(16) -> 48 cycles kfree -> 64 cycles
10000 times kmalloc(32) -> 53 cycles kfree -> 70 cycles
10000 times kmalloc(64) -> 64 cycles kfree -> 77 cycles
10000 times kmalloc(128) -> 74 cycles kfree -> 84 cycles
10000 times kmalloc(256) -> 84 cycles kfree -> 114 cycles
10000 times kmalloc(512) -> 83 cycles kfree -> 116 cycles
10000 times kmalloc(1024) -> 81 cycles kfree -> 120 cycles
10000 times kmalloc(2048) -> 104 cycles kfree -> 136 cycles
10000 times kmalloc(4096) -> 142 cycles kfree -> 165 cycles
10000 times kmalloc(8192) -> 238 cycles kfree -> 226 cycles
10000 times kmalloc(16384) -> 403 cycles kfree -> 264 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 68 cycles
10000 times kmalloc(16)/kfree -> 68 cycles
10000 times kmalloc(32)/kfree -> 69 cycles
10000 times kmalloc(64)/kfree -> 68 cycles
10000 times kmalloc(128)/kfree -> 68 cycles
10000 times kmalloc(256)/kfree -> 68 cycles
10000 times kmalloc(512)/kfree -> 74 cycles
10000 times kmalloc(1024)/kfree -> 75 cycles
10000 times kmalloc(2048)/kfree -> 74 cycles
10000 times kmalloc(4096)/kfree -> 74 cycles
10000 times kmalloc(8192)/kfree -> 75 cycles
10000 times kmalloc(16384)/kfree -> 510 cycles
* After
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 46 cycles kfree -> 61 cycles
10000 times kmalloc(16) -> 46 cycles kfree -> 63 cycles
10000 times kmalloc(32) -> 49 cycles kfree -> 69 cycles
10000 times kmalloc(64) -> 57 cycles kfree -> 76 cycles
10000 times kmalloc(128) -> 66 cycles kfree -> 83 cycles
10000 times kmalloc(256) -> 84 cycles kfree -> 110 cycles
10000 times kmalloc(512) -> 77 cycles kfree -> 114 cycles
10000 times kmalloc(1024) -> 80 cycles kfree -> 116 cycles
10000 times kmalloc(2048) -> 102 cycles kfree -> 131 cycles
10000 times kmalloc(4096) -> 135 cycles kfree -> 163 cycles
10000 times kmalloc(8192) -> 238 cycles kfree -> 218 cycles
10000 times kmalloc(16384) -> 399 cycles kfree -> 262 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 65 cycles
10000 times kmalloc(16)/kfree -> 66 cycles
10000 times kmalloc(32)/kfree -> 65 cycles
10000 times kmalloc(64)/kfree -> 66 cycles
10000 times kmalloc(128)/kfree -> 66 cycles
10000 times kmalloc(256)/kfree -> 71 cycles
10000 times kmalloc(512)/kfree -> 72 cycles
10000 times kmalloc(1024)/kfree -> 71 cycles
10000 times kmalloc(2048)/kfree -> 71 cycles
10000 times kmalloc(4096)/kfree -> 71 cycles
10000 times kmalloc(8192)/kfree -> 65 cycles
10000 times kmalloc(16384)/kfree -> 511 cycles
Most of the results are better than before.
Note that this change slightly worses performance in !CONFIG_PREEMPT,
roughly 0.3%. Implementing each case separately would help performance,
but, since it's so marginal, I didn't do that. This would help
maintanance since we have same code for all cases.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__generic_block_fiemap may spin very long time for large sparse files.
Without this patch an unprivileged user may abuse system resources simply
by spawning a vast number of unkilable busyloops (works on ext2/ext3):
truncate --size 1T test
for ((i=0;i<1024;i++))
do
filefrag test > /dev/null &
done
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A tiny race between BAST and unlock message causes the NULL dereference.
A node sends an unlock request to master and receives a response. Before
processing the response it receives a BAST from the master. Since both
requests are processed by different threads it creates a race. While the
BAST is being processed, lock can get freed by unlock code.
This patch makes bast to return immediately if lock is found but unlock is
pending. The code should handle this race. We also have to fix master
node to skip sending BAST after receiving unlock message.
Below is the crash stack
BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: o2dlm_blocking_ast_wrapper+0xd/0x16
dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
worker_thread+0x14d/0x1ed
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ocfs2_dentry_convert_worker, we should prune the dcache before deleting
the dentry of directory, otherwise, in the following cases the inode of
directory will still remain in orphan directory until the device being
umounted.
Mount point: /mnt/ocfs2
Node A Node B
mkdir /mnt/ocfs2/testdir
ocfs2_mkdir
->ocfs2_mknod
->ocfs2_dentry_attach_lock
->ocfs2_dentry_lock(dentry, 0)
... ...
touch /mnt/ocfs2/testdir/testfile
unlink /mnt/test/testdir/testfile
rmdir /mnt/ocfs2/testdir
ocfs2_unlink
->ocfs2_remote_dentry_delete
->ocfs2_dentry_lock(dentry, 1)
... ...
... ...
ocfs2_downconvert_thread
->ocfs2_unblock_lock
->ocfs2_dentry_convert_worker
->ocfs2_find_local_alias
->dget_dlock
->d_delete
Here the dentry can not be
released because the children's
dentry is negative but still exist.
Finally, this inode will still remain
in orphan directory until its children
are destroyed.
So before deleting dentry of directory, we should prune the dcache to
remove unused children of the parent dentry by shrink_dcache_parent().
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
resv_lock is only used in reservations.c
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mlog_errno() is called twice when some functions are failed.
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Smatch complains that, if o2net_tx_can_proceed() returns false, then "sc"
and "ret" are uninialized or maybe we are re-using the data from previous
iteration. I do not know if we can hit this bug in real life but checking
the return value is harmless and we may as well silence the static checker
warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The assigned value is never used.
Coverity-id 1226847.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a mount option to support JBD2 feature:
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is opened, journal
commit block can be written to disk without waiting for descriptor blocks,
which can improve journal commit performance. This option will enable
'journal_checksum' internally.
Using the fs_mark benchmark, using journal_async_commit shows a 50%
improvement, the files per second go up from 215.2 to 317.5.
test script:
fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000
default:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 215.2 17878
with journal_async_commit option:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 317.5 17881
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Weiwei Wang <wangww631@huawei.comm>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to ocfs2_write_end_nolock() which is metioned at commit
136f49b917 ("ocfs2: fix journal commit deadlock"), we should unlock
pages before ocfs2_commit_trans() in ocfs2_convert_inline_data_to_extents.
Otherwise, it will cause a deadlock with journal commit threads.
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove dlm_joined() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove ol_dqblk_file_block() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove ocfs2_xattr_bucket_get_val() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use snprintf format specifier "%lu" instead of "%ld" for argument of type
'unsigned long'.
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
O2NET_CONN_IDLE_DELAY is not defined, connection attempts will not be
canceled due to timeout.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Variable "why" is not yet initialized at line 615, fix it.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
else is unnecessary after return.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the recovery master is down, the owner of $RECOVERY calls
dlm_do_local_recovery_cleanup() to prune any $RECOVERY entries for dead
nodes. The lock is in the granted list and the refcount must be 2. We
should put twice to remove this lock. Otherwise, it will lead to a memory
leak.
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: yangwenfang <vicky.yangwenfang@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Defining these macros way down in arch/sh/.../irq.c doesn't cause
kernel/irq/generic-chip.c to use them. As far as I can tell this code has
no effect.
Fixes: 332fd7c4fe ("genirq: Generic chip: Change irq_reg_{readl,writel} arguments")
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> (cpp/asm comparison)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What sh4 actually wants is HAVE_PATA_PLATFORM, so select that instead.
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e9fd702a58 ("audit: convert audit watches to use fsnotify
instead of inotify") broke handling of renames in audit. Audit code
wants to update inode number of an inode corresponding to watched name
in a directory. When something gets renamed into a directory to a
watched name, inotify previously passed moved inode to audit code
however new fsnotify code passes directory inode where the change
happened. That confuses audit and it starts watching parent directory
instead of a file in a directory.
This can be observed for example by doing:
cd /tmp
touch foo bar
auditctl -w /tmp/foo
touch foo
mv bar foo
touch foo
In audit log we see events like:
type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
...
type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
...
and that's it - we see event for the first touch after creating the
audit rule, we see events for rename but we don't see any event for the
last touch. However we start seeing events for unrelated stuff
happening in /tmp.
Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
FS_MOVED_TO events instead of the directory where the change happens.
This doesn't introduce any new problems because noone besides
audit_watch.c cares about the passed value:
fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
kernel/audit_tree.c doesn't care about passed 'data' at all.
kernel/audit_watch.c expects moved inode as 'data'.
Fixes: e9fd702a58 ("audit: convert audit watches to use fsnotify instead of inotify")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The inotify interface has changed a lot. The user interface was too
old, and the kernel interface was removed by Eric Paris in commit:
2dfc1ca inotify: remove inotify in kernel interface.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Wang Kai <morgan.wang@huawei.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Robert Love <robert.w.love@intel.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently FAN_ONDIR is always set on a mark's ignored mask when the
event mask is extended without FAN_MARK_ONDIR being set. This may
result in events for directories being ignored unexpectedly for call
sequences like
fanotify_mark(fd, FAN_MARK_ADD, FAN_OPEN | FAN_ONDIR , AT_FDCWD, "dir");
fanotify_mark(fd, FAN_MARK_ADD, FAN_CLOSE, AT_FDCWD, "dir");
Also FAN_MARK_ONDIR is only honored when adding events to a mark's mask,
but not for event removal. Fix both issues by not setting FAN_ONDIR
implicitly on the ignore mask any more. Instead treat FAN_ONDIR as any
other event flag and require FAN_MARK_ONDIR to be set by the user for
both event mask and ignore mask. Furthermore take FAN_MARK_ONDIR into
account when set for event removal.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If removing bits from a mark's ignored mask, the concerning
inodes/vfsmounts mask is not affected. So don't recalculate it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In fanotify_mark_remove_from_mask() a mark is destroyed if only one of
both bitmasks (mask or ignored_mask) of a mark is cleared. However the
other mask may still be set and contain information that should not be
lost. So only destroy a mark if both masks are cleared.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit 944d9fec8d ("hugetlb: add support for gigantic page
allocation at runtime") we can allocate 1G pages at runtime if CMA is
enabled.
Let's register 1G pages into hugetlb even if the user hasn't requested
them explicitly at boot time with hugepagesz=1G.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Reworked handling for foreign (grant mapped) pages to simplify the
code, enable a number of additional use cases and fix a number of
long-standing bugs.
- Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
stable).
- Assorted other cleanup and minor bug fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJU2JC+AAoJEFxbo/MsZsTRIvAH/1lgQ0EQlxaZtEFWY8cJBzxY
dXaTMfyGQOddGYDCW0r42hhXJHeX7DWXSERSD3aW9DZOn/eYdneHq9gWRD4uPrGn
hEFQ26J4jZWR5riGXaja0LqI2gJKLZ6BhHIQciLEbY+jw4ynkNBLNRPFehuwrCsZ
WdBwJkyvXC3RErekncRl/aNhxdi4p1P6qeiaW/mo3UcSO/CFSKybOLwT65iePazg
XuY9UiTn2+qcRkm/tjx8K9heHK8SBEGNWuoTcWYF1to8mwwUfKIAc4NO2UBDXJI+
rp7Z2lVFdII15JsQ08ATh3t7xDrMWLzCX/y4jCzmF3DBXLbSWdHCQMgI7TWt5pE=
=PyJK
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen features and fixes from David Vrabel:
- Reworked handling for foreign (grant mapped) pages to simplify the
code, enable a number of additional use cases and fix a number of
long-standing bugs.
- Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
stable).
- Assorted other cleanup and minor bug fixes.
* tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits)
xen/manage: Fix USB interaction issues when resuming
xenbus: Add proper handling of XS_ERROR from Xenbus for transactions.
xen/gntdev: provide find_special_page VMA operation
xen/gntdev: mark userspace PTEs as special on x86 PV guests
xen-blkback: safely unmap grants in case they are still in use
xen/gntdev: safely unmap grants in case they are still in use
xen/gntdev: convert priv->lock to a mutex
xen/grant-table: add a mechanism to safely unmap pages that are in use
xen-netback: use foreign page information from the pages themselves
xen: mark grant mapped pages as foreign
xen/grant-table: add helpers for allocating pages
x86/xen: require ballooned pages for grant maps
xen: remove scratch frames for ballooned pages and m2p override
xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
mm: add 'foreign' alias for the 'pinned' page flag
mm: provide a find_special_page vma operation
x86/xen: cleanup arch/x86/xen/mmu.c
x86/xen: add some __init annotations in arch/x86/xen/mmu.c
x86/xen: add some __init and static annotations in arch/x86/xen/setup.c
x86/xen: use correct types for addresses in arch/x86/xen/setup.c
...
- Remove various compilation errors
- Various code cleanup patches
- Add missing MB versions/architectures for autodetection
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEABECAAYFAlTaI74ACgkQykllyylKDCF6yACdHSOJVnxwov4QLHfd6HVKEsoc
Cc4An1BC7uz51bFJA4lzsafogTWRHieW
=32mk
-----END PGP SIGNATURE-----
Merge tag 'microblaze-3.20-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze pupdates from Michal Simek:
- Remove various compilation errors
- Various code cleanup patches
- Add missing MB versions/architectures for autodetection
* tag 'microblaze-3.20-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Remove *.dtb files in make clean
microblaze: whitespace fix
microblaze/uaccess: fix sparse errors
microblaze: intc: Reformat output
microblaze: intc: Refactor DT sanity check
microblaze: intc: Don't override error codes
microblaze: Add target architecture
microblaze: Add missing PVR version codes
microblaze: Fix variable types to remove W=1 warning
microblaze: Use unsigned type for limit comparison in cache.c
microblaze: Use unsigned type for proper comparison in cpuinfo*.c
microblaze: Use unsigned type for "for" loop because of comparison-kgdb.c
microblaze: Change extern inline to static inline
microblaze: Mark get_frame_size as static
microblaze: Use unsigned return type in do_syscall_trace_enter
microblaze: Declare microblaze_kgdb_break in header
microblaze: Remove unused prom header from reset.c
microblaze: Remove unused prom_parse.c
microblaze: Wire-up execveat syscall
microblaze: Use empty asm-generic/linkage.h
xs_tcp_close() is now just a call to xs_tcp_shutdown(), so remove it,
and replace the entry in xs_tcp_ops.
Suggested-by: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If a task uses vector registers, a save area is allocated to save/restore
register states. Free the save area when releasing the task.
Found the Memory leak with kmemleak:
unreferenced object 0x72885e00 (size 512):
comm "vx-test", pid 26123, jiffies 4294945635 (age 256.810s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 01 db 71 06 41 .............q.A
00 00 00 00 00 00 00 00 24 f7 a9 a7 51 94 79 bb ........$...Q.y.
backtrace:
[<00000000002d1c8a>] kmem_cache_alloc_trace+0x272/0x3d0
[<00000000001014ac>] alloc_vector_registers+0x54/0x138
[<00000000001017c8>] data_exception+0x158/0x1b0
[<00000000008b551e>] pgm_check_handler+0x13e/0x180
[<00000000800008b6>] 0x800008b6
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the binary hypfs interfaces provides new data only once within
an interval time of one second. This patch removes this restriction and
now new data is returned immediately on every read on a hypfs binary file.
This is done in order to allow more consistent snapshots for programs
that read multiple hypfs binary files.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With this feature, you can read the CPU performance metrics provided by the
z/VM diagnose 0C. This then allows to get the management time for each
online CPU of the guest where the diagnose is executed.
The new debugfs file /sys/kernel/debug/s390_hypfs/diag_0c exports the
diag0C binary data to user space via an open/read/close interface.
The binary data consists out of a header structure followed by an
array that contains the diagnose 0c data for each online CPU.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
dts/Makefile is called only for simpleImage target
which is causing that *.dtb are not removed.
This patch fix it.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
* pm-tools:
tools/power turbostat: relax dependency on APERF_MSR
tools/power turbostat: relax dependency on invariant TSC
tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
tools/power turbostat: relax dependency on root permission
cpupower Makefile change to help run the tool without 'make install'
* pm-cpufreq: (46 commits)
intel_pstate: provide option to only use intel_pstate with HWP
cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation
cpufreq: Create for_each_governor()
cpufreq: Create for_each_policy()
cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}()
cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
intel_pstate: honor user space min_perf_pct override on resume
intel_pstate: respect cpufreq policy request
intel_pstate: Add num_pstates to sysfs
intel_pstate: expose turbo range to sysfs
intel_pstate: Add support for SkyLake
cpufreq: stats: drop unnecessary locking
cpufreq: stats: don't update stats on false notifiers
cpufreq: stats: don't update stats from show_trans_table()
cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update()
cpufreq: stats: create sysfs group once we are ready
cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications
cpufreq: stats: drop 'cpu' field of struct cpufreq_stats
cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy
cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats'
...
* pm-domains:
PM: Convert dev_pm_put_subsys_data() into a void function
PM: Update function header for dev_pm_get_subsys_data()
PM / Domains: Handle errors from genpd's ->attach_dev() callback
PM / Domains: Re-order initialization of generic_pm_domain_data
PM / Domains: Free pm_subsys_data in error path in __pm_genpd_add_device()
PM / Domains: Eliminate the mutex for the generic_pm_domain_data
PM / Domains: Don't check for an existing device when adding a new
PM / Domains: Don't allow an existing generic_pm_domain_data
PM / Domains: Remove reference counting for the generic_pm_domain_data
PM / Domains: Rename __pm_genpd_alloc|free_dev_data()
PM / Domains: Remove pm_genpd_dev_need_restore() API
* acpi-resources: (23 commits)
Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug
ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug
x86/PCI: Refine the way to release PCI IRQ resources
x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation
x86/PCI: Fix the range check for IO resources
PCI: Use common resource list management code instead of private implementation
resources: Move struct resource_list_entry from ACPI into resource core
ACPI: Introduce helper function acpi_dev_filter_resource_type()
ACPI: Add field offset to struct resource_list_entry
ACPI: Translate resource into master side address for bridge window resources
ACPI: Return translation offset when parsing ACPI address space resources
ACPI: Enforce stricter checks for address space descriptors
ACPI: Set flag IORESOURCE_UNSET for unassigned resources
ACPI: Normalize return value of resource parser functions
ACPI: Fix a bug in parsing ACPI Memory24 resource
ACPI: Add prefetch decoding to the address space parser
ACPI: Move the window flag logic to the combined parser
ACPI: Unify the parsing of address_space and ext_address_space
ACPI: Let the parser return false for disabled resources
...