linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 00:29:10 +07:00

Author	SHA1	Message	Date
Michael Kerrisk (man-pages)	09b4d19900	pipe: simplify logic in alloc_pipe_info() Replace an 'if' block that covers most of the code in this function with a 'goto'. This makes the code a little simpler to read, and also simplifies the next patch (fix limit checking in alloc_pipe_info()) Link: http://lkml.kernel.org/r/aef030c1-0257-98a9-4988-186efa48530c@gmail.com Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: <socketpair@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:32 -07:00
Michael Kerrisk (man-pages)	b0b91d18e2	pipe: fix limit checking in pipe_set_size() The limit checking in pipe_set_size() (used by fcntl(F_SETPIPE_SZ)) has the following problems: (1) When increasing the pipe capacity, the checks against the limits in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against existing consumption, and exclude the memory required for the increased pipe capacity. The new increase in pipe capacity can then push the total memory used by the user for pipes (possibly far) over a limit. This can also trigger the problem described next. (2) The limit checks are performed even when the new pipe capacity is less than the existing pipe capacity. This can lead to problems if a user sets a large pipe capacity, and then the limits are lowered, with the result that the user will no longer be able to decrease the pipe capacity. (3) As currently implemented, accounting and checking against the limits is done as follows: (a) Test whether the user has exceeded the limit. (b) Make new pipe buffer allocation. (c) Account new allocation against the limits. This is racey. Multiple processes may pass point (a) simultaneously, and then allocate pipe buffers that are accounted for only in step (c). The race means that the user's pipe buffer allocation could be pushed over the limit (by an arbitrary amount, depending on how unlucky we were in the race). [Thanks to Vegard Nossum for spotting this point, which I had missed.] This patch addresses the above problems as follows: * Perform checks against the limits only when increasing a pipe's capacity; an unprivileged user can always decrease a pipe's capacity. * Alter the checks against limits to include the memory required for the new pipe capacity. * Re-order the accounting step so that it precedes the buffer allocation. If the accounting step determines that a limit has been reached, revert the accounting and cause the operation to fail. The program below can be used to demonstrate problems 1 and 2, and the effect of the fix. The program takes one or more command-line arguments. The first argument specifies the number of pipes that the program should create. The remaining arguments are, alternately, pipe capacities that should be set using fcntl(F_SETPIPE_SZ), and sleep intervals (in seconds) between the fcntl() operations. (The sleep intervals allow the possibility to change the limits between fcntl() operations.) Problem 1 ========= Using the test program on an unpatched kernel, we first set some limits: # echo 0 > /proc/sys/fs/pipe-user-pages-soft # echo 1000000000 > /proc/sys/fs/pipe-max-size # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB Then show that we can set a pipe with capacity (100MB) that is over the hard limit # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 Initial pipe capacity: 65536 Loop 1: set pipe capacity to 100000000 bytes F_SETPIPE_SZ returned 134217728 Now set the capacity to 100MB twice. The second call fails (which is probably surprising to most users, since it seems like a no-op): # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 0 100000000 Initial pipe capacity: 65536 Loop 1: set pipe capacity to 100000000 bytes F_SETPIPE_SZ returned 134217728 Loop 2: set pipe capacity to 100000000 bytes Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted With a patched kernel, setting a capacity over the limit fails at the first attempt: # echo 0 > /proc/sys/fs/pipe-user-pages-soft # echo 1000000000 > /proc/sys/fs/pipe-max-size # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 Initial pipe capacity: 65536 Loop 1: set pipe capacity to 100000000 bytes Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted There is a small chance that the change to fix this problem could break user-space, since there are cases where fcntl(F_SETPIPE_SZ) calls that previously succeeded might fail. However, the chances are small, since (a) the pipe-user-pages-{soft,hard} limits are new (in 4.5), and the default soft/hard limits are high/unlimited. Therefore, it seems warranted to make these limits operate more precisely (and behave more like what users probably expect). Problem 2 ========= Running the test program on an unpatched kernel, we first set some limits: # getconf PAGESIZE 4096 # echo 0 > /proc/sys/fs/pipe-user-pages-soft # echo 1000000000 > /proc/sys/fs/pipe-max-size # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe, first setting a pipe capacity (10MB), sleeping for a few seconds, during which time the hard limit is lowered, and then set pipe capacity to a smaller amount (5MB): # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 & [1] 748 # Initial pipe capacity: 65536 Loop 1: set pipe capacity to 10000000 bytes F_SETPIPE_SZ returned 16777216 Sleeping 15 seconds # echo 1000 > /proc/sys/fs/pipe-user-pages-hard # 4.096 MB # Loop 2: set pipe capacity to 5000000 bytes Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted In this case, the user should be able to lower the limit. With a kernel that has the patch below, the second fcntl() succeeds: # echo 0 > /proc/sys/fs/pipe-user-pages-soft # echo 1000000000 > /proc/sys/fs/pipe-max-size # echo 10000 > /proc/sys/fs/pipe-user-pages-hard # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 & [1] 3215 # Initial pipe capacity: 65536 # Loop 1: set pipe capacity to 10000000 bytes F_SETPIPE_SZ returned 16777216 Sleeping 15 seconds # echo 1000 > /proc/sys/fs/pipe-user-pages-hard # Loop 2: set pipe capacity to 5000000 bytes F_SETPIPE_SZ returned 8388608 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x--- /* test_F_SETPIPE_SZ.c (C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity and interactions with limits defined by /proc/sys/fs/pipe-* files. / #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char argv[]) { int (pfd)[2]; int npipes; int pcap, rcap; int j, p, s, stime, loop; if (argc < 2) { fprintf(stderr, "Usage: %s num-pipes " "[pipe-capacity sleep-time]...\n", argv[0]); exit(EXIT_FAILURE); } npipes = atoi(argv[1]); pfd = calloc(npipes, sizeof (int [2])); if (pfd == NULL) { perror("calloc"); exit(EXIT_FAILURE); } for (j = 0; j < npipes; j++) { if (pipe(pfd[j]) == -1) { fprintf(stderr, "Loop %d: pipe() failed: ", j); perror("pipe"); exit(EXIT_FAILURE); } } printf("Initial pipe capacity: %d\n", fcntl(pfd[0][0], F_GETPIPE_SZ)); for (j = 2; j < argc; j += 2 ) { loop = j / 2; pcap = atoi(argv[j]); printf(" Loop %d: set pipe capacity to %d bytes\n", loop, pcap); for (p = 0; p < npipes; p++) { s = fcntl(pfd[p][0], F_SETPIPE_SZ, pcap); if (s == -1) { fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ " "failed: ", loop, p); perror("fcntl"); exit(EXIT_FAILURE); } if (p == 0) { printf(" F_SETPIPE_SZ returned %d\n", s); rcap = s; } else { if (s != rcap) { fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ " "unexpected return: %d\n", loop, p, s); exit(EXIT_FAILURE); } } stime = (j + 1 < argc) ? atoi(argv[j + 1]) : 0; if (stime > 0) { printf(" Sleeping %d seconds\n", stime); sleep(stime); } } } exit(EXIT_SUCCESS); } 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x--- Patch history: v2 Switch order of test in 'if' statement to avoid function call (to capability()) in normal path. [This is a fix to a preexisting wart in the code. Thanks to Willy Tarreau] * Perform (size > pipe_max_size) check before calling account_pipe_buffers(). [Thanks to Vegard Nossum] Quoting Vegard: The potential problem happens if the user passes a very large number which will overflow pipe->user->pipe_bufs. On 32-bit, sizeof(int) == sizeof(long), so if they pass arg = INT_MAX then round_pipe_size() returns INT_MAX. Although it's true that the accounting is done in terms of pages and not bytes, so you'd need on the order of (1 << 13) = 8192 processes hitting the limit at the same time in order to make it overflow, which seems a bit unlikely. (See https://lkml.org/lkml/2016/8/12/215 for another discussion on the limit checking) Link: http://lkml.kernel.org/r/1e464945-536b-2420-798b-e77b9c7e8593@gmail.com Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: <socketpair@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:32 -07:00
Michael Kerrisk (man-pages)	3734a13b96	pipe: refactor argument for account_pipe_buffers() This is a preparatory patch for following work. account_pipe_buffers() performs accounting in the 'user_struct'. There is no need to pass a pointer to a 'pipe_inode_info' struct (which is then dereferenced to obtain a pointer to the 'user' field). Instead, pass a pointer directly to the 'user_struct'. This change is needed in preparation for a subsequent patch that the fixes the limit checking in alloc_pipe_info() (and the resulting code is a little more logical). Link: http://lkml.kernel.org/r/7277bf8c-a6fc-4a7d-659c-f5b145c981ab@gmail.com Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: <socketpair@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Michael Kerrisk (man-pages)	d37d416664	pipe: move limit checking logic into pipe_set_size() This is a preparatory patch for following work. Move the F_SETPIPE_SZ limit-checking logic from pipe_fcntl() into pipe_set_size(). This simplifies the code a little, and allows for reworking required in a later patch that fixes the limit checking in pipe_set_size() Link: http://lkml.kernel.org/r/3701b2c5-2c52-2c3e-226d-29b9deb29b50@gmail.com Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: <socketpair@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Michael Kerrisk (man-pages)	f491bd7111	pipe: relocate round_pipe_size() above pipe_set_size() Patch series "pipe: fix limit handling", v2. When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various limits defined by /proc/sys/fs/pipe-* files are checked to see if unprivileged users are exceeding limits on memory consumption. While documenting and testing the operation of these limits I noticed that, as currently implemented, these checks have a number of problems: (1) When increasing the pipe capacity, the checks against the limits in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against existing consumption, and exclude the memory required for the increased pipe capacity. The new increase in pipe capacity can then push the total memory used by the user for pipes (possibly far) over a limit. This can also trigger the problem described next. (2) The limit checks are performed even when the new pipe capacity is less than the existing pipe capacity. This can lead to problems if a user sets a large pipe capacity, and then the limits are lowered, with the result that the user will no longer be able to decrease the pipe capacity. (3) As currently implemented, accounting and checking against the limits is done as follows: (a) Test whether the user has exceeded the limit. (b) Make new pipe buffer allocation. (c) Account new allocation against the limits. This is racey. Multiple processes may pass point (a) simultaneously, and then allocate pipe buffers that are accounted for only in step (c). The race means that the user's pipe buffer allocation could be pushed over the limit (by an arbitrary amount, depending on how unlucky we were in the race). [Thanks to Vegard Nossum for spotting this point, which I had missed.] This patch series addresses these three problems. This patch (of 8): This is a minor preparatory patch. After subsequent patches, round_pipe_size() will be called from pipe_set_size(), so place round_pipe_size() above pipe_set_size(). Link: http://lkml.kernel.org/r/91a91fdb-a959-ba7f-b551-b62477cc98a1@gmail.com Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: <socketpair@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	fcc24534b0	autofs: refactor ioctl fn vector in iookup_dev_ioctl() cmd part of this struct is the same as an index of itself within _ioctls[]. In fact this cmd is unused, so we can drop this part. Link: http://lkml.kernel.org/r/20160831033414.9910.66697.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	962ca7cfbd	autofs: remove possibly misleading /* #define DEBUG */ Having this in autofs_i.h gives illusion that uncommenting this enables pr_debug(), but it doesn't enable all the pr_debug() in autofs because inclusion order matters. XFS has the same DEBUG macro in its core header fs/xfs/xfs.h, however XFS seems to have a rule to include this prior to other XFS headers as well as kernel headers. This is not the case with autofs, and DEBUG could be enabled via Makefile, so autofs should just get rid of this comment to make the code less confusing. It's a comment, so there is literally no functional difference. Link: http://lkml.kernel.org/r/20160831033409.9910.77067.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Ian Kent	9b88ee0f3b	autofs4: move linux/auto_dev-ioctl.h to uapi/linux Since linux/auto_dev-ioctl.h wasn't included in include/linux/Kbuild it wasn't moved to uapi/linux as part of the uapi series. Link: http://lkml.kernel.org/r/20160812024901.12352.10984.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	f58b3c91f6	autofs: move inclusion of linux/limits.h to uapi linux/limits.h should be included by uapi instead of linux/auto_fs.h so as not to cause compile error in userspace. # cat << EOF > ./test1.c > #include <stdio.h> > #include <linux/auto_fs.h> > int main(void) { > return 0; > } > EOF # gcc -Wall -g ./test1.c In file included from ./test1.c:2:0: /usr/include/linux/auto_fs.h:54:12: error: 'NAME_MAX' undeclared here (not in a function) char name[NAME_MAX+1]; ^ Link: http://lkml.kernel.org/r/20160812024856.12352.24092.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	390855547c	autofs: fix print format for ioctl warning message All other warnings use "cmd(0x%08x)" and this is the only one with "cmd(%d)". (below comes from my userspace debug program, but not automount daemon) [ 1139.905676] autofs4:pid:1640:check_dev_ioctl_version: ioctl control interface version mismatch: kernel(1.0), user(0.0), cmd(-1072131215) Link: http://lkml.kernel.org/r/20160812024851.12352.75458.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Ian Kent	d9e1923207	autofs: add autofs_dev_ioctl_version() for AUTOFS_DEV_IOCTL_VERSION_CMD No functional changes, based on the following justification. 1. Make the code more consistent using the ioctl vector _ioctls[], rather than assigning NULL only for this ioctl command. 2. Remove goto done; for better maintainability in the long run. 3. The existing code is based on the fact that validate_dev_ioctl() sets ioctl version for any command, but AUTOFS_DEV_IOCTL_VERSION_CMD should explicitly set it regardless of the default behavior. Link: http://lkml.kernel.org/r/20160812024846.12352.9885.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Ian Kent	aa8419367b	autofs: fix dev ioctl number range check The count of miscellaneous device ioctls in fs/autofs4/autofs_i.h is wrong. The number of ioctls is the difference between AUTOFS_DEV_IOCTL_VERSION_CMD and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD (14) not the difference between AUTOFS_IOC_COUNT and 11 (21). [kusumi.tomohiro@gmail.com: fix typo that made the count macro negative] Link: http://lkml.kernel.org/r/20160831033420.9910.16809.stgit@pluto.themaw.net Link: http://lkml.kernel.org/r/20160812024841.12352.11975.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	b6e3795a06	autofs: fix pr_debug() message This isn't a return value, so change the message to indicate the status is the result of may_umount(). (or locate pr_debug() after put_user() with the same message) Link: http://lkml.kernel.org/r/20160812024836.12352.74628.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	bf72eda5f9	autofs: update struct autofs_dev_ioctl in Documentation Sync with changes made by commit `730c9eeca9` ("autofs4: improve parameter usage") which introduced an union for various ioctl commands instead of having statically named arg1,2. This commit simply replaces arg1,2 with the corresponding fields without changing semantics. Link: http://lkml.kernel.org/r/20160812024831.12352.24667.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	d873284103	autofs: fix Documentation regarding devid on ioctl The explanation on how ioctl handles devid seems incorrect. Userspace who calls this ioctl has no input regarding devid, and ioctl implementation retrieves devid via superblock. Link: http://lkml.kernel.org/r/20160812024825.12352.13486.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	72063e01ed	autofs: remove AUTOFS_DEVID_LEN This macro was never used by neither kernel nor userspace, and also doesn't represent "devid length" in bytes. (unless it was added to mean something else). Link: http://lkml.kernel.org/r/20160812024820.12352.21210.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	41a4497a4f	autofs: don't fail to free_dev_ioctl(param) Returning -ENOTTY here fails to free dynamically allocated param. Link: http://lkml.kernel.org/r/20160812024815.12352.69153.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	eea618e6d5	autofs: remove obsolete sb fields These two were left from commit `aa55ddf340` ("autofs4: remove unused ioctls") which removed unused ioctls. Link: http://lkml.kernel.org/r/20160812024810.12352.96377.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	ca552599bf	autofs: use autofs4_free_ino() to kfree dentry data kfree dentry data allocated by autofs4_new_ino() with autofs4_free_ino() instead of raw kfree. (since we have the interface to free autofs_info*) This patch was modified to remove the need to set the dentry info field to NULL dew to a change in the previous patch. Link: http://lkml.kernel.org/r/20160812024805.12352.43650.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Ian Kent	1574fa7beb	autofs: remove ino free in autofs4_dir_symlink() The inode allocation failure case in autofs4_dir_symlink() frees the autofs dentry info of the dentry without setting ->d_fsdata to NULL. That could lead to a double free so just get rid of the free and leave it to ->d_release(). Link: http://lkml.kernel.org/r/20160812024759.12352.10653.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	97537b35b6	autofs: add WARN_ON(1) for non dir/link inode case It's invalid if the given mode is neither dir nor link, so warn on else case. Link: http://lkml.kernel.org/r/20160812024754.12352.8536.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Ian Kent	1973a12269	autofs: fix autofs4_fill_super() error exit handling Somewhere along the line the error handling gotos have become incorrect. Link: http://lkml.kernel.org/r/20160812024749.12352.15100.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	749800ef53	autofs: test autofs versions first on sb initialization This patch does what the below comment says. It could be and it's considered better to do this first before various functions get called during initialization. /* Couldn't this be tested earlier? */ Link: http://lkml.kernel.org/r/20160812024744.12352.43075.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	4a44c1859f	autofs: drop unnecessary extern in autofs_i.h autofs4_kill_sb() doesn't need to be declared as extern, and no other functions in .h are explicitly declared as extern. Link: http://lkml.kernel.org/r/20160812024739.12352.99354.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Tomohiro Kusumi	e662145f5c	autofs: fix typos in Documentation/filesystems/autofs4.txt plus minor whitespace fixes. Link: http://lkml.kernel.org/r/20160812024734.12352.17122.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Christoph Hellwig	bfd45be0b8	kprobes: include <asm/sections.h> instead of <asm-generic/sections.h> asm-generic headers are generic implementations for architecture specific code and should not be included by common code. Thus use the asm/ version of sections.h to get at the linker sections. Link: http://lkml.kernel.org/r/1473602302-6208-1-git-send-email-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Joe Perches	459cf0ae5d	checkpatch: improve the octal permissions tests The function calls with octal permissions commonly span multiple lines. The current test is line oriented and fails to find some matches. Make the test use the $stat variable instead of the $line variable to span multiple lines. Also add a few functions to the known functions with permissions list. Move the SYMBOLIC_PERMS test to a separate section to find all the S_<FOO> permissions in any form not just those that have specific function names. This can now find and fix permissions uses like: .mode = S_<FOO> \| S_<BAR>; Link: http://lkml.kernel.org/r/b51bab60530912aae4ac420119d465c5b206f19f.1475030406.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Ramiro Oliveira <roliveir@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Joe Perches	ca0d8929e7	checkpatch: add warning for unnamed function definition arguments Function definitions without identifiers like int foo(int) are not preferred. Emit a warning when they occur. Link: http://lkml.kernel.org/r/94fe6378504745991b650f48fc92bb4648f25706.1474925354.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Joe Perches	5207649b7b	checkpatch: improve MACRO_ARG_PRECEDENCE test It is possible for a multiple line macro definition to have a false positive report when an argument is used on a line after a continuation \. This line might have a leading '+' as the initial character that could be confused by checkpatch as an operator. Avoid the leading character on multiple line macro definitions. Link: http://lkml.kernel.org/r/60229d13399f9b6509db5a32e30d4c16951a60cd.1473836073.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:31 -07:00
Joe Perches	9192d41a3f	checkpatch: add --strict test for precedence challenged macro arguments Add a test for macro arguents that have a non-comma leading or trailing operator where the argument isn't parenthesized to avoid possible precedence issues. Link: http://lkml.kernel.org/r/47715508972f8d786f435e583ff881dbeee3a114.1473745855.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	f59b64bffe	checkpatch: add --strict test for macro argument reuse If a macro argument is used multiple times in the macro definition, the macro argument may have an unexpected side-effect. Add a test (MACRO_ARG_REUSE) for that condition which is only emitted with command-line option --strict. Link: http://lkml.kernel.org/r/b6d67a87cafcafd15499e91780dc63b15dec0aa0.1473744906.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	af207524a4	checkpatch: improve the block comment * alignment test An "uninitialized value" is emitted when a block comment starts on the same line as a statement. Fix this and make the test use a little fewer cpu cycles too. Link: http://lkml.kernel.org/r/3c9993320c2182d37f53ac540878cfef59c3f62d.1473365956.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Charlemagne Lasse <charlemagnelasse@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	0616efa45a	checkpatch: speed up checking for filenames in sections marked obsolete Adding -f to the get_maintainer.pl invocation means git isn't invoked by get_maintainer.pl for known filenames. This reduces the overall time to run checkpatch. Link: http://lkml.kernel.org/r/22991e3a295aeb399b43af0478b6e5809106ccee.1472684066.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	15c03cfeab	const_structs.checkpatch: add frequently used from Julia Lawall's list Using const is generally a good idea. Julia Lawall has created a list of always const and almost always const structs in the kernel sources. Link: https://lkml.org/lkml/2016/8/28/95 Add the most frequently used (> 50 cases) that are almost always or always const. Link: http://lkml.kernel.org/r/1e16020f8027654db0095bbfbcc11da51025365c.1472664220.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	bf1fa1dae6	checkpatch: externalize the structs that should be const Make it easier to add new structs that should be const. Link: http://lkml.kernel.org/r/e5a8da43e7c11525bafbda1ca69a8323614dd942.1472664220.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	f333195d41	checkpatch: don't test for prefer ether_addr_<foo> < sigh > Comment these tests out. These are just too enticing to people that don't verify that both source and dest addresses really must be __aligned(2). It helps make Dan Carpenter happy too. Link: http://lkml.kernel.org/r/dc32ec66d24647f4cdf824c8dfbbc59aa7ce7b7d.1472665676.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg <gvrose8192@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	08eb9b8016	checkpatch: test multiple line block comment alignment Warn when block comments are not aligned on the * /* * block comment, no warning / / * block comment, emit warning */ Link: http://lkml.kernel.org/r/edb57bd330adfe024b95ec2a807d4aa7f0c8b112.1472261299.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	f90774e1fd	checkpatch: look for symbolic permissions and suggest octal instead S_<FOO> uses should be avoided where octal is more intelligible. Linus didst say: : It's much easier to parse and understand the octal numbers, while the : symbolic macro names are just random line noise and hard as hell to : understand. You really have to think about it. : : So we should rather go the other way: convert existing bad symbolic : permission bit macro use to just use the octal numbers. : : The symbolic names are good for the other bits (ie sticky bit, and the : inode mode _type_ numbers etc), but for the permission bits, the symbolic : names are just insane crap. Nobody sane should ever use them. Not in the : kernel, not in user space. (http://lkml.kernel.org/r/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com) Link: http://lkml.kernel.org/r/7232ef011d05a92f4caa86a5e9830d87966a2eaf.1470180926.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Joe Perches	85b0ee18bb	checkpatch: see if modified files are marked obsolete in MAINTAINERS Use get_maintainer to check the status of individual files. If "obsolete", suggest leaving the files alone. Link: http://lkml.kernel.org/r/7ceaa510dc9d2df05ec4b456baed7bb1415550b3.1471889575.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: SF Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Noam Camus	2d13e6ca42	lib/bitmap.c: enhance bitmap syntax Today there are platforms with many CPUs (up to 4K). Trying to boot only part of the CPUs may result in too long string. For example lets take NPS platform that is part of arch/arc. This platform have SMP system with 256 cores each with 16 HW threads (SMT machine) where HW thread appears as CPU to the kernel. In this example there is total of 4K CPUs. When one tries to boot only part of the HW threads from each core the string representing the map may be long... For example if for sake of performance we decided to boot only first half of HW threads of each core the map will look like: 0-7,16-23,32-39,...,4080-4087 This patch introduce new syntax to accommodate with such use case. I added an optional postfix to a range of CPUs which will choose according to given modulo the desired range of reminders i.e.: <cpus range>:sed_size/group_size For example, above map can be described in new syntax like this: 0-4095:8/16 Note that this patch is backward compatible with current syntax. [akpm@linux-foundation.org: rework documentation] Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com Signed-off-by: Noam Camus <noamca@mellanox.com> Cc: David Decotigny <decot@googlers.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Alexey Dobriyan	8cfd56d479	lib/kstrtox.c: smaller _parse_integer() Set "overflow" bit upon encountering it instead of postponing to the end of the conversion. Somehow gcc unwedges itself and generates better code: $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux _parse_integer 177 139 -38 Inspired by patch from Zhaoxiu Zeng. Link: http://lkml.kernel.org/r/20160826221920.GA1909@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Alexey Dobriyan	1204c77f9b	include/linux/ctype.h: make isdigit() table lookupless Make isdigit into a simple range checking inline function: return '0' <= c && c <= '9'; This code is 1 branch, not 2 because any reasonable compiler can optimize this code into SUB+CMP, so the code while (isdigit((c = *s++))) ... remains 1 branch per iteration HOWEVER it suddenly doesn't do table lookup priming cacheline nobody cares about. Link: http://lkml.kernel.org/r/20160826190047.GA12536@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Mark Rutland	bf90e56e46	lib: harden strncpy_from_user The strncpy_from_user() accessor is effectively a copy_from_user() specialised to copy strings, terminating early at a NUL byte if possible. In other respects it is identical, and can be used to copy an arbitrarily large buffer from userspace into the kernel. Conceptually, it exposes a similar attack surface. As with copy_from_user(), we check the destination range when the kernel is built with KASAN, but unlike copy_from_user() we do not check the destination buffer when using HARDENED_USERCOPY. As strncpy_from_user() calls get_user() in a loop, we must call check_object_size() explicitly. This patch adds this instrumentation to strncpy_from_user(), per the same rationale as with the regular copy_from_user(). In the absence of hardened usercopy this will have no impact as the instrumentation expands to an empty static inline function. Link: http://lkml.kernel.org/r/1472221903-31181-1-git-send-email-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Ross Zwisler	e0176a2f1e	radix-tree tests: properly initialize mutex The pthread_mutex_t in regression1.c wasn't being initialized properly. Link: http://lkml.kernel.org/r/20160815194237.25967-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Ross Zwisler	eec4852543	radix-tree tests: add iteration test There are four cases I can see where we could end up with a NULL 'slot' in radix_tree_next_slot(). This unit test exercises all four of them, making sure that if in the future we have an unsafe path through radix_tree_next_slot(), we'll catch it. Here are details on the four cases: 1) radix_tree_iter_retry() via a non-tagged iteration like radix_tree_for_each_slot(). In this case we currently aren't seeing a bug because radix_tree_iter_retry() sets iter->next_index = iter->index; which means that in in the else case in radix_tree_next_slot(), 'count' is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 2) radix_tree_iter_retry() via tagged iteration like radix_tree_for_each_tagged(). This case was giving us NULL pointer dereferences in testing, and was fixed with this commit: commit `3cb9185c67` ("radix-tree: fix radix_tree_iter_retry() for tagged iterators.") This fix doesn't explicitly check for 'slot' being NULL, though, it works around the NULL pointer dereference by instead zeroing iter->tags in radix_tree_iter_retry(), which makes us bail out of the if() case in radix_tree_next_slot() before we dereference 'slot'. 3) radix_tree_iter_next() via via a non-tagged iteration like radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() and shmem_partial_swap_usage(). As with non-tagged iteration, 'count' in the else case of radix_tree_next_slot() is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 4) radix_tree_iter_next() via tagged iteration like radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). radix_tree_iter_next() zeros out iter->tags, so we end up exiting radix_tree_next_slot() here: if (flags & RADIX_TREE_ITER_TAGGED) { void *canon = slot; iter->tags >>= 1; if (unlikely(!iter->tags)) return NULL; Link: http://lkml.kernel.org/r/20160815194237.25967-3-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Ross Zwisler	915045fe15	radix-tree: 'slot' can be NULL in radix_tree_next_slot() There are four cases I can see where we could end up with a NULL 'slot' in radix_tree_next_slot(). Yet radix_tree_next_slot() never actually checks whether 'slot' is NULL. It just happens that for the cases where 'slot' is NULL, some other combination of factors prevents us from dereferencing it. It would be very easy for someone to unwittingly change one of these factors without realizing that we are implicitly depending on it to save us from a NULL pointer dereference. Add a comment documenting the things that allow 'slot' to be safely passed as NULL to radix_tree_next_slot(). Here are details on the four cases: 1) radix_tree_iter_retry() via a non-tagged iteration like radix_tree_for_each_slot(). In this case we currently aren't seeing a bug because radix_tree_iter_retry() sets iter->next_index = iter->index; which means that in in the else case in radix_tree_next_slot(), 'count' is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 2) radix_tree_iter_retry() via tagged iteration like radix_tree_for_each_tagged(). This case was giving us NULL pointer dereferences in testing, and was fixed with this commit: commit `3cb9185c67` ("radix-tree: fix radix_tree_iter_retry() for tagged iterators.") This fix doesn't explicitly check for 'slot' being NULL, though, it works around the NULL pointer dereference by instead zeroing iter->tags in radix_tree_iter_retry(), which makes us bail out of the if() case in radix_tree_next_slot() before we dereference 'slot'. 3) radix_tree_iter_next() via via a non-tagged iteration like radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() and shmem_partial_swap_usage(). As with non-tagged iteration, 'count' in the else case of radix_tree_next_slot() is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 4) radix_tree_iter_next() via tagged iteration like radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). radix_tree_iter_next() zeros out iter->tags, so we end up exiting radix_tree_next_slot() here: if (flags & RADIX_TREE_ITER_TAGGED) { void *canon = slot; iter->tags >>= 1; if (unlikely(!iter->tags)) return NULL; Link: http://lkml.kernel.org/r/20160815194237.25967-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Vlastimil Babka	2d19309cf8	fs/select: add vmalloc fallback for select(2) The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows with the number of fds passed. We had a customer report page allocation failures of order-4 for this allocation. This is a costly order, so it might easily fail, as the VM expects such allocation to have a lower-order fallback. Such trivial fallback is vmalloc(), as the memory doesn't have to be physically contiguous and the allocation is temporary for the duration of the syscall only. There were some concerns, whether this would have negative impact on the system by exposing vmalloc() to userspace. Although an excessive use of vmalloc can cause some system wide performance issues - TLB flushes etc. - a large order allocation is not for free either and an excessive reclaim/compaction can have a similar effect. Also note that the size is effectively limited by RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the bitmaps will fit well within single page and thus the vmalloc() fallback could be only excercised for processes where root allows a higher limit. Note that the poll(2) syscall seems to use a linked list of order-0 pages, so it doesn't need this kind of fallback. [eric.dumazet@gmail.com: fix failure path logic] [akpm@linux-foundation.org: use proper type for size] Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Darrick J. Wong	25f4c41415	block: implement (some of) fallocate for block devices After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Darrick J. Wong	28b2be203e	block: require write_same and discard requests align to logical block size Make sure that the offset and length arguments that we're using to construct WRITE SAME and DISCARD requests are actually aligned to the logical block size. Failure to do this causes other errors in other parts of the block layer or the SCSI layer because disks don't support partial logical block writes. Link: http://lkml.kernel.org/r/147518379026.22791.4437508871355153928.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00
Darrick J. Wong	22dd6d3566	block: invalidate the page cache when issuing BLKZEROOUT Patch series "fallocate for block devices", v11. This is a patchset to fix page cache coherency with BLKZEROOUT and implement fallocate for block devices. The first patch is a fix to the existing BLKZEROOUT ioctl to invalidate the page cache if the zeroing command to the underlying device succeeds. Without this patch we still have the pagecache coherence bug that's been in the kernel forever. The second patch changes the internal block device functions to reject attempts to discard or zeroout that are not aligned to the logical block size. Previously, we only checked that the start/len parameters were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs to 4k-LBA devices. The third patch creates an fallocate handler for block devices, wires up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent fallocate interface between files and block devices. It also allows the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing discard. Test cases for the new block device fallocate are now in xfstests as generic/349-351. This patch (of 3): Invalidate the page cache (as a regular O_DIRECT write would do) to avoid returning stale cache contents at a later time. Link: http://lkml.kernel.org/r/147518378313.22791.16649519283678515021.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:30 -07:00

... 5 6 7 8 9 ...

633266 Commits