Commit b76437579d ("procfs: mark thread stack correctly in
proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
Finding the task of a stack VMA requires walking the entire thread list,
turning this into quadratic behavior: a thousand threads means a
thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
million combinations.
The cost is not in proportion to the usefulness as described in the
patch.
Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
/proc/<pid>/numa_maps) usable again for higher thread counts.
The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
identifying the stack VMA there is an O(1) operation.
Siddesh said:
"The end users needed a way to identify thread stacks programmatically and
there wasn't a way to do that. I'm afraid I no longer remember (or have
access to the resources that would aid my memory since I changed
employers) the details of their requirement. However, I did do this on my
own time because I thought it was an interesting project for me and nobody
really gave any feedback then as to its utility, so as far as I am
concerned you could roll back the main thread maps information since the
information is available in the thread-specific files"
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fixes and various document tweaks.
One patch reaches out of the documentation subtree to fix a comment in
init/do_mounts_rd.c. There didn't seem to be anybody more appropriate to
take that one, so I accepted it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWmmJ8AAoJEI3ONVYwIuV6uqwP/0mnqdxVWo47ohaYJP7q0Soh
ovJAbfttxKnkmOdGbWcNIJtTiw+MpdF805CYR+2treE0zvEEDodg7BhkDnmKZJ9n
F1r53JrIj769E1c5ETmWTHcBt3jjtKyQIbBmDr4YTgX91dlKF28o1bMmyDECWIcT
PktTlPUidDtffKMn3klh6baPCMrTpLJ8aLshBzUrQhrQY8lxcZKAU+98vtFzYofG
LXCSulMYXumb7XBxErTLQZhmJslD4gaDMh2xkov6ALS8XNHnfoUIFRbArAllNfTf
LQGJ6Q5qnn58UWi9F/vgDqx7+d1KIPUjBxJR9wfa0w9ggQhA9ly2BSN/fllbiSbp
yIi1JS4hwBe8H/h577BNC3xjmgVN7mazZsXlS+fg3G16gpv4JdWeRY4efjosFIzQ
EIJxB8qAovUNqw4s1mzRIJ5B9L7PEK27O6z8N27Fiw4EigtMTFAOC2/GD3ELx4iJ
p1doiSr+wjfDcFd8kdIUiDKGrTSTXwNy3hUfrhzQyaEjDTJnx3+1+ono1orSazPO
Fr2RSsC5VzX4IYSuxTMvFSKjN1Iiu8xqwq3IdclHXrBhRvwOF2wpjjQ5Guf0lHBJ
FLBahSjZqt01kmwFykxoHps+VeSwpoEen6rClBQolfmtYVDTvgRNN46AxK9jZ8T4
jZmCNNs/mYzrqo/RTnmw
=u38W
-----END PGP SIGNATURE-----
Merge tag 'docs-4.5' of git://git.lwn.net/linux
Pull documentation updates from Jon Corbet:
"A relatively boring cycle in the docs tree. There's a few kernel-doc
fixes and various document tweaks.
One patch reaches out of the documentation subtree to fix a comment in
init/do_mounts_rd.c. There didn't seem to be anybody more appropriate
to take that one, so I accepted it"
* tag 'docs-4.5' of git://git.lwn.net/linux: (29 commits)
thermal: add description for integral_cutoff unit
Documentation: update libhugetlbfs site url
Documentation: Explain pci=conf1,conf2 more verbosely
DMA-API: fix confusing sentence in Documentation/DMA-API.txt
Documentation: translations: update linux cross reference link
Documentation: fix typo in CodingStyle
init, Documentation: Remove ramdisk_blocksize mentions
Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main()
Documentation: HOWTO: update versions from 3.x to 4.x
Documentation: remove outdated references from translations
Doc: treewide: Fix grammar "a" to "an"
Documentation: cpu-hotplug: Fix sysfs mount instructions
can-doc: Add hint about getting timestamps
Fix CFQ I/O scheduler parameter name in documentation
Documentation: arm: remove dead links from Marvell Berlin docs
Documentation: HOWTO: update code cross reference link
Doc: Docbook/iio: Fix typo in iio.tmpl
DocBook: make index.html generation less verbose by default
DocBook: Cleanup: remove an unused $(call) line
DocBook: Add a help message for DOCBOOKS env var
...
Merge first patch-bomb from Andrew Morton:
- A few hotfixes which missed 4.4 becasue I was asleep. cc'ed to
-stable
- A few misc fixes
- OCFS2 updates
- Part of MM. Including pretty large changes to page-flags handling
and to thp management which have been buffered up for 2-3 cycles now.
I have a lot of MM material this time.
[ It turns out the THP part wasn't quite ready, so that got dropped from
this series - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
zsmalloc: reorganize struct size_class to pack 4 bytes hole
mm/zbud.c: use list_last_entry() instead of list_tail_entry()
zram/zcomp: do not zero out zcomp private pages
zram: pass gfp from zcomp frontend to backend
zram: try vmalloc() after kmalloc()
zram/zcomp: use GFP_NOIO to allocate streams
mm: add tracepoint for scanning pages
drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
mm/page_isolation: use macro to judge the alignment
mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
mm: rework virtual memory accounting
include/linux/memblock.h: fix ordering of 'flags' argument in comments
mm: move lru_to_page to mm_inline.h
Documentation/filesystems: describe the shared memory usage/accounting
memory-hotplug: don't BUG() in register_memory_resource()
hugetlb: make mm and fs code explicitly non-modular
mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
mm: make sure isolate_lru_page() is never called for tail page
vmstat: make vmstat_updater deferrable again and shut down on idle
...
Pull vfs fix from Al Viro:
"Don't put symlink bodies in pagecache into highmem"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Make sure that highmem pages are not added to symlink page cache
The Shared Memory accounting support is present in Kernel since commit
4b02108ac1 ("mm: oom analysis: add shmem vmstat") and in userland
free(1) since 2014. This patch updates the Documentation to reflect
this change.
Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several shortcomings with the accounting of shared memory
(SysV shm, shared anonymous mapping, mapping of a tmpfs file). The
values in /proc/<pid>/status and <...>/statm don't allow to distinguish
between shmem memory and a shared mapping to a regular file, even though
theirs implication on memory usage are quite different: during reclaim,
file mapping can be dropped or written back on disk, while shmem needs a
place in swap.
Also, to distinguish the memory occupied by anonymous and file mappings,
one has to read the /proc/pid/statm file, which has a field for the file
mappings (again, including shmem) and total memory occupied by these
mappings (i.e. equivalent to VmRSS in the <...>/status file. Getting
the value for anonymous mappings only is thus not exactly user-friendly
(the statm file is intended to be rather efficiently machine-readable).
To address both of these shortcomings, this patch adds a breakdown of
VmRSS in /proc/<pid>/status via new fields RssAnon, RssFile and
RssShmem, making use of the previous preparatory patch. These fields
tell the user the memory occupied by private anonymous pages, mapped
regular files and shmem, respectively. Other existing fields in /status
and /statm files are left without change. The /statm file can be
extended in the future, if there's a need for that.
Example (part of) /proc/pid/status output including the new Rss* fields:
VmPeak: 2001008 kB
VmSize: 2001004 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 5108 kB
VmRSS: 5108 kB
RssAnon: 92 kB
RssFile: 1324 kB
RssShmem: 3692 kB
VmData: 192 kB
VmStk: 136 kB
VmExe: 4 kB
VmLib: 1784 kB
VmPTE: 3928 kB
VmPMD: 20 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
[vbabka@suse.cz: forward-porting, tweak changelog]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, /proc/pid/smaps will always show "Swap: 0 kB" for
shmem-backed mappings, even if the mapped portion does contain pages
that were swapped out. This is because unlike private anonymous
mappings, shmem does not change pte to swap entry, but pte_none when
swapping the page out. In the smaps page walk, such page thus looks
like it was never faulted in.
This patch changes smaps_pte_entry() to determine the swap status for
such pte_none entries for shmem mappings, similarly to how
mincore_page() does it. Swapped out shmem pages are thus accounted for.
For private mappings of tmpfs files that COWed some of the pages, swaped
out status of the original shmem pages is naturally ignored. If some of
the private copies was also swapped out, they are accounted via their
page table swap entries, so the resulting reported swap usage is then a
sum of both swapped out private copies, and swapped out shmem pages that
were not COWed. No double accounting can thus happen.
The accounting is arguably still not as precise as for private anonymous
mappings, since now we will count also pages that the process in
question never accessed, but another process populated them and then let
them become swapped out. I believe it is still less confusing and
subtle than not showing any swap usage by shmem mappings at all.
Swapped out counter might of interest of users who would like to prevent
from future swapins during performance critical operation and pre-fault
them at their convenience. Especially for larger swapped out regions
the cost of swapin is much higher than a fresh page allocation. So a
differentiation between pte_none vs. swapped out is important for those
usecases.
One downside of this patch is that it makes /proc/pid/smaps more
expensive for shmem mappings, as we consult the radix tree for each
pte_none entry, so the overal complexity is O(n*log(n)). I have
measured this on a process that creates a 2GB mapping and dirties single
pages with a stride of 2MB, and time how long does it take to cat
/proc/pid/smaps of this process 100 times.
Private anonymous mapping:
real 0m0.949s
user 0m0.116s
sys 0m0.348s
Mapping of a /dev/shm/file:
real 0m3.831s
user 0m0.180s
sys 0m3.212s
The difference is rather substantial, so the next patch will reduce the
cost for shared or read-only mappings.
In a less controlled experiment, I've gathered pids of processes on my
desktop that have either '/dev/shm/*' or 'SYSV*' in smaps. This
included the Chrome browser and some KDE processes. Again, I've run cat
/proc/pid/smaps on each 100 times.
Before this patch:
real 0m9.050s
user 0m0.518s
sys 0m8.066s
After this patch:
real 0m9.221s
user 0m0.541s
sys 0m8.187s
This suggests low impact on average systems.
Note that this patch doesn't attempt to adjust the SwapPss field for
shmem mappings, which would need extra work to determine who else could
have the pages mapped. Thus the value stays zero except for COWed
swapped out pages in a shmem mapping, which are accounted as usual.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series is based on Jerome Marchand's [1] so let me quote the first
paragraph from there:
There are several shortcomings with the accounting of shared memory
(sysV shm, shared anonymous mapping, mapping to a tmpfs file). The
values in /proc/<pid>/status and statm don't allow to distinguish
between shmem memory and a shared mapping to a regular file, even though
their implications on memory usage are quite different: at reclaim, file
mapping can be dropped or written back on disk while shmem needs a place
in swap. As for shmem pages that are swapped-out or in swap cache, they
aren't accounted at all.
The original motivation for myself is that a customer found (IMHO
rightfully) confusing that e.g. top output for process swap usage is
unreliable with respect to swapped out shmem pages, which are not
accounted for.
The fundamental difference between private anonymous and shmem pages is
that the latter has PTE's converted to pte_none, and not swapents. As
such, they are not accounted to the number of swapents visible e.g. in
/proc/pid/status VmSwap row. It might be theoretically possible to use
swapents when swapping out shmem (without extra cost, as one has to
change all mappers anyway), and on swap in only convert the swapent for
the faulting process, leaving swapents in other processes until they
also fault (so again no extra cost). But I don't know how many
assumptions this would break, and it would be too disruptive change for
a relatively small benefit.
Instead, my approach is to document the limitation of VmSwap, and
provide means to determine the swap usage for shmem areas for those who
are interested and willing to pay the price, using /proc/pid/smaps.
Because outside of ipcs, I don't think it's possible to currently to
determine the usage at all. The previous patchset [1] did introduce new
shmem-specific fields into smaps output, and functions to determine the
values. I take a simpler approach, noting that smaps output already has
a "Swap: X kB" line, where currently X == 0 always for shmem areas. I
think we can just consider this a bug and provide the proper value by
consulting the radix tree, as e.g. mincore_page() does. In the patch
changelog I explain why this is also not perfect (and cannot be without
swapents), but still arguably much better than showing a 0.
The last two patches are adapted from Jerome's patchset and provide a
VmRSS breakdown to RssAnon, RssFile and RssShm in /proc/pid/status.
Hugh noted that this is a welcome addition, and I agree that it might
help e.g. debugging process memory usage at albeit non-zero, but still
rather low cost of extra per-mm counter and some page flag checks.
[1] http://lwn.net/Articles/611966/
This patch (of 6):
The documentation for /proc/pid/status does not mention that the value
of VmSwap counts only swapped out anonymous private pages, and not
swapped out pages of the underlying shmem objects (for shmem mappings).
This is not obvious, so document this limitation.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inode_nohighmem() is sufficient to make sure that page_get_link()
won't try to allocate a highmem page. Moreover, it is sufficient
to make sure that page_symlink/__page_symlink won't do the same
thing. However, any filesystem that manually preseeds the symlink's
page cache upon symlink(2) needs to make sure that the page it
inserts there won't be a highmem one.
Fortunately, only nfs and shmem have run afoul of that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The site for libhugetlbfs has moved from sourceforge to github. This
commit updates the old url.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Pull f2fs updates from Jaegeuk Kim:
"This series adds two ioctls to control cached data and fragmented
files. Most of the rest fixes missing error cases and bugs that we
have not covered so far. Summary:
Enhancements:
- support an ioctl to execute online file defragmentation
- support an ioctl to flush cached data
- speed up shrinking of extent_cache entries
- handle broken superblock
- refector dirty inode management infra
- revisit f2fs_map_blocks to handle more cases
- reduce global lock coverage
- add detecting user's idle time
Major bug fixes:
- fix data race condition on cached nat entries
- fix error cases of volatile and atomic writes"
* tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits)
f2fs: should unset atomic flag after successful commit
f2fs: fix wrong memory condition check
f2fs: monitor the number of background checkpoint
f2fs: detect idle time depending on user behavior
f2fs: introduce time and interval facility
f2fs: skip releasing nodes in chindless extent tree
f2fs: use atomic type for node count in extent tree
f2fs: recognize encrypted data in f2fs_fiemap
f2fs: clean up f2fs_balance_fs
f2fs: remove redundant calls
f2fs: avoid unnecessary f2fs_balance_fs calls
f2fs: check the page status filled from disk
f2fs: introduce __get_node_page to reuse common code
f2fs: check node id earily when readaheading node page
f2fs: read isize while holding i_mutex in fiemap
Revert "f2fs: check the node block address of newly allocated nid"
f2fs: cover more area with nat_tree_lock
f2fs: introduce max_file_blocks in sbi
f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
f2fs: introduce zombie list for fast shrinking extent trees
...
- I'm assisting Joel as co-maintainer and patch monkey now, and you will
see pull reuquests from me for a while
- Besides the MAINTAINERS update there is just a single change, which
adds support for binary attributes to configfs, which are very similar
to the sysfs binary attributes. Thanks to Pantelis Antoniou!
- You will see another actually bigger set of configfs changes in the
SCSI target pull from Nic - those were merged before this new tree
even existed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJWlVY2AAoJEA+eU2VSBFGD7usQAMtzTCqxKH9imqMle0434tE0
OSY/CEi8sE+0m+u0p0rszi9G3Mneyx6Gsm1enuNUfsKDMTkdhTggVbU1IaLzGddg
ZJbjQwJXgSC4bRO+I1AgCXlJmSLVRWhw+adUqqtKuL5L8tD08vUCXz+52nQOIlh2
uP1PYz4Y5JyWQDviv0S6pXN53yf5y60P4eE6ghfnP9VvaD7fwIYIxMfxUQvKsvaS
z6Aab/XrfkmoQK2A8y21LA4nHA6TS5Aa7IrqCp7G3Nks/An1qZIMU8bZpc+HfpX+
lRd6TU4mHhabs3cOJBELV7ZxCotTSh5XhHyQaDBMo3BDCxl7UNGdY8+V7Xx5xK6D
RHdv9g+ObUotAAfd3NXCV1fiUzUaeInRmolvLYr/o+ftPXVL7C80HerAXZHeiQXx
u9VEkb6tTcLLoxAoIPTJ6CogSWfvYTpocoSmaLaJspKQ8859BoYsgDCjkGQJOnZP
hngof5adafoQEhOmn6xi8NrWD906HhOHS0n1gwC4ho65yZbqf13gA+ATsavM5oX4
3rwJgY4xeQJ1YN7Q89QgdO9ecrDod+Yq2HVkfj/LeMuXJKpCMkbYda6Qyg0t4jpF
XLKLfXF/3BhzOvnLwh6pwQxV9F4HwBS3lw8fzw7eNua3pRoyQWGmzuFRoGA5HW5k
L3KheVPNNYKpyeAKmoDF
=FjkZ
-----END PGP SIGNATURE-----
Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs
Pull configfs updates from Christoph Hellwig:
"I'm assisting Joel as co-maintainer and patch monkey now, and you will
see pull reuquests from me for a while.
Besides the MAINTAINERS update there is just a single change, which
adds support for binary attributes to configfs, which are very similar
to the sysfs binary attributes. Thanks to Pantelis Antoniou!
You will see another actually bigger set of configfs changes in the
SCSI target pull from Nic - those were merged before this new tree
even existed"
* tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs:
configfs: add myself as co-maintainer, updated git tree
configfs: implement binary attributes
ConfigFS lacked binary attributes up until now. This patch
introduces support for binary attributes in a somewhat similar
manner of sysfs binary attributes albeit with changes that
fit the configfs usage model.
Problems that configfs binary attributes fix are everything that
requires a binary blob as part of the configuration of a resource,
such as bitstream loading for FPGAs, DTBs for dynamically created
devices etc.
Look at Documentation/filesystems/configfs/configfs.txt for internals
and howto use them.
This patch is against linux-next as of today that contains
Christoph's configfs rework.
Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
[hch: folded a fix from Geert Uytterhoeven <geert+renesas@glider.be>]
[hch: a few tiny updates based on review feedback]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a new option 'data_flush' to enable data flush functionality.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.
It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.
new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases. page_follow_link_light()
instrumented to yell about anything missed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch fix some typos in Documentation/filesystems/f2fs.txt
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull SCSI target updates from Nicholas Bellinger:
"This series contains HCH's changes to absorb configfs attribute
->show() + ->store() function pointer usage from it's original
tree-wide consumers, into common configfs code.
It includes usb-gadget, target w/ drivers, netconsole and ocfs2
changes to realize the improved simplicity, that now renders the
original include/target/configfs_macros.h CPP magic for fabric drivers
and others, unnecessary and obsolete.
And with common code in place, new configfs attributes can be added
easier than ever before.
Note, there are further improvements in-flight from other folks for
v4.5 code in configfs land, plus number of target fixes for post -rc1
code"
In the meantime, a new user of the now-removed old configfs API came in
through the char/misc tree in commit 7bd1d4093c ("stm class: Introduce
an abstraction for System Trace Module devices").
This merge resolution comes from Alexander Shishkin, who updated his stm
class tracing abstraction to account for the removal of the old
show_attribute and store_attribute methods in commit 517982229f
("configfs: remove old API") from this pull. As Alexander says about
that patch:
"There's no need to keep an extra wrapper structure per item and the
awkward show_attribute/store_attribute item ops are no longer needed.
This patch converts policy code to the new api, all the while making
the code quite a bit smaller and easier on the eyes.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>"
That patch was folded into the merge so that the tree should be fully
bisectable.
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
configfs: remove old API
ocfs2/cluster: use per-attribute show and store methods
ocfs2/cluster: move locking into attribute store methods
netconsole: use per-attribute show and store methods
target: use per-attribute show and store methods
spear13xx_pcie_gadget: use per-attribute show and store methods
dlm: use per-attribute show and store methods
usb-gadget/f_serial: use per-attribute show and store methods
usb-gadget/f_phonet: use per-attribute show and store methods
usb-gadget/f_obex: use per-attribute show and store methods
usb-gadget/f_uac2: use per-attribute show and store methods
usb-gadget/f_uac1: use per-attribute show and store methods
usb-gadget/f_mass_storage: use per-attribute show and store methods
usb-gadget/f_sourcesink: use per-attribute show and store methods
usb-gadget/f_printer: use per-attribute show and store methods
usb-gadget/f_midi: use per-attribute show and store methods
usb-gadget/f_loopback: use per-attribute show and store methods
usb-gadget/ether: use per-attribute show and store methods
usb-gadget/f_acm: use per-attribute show and store methods
usb-gadget/f_hid: use per-attribute show and store methods
...
wait; these include some improvements to the suggestions for email clients
and patch submission.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWRge3AAoJEI3ONVYwIuV6BboP/30p1d0kGC9KWH96aVZst1uu
4VaUcOf+zKp8wQwhyHQIgmGlgD8u6fzfa8i2YIiVRzJD18Tm67KjHsbSiT4qgI84
xizcmnuRogTthtWTmGITKfgQ5OL3Z0IX/5JgIMoIdmAKSjg/3kSR2b5FvN+tj6Qh
9SQAWYIW2cvDWp9PV8gWOdA2j6EKDQ6BdwRE749LmS3MXRN9bM/KRezCyIrmCvOF
FChTzxVjKd2TNYlO9nDyzn70WhyoV/e7jn+9AOKC4XHOCrFI0KB36mE8Wv/VmUmF
+6YPLWNJrTb7J07p9fkYb0FeMzEVcZIVqtMgvtmWTrX4qat9VJI38tOcgDQKHOhj
ETf+8DDM0t4pNUw+3KDdCdu+AIZcEuuRQYc7AUC/URHZGM+LsI0hERRcNS5Z/KRv
lKyq3Y+A+/Z7b6Ia87lJavgubEdY3pagCOZXWQWvb8CwaTnOhuf4qfBK6pkqJBuN
2DfNSQrsi2t1cqKfrnGr58kkMslOKzZLXXSjgvB8IhHmqedNha9JODykq2ldnMdx
Lch1fIe8Exh+p0pY3nKUnmB69dMZnuNVnW92C4JbNr+zgU1jxQeLVer3bum3NzZw
cEASmIPb1nJnl5aJkapc/rz+tSJ9GIIgaTyarNO+cVWQ712gYU5Ot4nNj5ayKW5C
RgQOqBL+68GF1tMUB61b
=6qJ0
-----END PGP SIGNATURE-----
Merge tag '4.4-additional' of git://git.lwn.net/linux
Pull more documentation updates from Jon Corbet:
"A few more documentation patches that wandered in and have no reason
to wait; these include some improvements to the suggestions for email
clients and patch submission"
* tag '4.4-additional' of git://git.lwn.net/linux:
Documentation: Add minimal Mutt config for using Gmail
Documentation: Add note on sending files directly with Mutt
Documentation: dontdiff: remove media from dontdiff
Documentation/SubmittingPatches: discuss In-Reply-To
Remove email address from Documentation/filesystems/overlayfs.txt
can-doc: Add missing semicolon to example
I'm getting a surprising large number of questions about overlayfs sent
to me personally, rather than to a relevant mailing list.
So remove my email address from the documentation, and add a note
about looking in the MAINTAINERS file.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
1/ Add support for the ACPI 6.0 NFIT hot add mechanism to process
updates of the NFIT at runtime.
2/ Teach the coredump implementation how to filter out DAX mappings.
3/ Introduce NUMA hints for allocations made by the pmem driver, and as
a side effect all devm allocations now hint their NUMA node by
default.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWQX2sAAoJEB7SkWpmfYgCWsEQAK7w/xM9zClVY/DDlFJxFtYq
DZJ4faPj+E3FMTiJIEDzjtRgQvOFE+wmJtntYsCqKH/QZmpnyk9jeT/CbJzEEL2k
WsAk+qHGLcVUlSb36blwN1RFzYqC+IDYThewJqUvxDbOwL1AbiibbX7gplzZHLhW
+rj3ScVlSNOPRDgGGpkAeLNNsttuKvsGo7nB/JZopm0tV6g14rSK09wQbVhv6S6T
Lu7VGYqnJlkteL9YlzRiROf9hW2ZFCMGJz1YZydPTy3aX3hGTBX4w2qvmsPwBIKP
kW/gCNisVJGk1cZCk4joSJ8i/b3x3fE0zdZ5waivJ5jDvYbUUfyk0KtJkfw207Rl
14yWitUC6aeVuCeOqXHgsjRi+1QVN9Pg7i49xgGiUN1igQiUYRTgQPWZxDv6Zo/s
USrLFQBaRd+hJw+dl7A47lJ3mUF96tPCoQb4LCQ7DVsg5U4J2TvqXLH9Gek/CCZ4
QsMkZDTQlZw4+JEDlzBgg/L7xVty8DadplTADMdjaRhFU3y8zKNJ85Ileokt7KVt
IsBT4+S5HeZLvinZY95932DwAmFp1DtsyENd1BUXL06ddyvlQrFJ6NQaXji4fuDc
EVQmMoTAqDujZFupMAux9vkUBDFj/hmaVD5F7j3+MWP87OCritw/IZn+2LgTaKoX
EmttaYrDr2jJwIaGyw+H
=a2/L
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"Outside of the new ACPI-NFIT hot-add support this pull request is more
notable for what it does not contain, than what it does. There were a
handful of development topics this cycle, dax get_user_pages, dax
fsync, and raw block dax, that need more more iteration and will wait
for 4.5.
The patches to make devm and the pmem driver NUMA aware have been in
-next for several weeks. The hot-add support has not, but is
contained to the NFIT driver and is passing unit tests. The coredump
support is straightforward and was looked over by Jeff. All of it has
received a 0day build success notification across 107 configs.
Summary:
- Add support for the ACPI 6.0 NFIT hot add mechanism to process
updates of the NFIT at runtime.
- Teach the coredump implementation how to filter out DAX mappings.
- Introduce NUMA hints for allocations made by the pmem driver, and
as a side effect all devm allocations now hint their NUMA node by
default"
* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
coredump: add DAX filtering for FDPIC ELF coredumps
coredump: add DAX filtering for ELF coredumps
acpi: nfit: Add support for hot-add
nfit: in acpi_nfit_init, break on a 0-length table
pmem, memremap: convert to numa aware allocations
devm_memremap_pages: use numa_mem_id
devm: make allocations numa aware by default
devm_memremap: convert to return ERR_PTR
devm_memunmap: use devres_release()
pmem: kill memremap_pmem()
x86, mm: quiet arch_add_memory()
Here is a list of patches we've accumulated for GFS2 for the current upstream
merge window. There are only six patches this time:
1. A cleanup patch from Andreas to remove the gl_spin #define in favor
of its value for the sake of clarity.
2. A fix from Andy Price to mark the inode dirty during fallocate.
3. A fix from Andy Price to set s_mode on mount failures to prevent
a stack trace.
4. A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
due to uninitialized storage.
5. A patch from me to protecting our freeing of the in-core directory
hash table to prevent double-free.
6. A fix for a page/block rounding problem that resulted in a metadata
coherency problem when the block size != page size.
I've got a lot more patches in various stages of review and testing,
but I'm afraid they'll have to wait until the next merge window. So
next time we're likely to have a lot more.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWQMSyAAoJENeLYdPf93o7k+EH/inFFqkYxLCyXZngihTHdZvS
tYAwYxPJw6UgSrZ1dY6iwcmhy6YgT7a98RJdPA3Kj0SvJxQVBiJ5uc0VKpK0bj72
l7pVPkMEWCHs8u8RAIGfnik8y6IxOP35+EN0U/3ZLMG1Gc+Tmq9M8KLlnhfX980q
oaniaDJAUaSSW8RxD2AKabxjoJ0DKnE6MtDHsL/JWhp1j5co5BbbwOzBmBa2mLCI
RQ8YEvjqjtgm91g33pkxXJVMjAkFqLjRSVfomd5MSQWRUb+eGIpd3LFThHzVfm55
2f4j2kd2V4i7rTrh8Q3RdoYRoFgpXyxXQ3R2UYL59b7B2DvGTyAKyDxfU237ZXQ=
=yanT
-----END PGP SIGNATURE-----
Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"Here is a list of patches we've accumulated for GFS2 for the current
upstream merge window. There are only six patches this time:
1. A cleanup patch from Andreas to remove the gl_spin #define in favor
of its value for the sake of clarity.
2. A fix from Andy Price to mark the inode dirty during fallocate.
3. A fix from Andy Price to set s_mode on mount failures to prevent a
stack trace.
4 A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
due to uninitialized storage.
5. A patch from me to protecting our freeing of the in-core directory
hash table to prevent double-free.
6. A fix for a page/block rounding problem that resulted in a metadata
coherency problem when the block size != page size"
I've got a lot more patches in various stages of review and testing,
but I'm afraid they'll have to wait until the next merge window. So
next time we're likely to have a lot more"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
GFS2: Fix rgrp end rounding problem for bsize < page size
GFS2: Protect freeing directory hash table with i_lock spin_lock
gfs2: Remove gl_spin define
gfs2: Add missing else in trans_add_meta/data
GFS2: Set s_mode before parsing mount options
GFS2: fallocate: do not rely on file_update_time to mark the inode dirty
Add two new flags to the existing coredump mechanism for ELF files to
allow us to explicitly filter DAX mappings. This is desirable because
DAX mappings, like hugetlb mappings, have the potential to be very
large.
Update the coredump_filter documentation in
Documentation/filesystems/proc.txt so that it addresses the new DAX
coredump flags. Also update the documented default value of
coredump_filter to be consistent with the core(5) man page. The
documentation being updated talks about bit 4, Dump ELF headers, which
is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the
kernel config. This kernel config option defaults to "y" if both ELF
binaries and coredump are enabled.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Merge patch-bomb from Andrew Morton:
- inotify tweaks
- some ocfs2 updates (many more are awaiting review)
- various misc bits
- kernel/watchdog.c updates
- Some of mm. I have a huge number of MM patches this time and quite a
lot of it is quite difficult and much will be held over to next time.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
selftests: vm: add tests for lock on fault
mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
mm: introduce VM_LOCKONFAULT
mm: mlock: add new mlock system call
mm: mlock: refactor mlock, munlock, and munlockall code
kasan: always taint kernel on report
mm, slub, kasan: enable user tracking by default with KASAN=y
kasan: use IS_ALIGNED in memory_is_poisoned_8()
kasan: Fix a type conversion error
lib: test_kasan: add some testcases
kasan: update reference to kasan prototype repo
kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
kasan: various fixes in documentation
kasan: update log messages
kasan: accurately determine the type of the bad access
kasan: update reported bug types for kernel memory accesses
kasan: update reported bug types for not user nor kernel memory accesses
mm/kasan: prevent deadlock in kasan reporting
mm/kasan: don't use kasan shadow pointer in generic functions
mm/kasan: MODULE_VADDR is not available on all archs
...
There's an odd line about "Locked" at the head of the description of
/proc/meminfo: it seems to have strayed from /proc/PID/smaps, so lead it
back there. Move "Swap" and "SwapPss" descriptions down above it, to
match the order in the file (though "PageSize"s still undescribed).
The example of "Locked: 374 kB" (the same as Pss, neither Rss nor Size) is
so unlikely as to be misleading: just make it 0, this is /bin/bash text;
which would be "dw" (disabled write) not "de" (do not expand).
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While updating some mm Documentation, I came across a few straggling
references to the non-linear vmas which were happily removed in v4.0.
Delete them.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently there's no easy way to get per-process usage of hugetlb pages,
which is inconvenient because userspace applications which use hugetlb
typically want to control their processes on the basis of how much memory
(including hugetlb) they use. So this patch simply provides easy access
to the info via /proc/PID/status.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Joern Engel <joern@logfs.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
some new CAN driver documentation. Beyond that, we have kernel-doc fixes,
a bit more work to support reproducible builds, and the usual collection of
small fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWO6HiAAoJEI3ONVYwIuV6ihwQAK0KC72h0706bdwDJ1p1/aJU
QLuPeiKYWgGAXq2zgOyw3Povj4bkMwkiq1IGHLyK0Id4tg3ngxOXjimk4YKrqarI
BD5HdpOm7IyQEe66ZU9b1RFDVst+bg3yp6ZIZsH5vQxl/KnyJ6AyaaDk8TPYId8S
1+CykJzxyi7GyT/jlLpHbKtBKrraoVke+cNPMAvOf0NjSyO7Ix5B+qH50sttG6Eu
9qcQ8hlKXOdZRTiGW6P+jeZNA+e5+CRpnG9VHBquHy4lI85kQThhWq41UMH690PP
eRbLipeUybb0FwW2KwuMjGKEMDkMvrGJh0TzSXX9lGHd+5/41v7zcyKh8vJcpLjh
bNQ2WOAKUBd2d15EP1MNoKXDLGJXusJczLwOjigWiSCQvgouAWwMrpWEw+Obv8Yl
rdoH1oQqDFfDnk6mnKrSaqLWGNuLxDtkEl/1P0jsGSK6lM3FDkOgTuNPYXTJJgxN
rXuGmPhyUlS2srERUeQJw2rISN0WRBvcKJGkMX6IpvrXHkItbelqK+yY1DeKPmcm
qgbIx9ZWNqtltFpG22VVByqAVwucO5Nu8cAIQ2ysJsTnKOvQCQmhu5UKTjBCkEJM
VpeMm32BfNiJFLuLTQGWBZ8bkRl2shQyXhOaR3uyqG4T+rpPD3qJi6dtFRpsAzOB
q1nZuJCpOaxJFzjSKvpJ
=emZ7
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux
Pull documentation update from Jon Corbet:
"There is a nice new document from Neil on how pathname lookups work
and some new CAN driver documentation. Beyond that, we have
kernel-doc fixes, a bit more work to support reproducible builds, and
the usual collection of small fixes"
* tag 'docs-for-linus' of git://git.lwn.net/linux: (34 commits)
Documentation: add new description of path-name lookup.
Documentation/vm/slub.txt: document slabinfo-gnuplot.sh
Doc: ABI/stable: Fix typo in ABI/stable
doc: Clarify that nmi_watchdog param is for hardlockups
Typo correction for description in gpio document.
DocBook: Fix kernel-doc to be case-insensitive for private:
kernel-docs.txt: update kernelnewbies reference
Doc:kvm: Fix typo in Doc/virtual/kvm
Documentation/Changes: Add bc in "Current Minimal Requirements" section
Documentation/email-clients.txt: remove trailing whitespace
DocBook: Use a fixed encoding for output
MAINTAINERS: The docs tree has moved
Docs/kernel-parameters: Add earlycon devicetree usage
SubmittingPatches: make Subject examples match the de facto standard
Documentation: gpio: mention that <function>-gpio has been deprecated
Documentation: cgroups: just fix a few typos
Documentation: Update kselftest.txt
Documentation: DMA API: Be more explicit that nents is always the same
Documentation: Update the default value of crashkernel low
zram: update documentation
...
Pull f2fs updates from Jaegeuk Kim:
"Most part of the patches include enhancing the stability and
performance of in-memory extent caches feature.
In addition, it introduces several new features and configurable
points:
- F2FS_GOING_DOWN_METAFLUSH ioctl to test power failures
- F2FS_IOC_WRITE_CHECKPOINT ioctl to trigger checkpoint by users
- background_gc=sync mount option to do gc synchronously
- periodic checkpoints
- sysfs entry to control readahead blocks for free nids
And the following bug fixes have been merged.
- fix SSA corruption by collapse/insert_range
- correct a couple of gc behaviors
- fix the results of f2fs_map_blocks
- fix error case handling of volatile/atomic writes"
* tag 'for-f2fs-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (54 commits)
f2fs: fix to skip shrinking extent nodes
f2fs: fix error path of ->symlink
f2fs: fix to clear GCed flag for atomic written page
f2fs: don't need to submit bio on error case
f2fs: fix leakage of inmemory atomic pages
f2fs: refactor __find_rev_next_{zero}_bit
f2fs: support fiemap for inline_data
f2fs: flush dirty data for bmap
f2fs: relocate the tracepoint for background_gc
f2fs crypto: fix racing of accessing encrypted page among
f2fs: export ra_nid_pages to sysfs
f2fs: readahead for free nids building
f2fs: support lower priority asynchronous readahead in ra_meta_pages
f2fs: don't tag REQ_META for temporary non-meta pages
f2fs: add a tracepoint for f2fs_read_data_pages
f2fs: set GFP_NOFS for grab_cache_page
f2fs: fix SSA updates resulting in corruption
Revert "f2fs: do not skip dentry block writes"
f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure
f2fs: merge meta writes as many possible
...
Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch of
debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6ePQACgkQMUfUDdst+ymNTgCgpP0CZw57GpwF/Hp2L/lMkVeo
Kx8AoKhEi4iqD5fdCQS9qTfomB+2/M6g
=g7ZO
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch
of debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time"
* tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Add debugfs_create_ulong()
of: to support binding numa node to specified device in devicetree
debugfs: Add read-only/write-only bool file ops
debugfs: Add read-only/write-only size_t file ops
debugfs: Add read-only/write-only x64 file ops
debugfs: Consolidate file mode checks in debugfs_create_*()
Revert "mm: Check if section present during memory block (un)registering"
driver-core: platform: Provide helpers for multi-driver modules
mm: Check if section present during memory block (un)registering
devres: fix a for loop bounds check
CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
base/platform: assert that dev_pm_domain callbacks are called unconditionally
sysfs: correctly handle short reads on PREALLOC attrs.
base: soc: siplify ida usage
kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
kobject: explain what kobject's sd field is
debugfs: document that debugfs_remove*() accepts NULL and error values
debugfs: Pass bool pointer to debugfs_create_bool()
ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
Pull networking updates from David Miller:
Changes of note:
1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.
2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
David Ahern.
3) Allow the user to ask for the statistics to be filtered out of
ipv4/ipv6 address netlink dumps. From Sowmini Varadhan.
4) More work to pass the network namespace context around deep into
various packet path APIs, starting with the netfilter hooks. From
Eric W Biederman.
5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
Richter.
6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.
7) Support Very High Throughput in wireless MESH code, from Bob
Copeland.
8) Allow setting the ageing_time in switchdev/rocker. From Scott
Feldman.
9) Properly autoload L2TP type modules, from Stephen Hemminger.
10) Fix and enable offload features by default in 8139cp driver, from
David Woodhouse.
11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
Jiri Benc.
12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
Opstad.
13) Fix IPSEC flowcache overflows on large systems, from Steffen
Klassert.
14) Convert bridging to track VLANs using rhashtable entries rather than
a bitmap. From Nikolay Aleksandrov.
15) Make TCP listener handling completely lockless, this is a major
accomplishment. Incoming request sockets now live in the
established hash table just like any other socket too.
From Eric Dumazet.
15) Provide more bridging attributes to netlink, from Nikolay
Aleksandrov.
16) Use hash based algorithm for ipv4 multipath routing, this was very
long overdue. From Peter Nørlund.
17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann.
18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.
19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This
influences the port binding selection logic used by SO_REUSEPORT.
20) Add ipv6 support to VRF, from David Ahern.
21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.
22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.
23) Implement RACK loss recovery in TCP, from Yuchung Cheng.
24) Support multipath routes in MPLS, from Roopa Prabhu.
25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
Dumazet.
26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
Sudarsana Kalluru.
27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
Sowa.
28) Support ipv6 geneve tunnels, from John W Linville.
29) Add flood control support to switchdev layer, from Ido Schimmel.
30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
Hannes Frederic Sowa.
31) Support persistent maps and progs in bpf, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
sh_eth: use DMA barriers
switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
net: sched: kill dead code in sch_choke.c
irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
net: dsa: mv88e6xxx: include DSA ports in VLANs
net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
net/core: fix for_each_netdev_feature
vlan: Invoke driver vlan hooks only if device is present
arcnet/com20020: add LEDS_CLASS dependency
bpf, verifier: annotate verbose printer with __printf
dp83640: Only wait for timestamps for packets with timestamping enabled.
ptp: Change ptp_class to a proper bitmask
dp83640: Prune rx timestamp list before reading from it
dp83640: Delay scheduled work.
dp83640: Include hash in timestamp/packet matching
ipv6: fix tunnel error handling
net/mlx5e: Fix LSO vlan insertion
net/mlx5e: Re-eanble client vlan TX acceleration
net/mlx5e: Return error in case mlx5e_set_features() fails
net/mlx5e: Don't allow more than max supported channels
...
This document is based on three recent lwn.net articles.
Some of the introductory material and linkage between articles
has been removed, and some time-based descriptions have been
revised.
Also all links to code have been removed as the code is very close by.
Contains corrections and improvements from Randy Dunlap <rdunlap@infradead.org>.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a
gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in
gl_lockref). Remove that define to make the references to gl_lockref.lock more
obvious.
Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
A dhcp server may provide parameters to a client from a pool of IP
addresses and using a shared rootfs, or provide a specific set of
parameters for a specific client, usually using the MAC address to
identify each client individually. The dhcp protocol also specifies
a client-id field which can be used to determine the correct
parameters to supply when no MAC address is available. There is
currently no way to tell the kernel to supply a specific client-id,
only the userspace dhcp clients support this feature, but this can
not be used when the network is needed before userspace is available
such as when the root filesystem is on NFS.
This patch is to be able to do something like "ip=dhcp,client_id_type,
client_id_value", as a kernel parameter to enable the kernel to
identify itself to the server.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the old show_attribute and store_attribute methods and update
the documentation. Also replace the two C samples with a single new
one in the proper samples directory where people expect to find it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
when all it needs is a boolean pointer.
It would be better to update this API to make it accept 'bool *'
instead, as that will make it more consistent and often more convenient.
Over that bool takes just a byte.
That required updates to all user sites as well, in the same commit
updating the API. regmap core was also using
debugfs_{read|write}_file_bool(), directly and variable types were
updated for that to be bool as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
So the /proc/PID/stat 'wchan' field (the 30th field, which contains
the absolute kernel address of the kernel function a task is blocked in)
leaks absolute kernel addresses to unprivileged user-space:
seq_put_decimal_ull(m, ' ', wchan);
The absolute address might also leak via /proc/PID/wchan as well, if
KALLSYMS is turned off or if the symbol lookup fails for some reason:
static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
unsigned long wchan;
char symname[KSYM_NAME_LEN];
wchan = get_wchan(task);
if (lookup_symbol_name(wchan, symname) < 0) {
if (!ptrace_may_access(task, PTRACE_MODE_READ))
return 0;
seq_printf(m, "%lu", wchan);
} else {
seq_printf(m, "%s", symname);
}
return 0;
}
This isn't ideal, because for example it trivially leaks the KASLR offset
to any local attacker:
fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
ffffffff8123b380
Most real-life uses of wchan are symbolic:
ps -eo pid:10,tid:10,wchan:30,comm
and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:
triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
open("/proc/30833/wchan", O_RDONLY) = 6
There's one compatibility quirk here: procps relies on whether the
absolute value is non-zero - and we can provide that functionality
by outputing "0" or "1" depending on whether the task is blocked
(whether there's a wchan address).
These days there appears to be very little legitimate reason
user-space would be interested in the absolute address. The
absolute address is mostly historic: from the days when we
didn't have kallsyms and user-space procps had to do the
decoding itself via the System.map.
So this patch sets all numeric output to "0" or "1" and keeps only
symbolic output, in /proc/PID/wchan.
( The absolute sleep address can generally still be profiled via
perf, by tasks with sufficient privileges. )
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Without knowing this, the use of sysfs_streq() becomes puzzling.
The termination happens in kernfs_fop_write().
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
[jc: moved the new text to a different paragraph]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
- sysfs_dirent is now kernfs_node - see commit 324a56e16e ("kernfs:
s/sysfs_dirent/kernfs_node/ and rename its friends accordingly")
- sysfs_super_info is now kernfs_super_info - see commit c525aaddc3
("kernfs: s/sysfs/kernfs/ in various data structures")
- the 's_' prefix was dropped from various fields - see
commit adc5e8b58f ("kernfs: drop s_ prefix from kernfs_node members")
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
sysfs_dirent went away when kernfs was extracted from sysfs. The reference
to the kobject now lives in a kernfs_node (in the 'priv' member).
See commit 324a56e16e ("kernfs: s/sysfs_dirent/kernfs_node/ and rename
its friends accordingly").
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Merge second patch-bomb from Andrew Morton:
"Almost all of the rest of MM. There was an unusually large amount of
MM material this time"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
zpool: remove no-op module init/exit
mm: zbud: constify the zbud_ops
mm: zpool: constify the zpool_ops
mm: swap: zswap: maybe_preload & refactoring
zram: unify error reporting
zsmalloc: remove null check from destroy_handle_cache()
zsmalloc: do not take class lock in zs_shrinker_count()
zsmalloc: use class->pages_per_zspage
zsmalloc: consider ZS_ALMOST_FULL as migrate source
zsmalloc: partial page ordering within a fullness_list
zsmalloc: use shrinker to trigger auto-compaction
zsmalloc: account the number of compacted pages
zsmalloc/zram: introduce zs_pool_stats api
zsmalloc: cosmetic compaction code adjustments
zsmalloc: introduce zs_can_compact() function
zsmalloc: always keep per-class stats
zsmalloc: drop unused variable `nr_to_migrate'
mm/memblock.c: fix comment in __next_mem_range()
mm/page_alloc.c: fix type information of memoryless node
memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
...
We want to know per-process workingset size for smart memory management
on userland and we use swap(ex, zram) heavily to maximize memory
efficiency so workingset includes swap as well as RSS.
On such system, if there are lots of shared anonymous pages, it's really
hard to figure out exactly how many each process consumes memory(ie, rss
+ wap) if the system has lots of shared anonymous memory(e.g, android).
This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
more exact workingset size per process.
Bongkyu tested it. Result is below.
1. 50M used swap
SwapTotal: 461976 kB
SwapFree: 411192 kB
$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
48236
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
141184
2. 240M used swap
SwapTotal: 461976 kB
SwapFree: 216808 kB
$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
230315
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
1387744
[akpm@linux-foundation.org: simplify kunmap_atomic() call]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Bongkyu Kim <bongkyu.kim@lge.com>
Tested-by: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the support code for DAX-enabled filesystems to allow them to
provide huge pages in response to faults.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
- Add Jeff Layton as an nfsd co-maintainer: no change to
existing practice, just an acknowledgement of the status quo.
- Two patches ("nfsd: ensure that...") for a race overlooked by
the state locking rewrite, causing a crash noticed by multiple
users.
- Lots of smaller bugfixes all over from Kinglong Mee.
- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6fbLAAoJECebzXlCjuG+qGkP/j2YnZynwqCa4uz1+FU7qfYI
kZWNGFFQ7O7e1i9Wznp7BkSA020rvM5d1HPwZhtstURM3i52XWRtbppwKF2+IuEU
tpNdPKb28BPCZO29Z8mQk9IS2sX5jmBiibXRqBk0VK7e43PXrIwg1LJJ9HOfOpLh
b1MvxdEB7vqK+fAVIYyhlg0UDd5AHAkQ+vS8YuohRXbDcsdhhE4vmusLlUl5UKp8
5Yunz+b+pXfXPYaKidmpar6U2KoRSTPP1uO3bNfN6URO1W1nchPadLs0DnsBKlhb
U8II5RZEmc+YfiIMoeptkJHoNhWT6Zu7CNJR6B0USTKv4L6TmFQVpxptVutzYVwx
sGJ65lvCiXXOPz8JJwvBty//HTmbyOiCm64/vMbhQRlSNLSmcmTXEpw/uT5Huaxx
bX9lnznoVVCd3eRoXPwMdZTbg/uEKqREZsQWVoqA6gexYqeyp79kvGbttLoUJ27Z
IjtNb9W6akxfPKrHMgan6j7dy866o6TdSfWRayHwUoswbNnVOnMYKHjApOtF0oev
k2pdLuy9tjl2a9Ow9sSwHZDbNsXgJO76E0aYnSTBP/YvctlG7KoZ+E0oxa6DWTC+
0dE+g1xhIuUtW5WRL4pfWWk1G7jnf16J91bKkn91VveDn666RncAbLBtePmpIcIu
5Ah6KxztTVCW++i5pmHh
=aecc
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Nothing major, but:
- Add Jeff Layton as an nfsd co-maintainer: no change to existing
practice, just an acknowledgement of the status quo.
- Two patches ("nfsd: ensure that...") for a race overlooked by the
state locking rewrite, causing a crash noticed by multiple users.
- Lots of smaller bugfixes all over from Kinglong Mee.
- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues"
* tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits)
nfsd: deal with DELEGRETURN racing with CB_RECALL
nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
nfsd: ensure that delegation stateid hash references are only put once
nfsd: ensure that the ol stateid hash reference is only put once
net: sunrpc: fix tracepoint Warning: unknown op '->'
nfsd: allow more than one laundry job to run at a time
nfsd: don't WARN/backtrace for invalid container deployment.
fs: fix fs/locks.c kernel-doc warning
nfsd: Add Jeff Layton as co-maintainer
NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
NFSD: Store parent's stat in a separate value
nfsd: Fix two typos in comments
lockd: NLM grace period shouldn't block NFSv4 opens
nfsd: include linux/nfs4.h in export.h
sunrpc: Switch to using hash list instead single list
sunrpc/nfsd: Remove redundant code by exports seq_operations functions
sunrpc: Store cache_detail in seq_file's private directly
...
Pull f2fs updates from Jaegeuk Kim:
"The major work includes fixing and enhancing the existing extent_cache
feature, which has been well settling down so far and now it becomes a
default mount option accordingly.
Also, this version newly registers a f2fs memory shrinker to reclaim
several objects consumed by a couple of data structures in order to
avoid memory pressures.
Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which
triggers a cleaning job explicitly by users.
Most of the other patches are to fix bugs occurred in the corner cases
across the whole code area"
* tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits)
f2fs: upset segment_info repair
f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
f2fs: update extent tree in batches
f2fs: fix to release inode correctly
f2fs: handle f2fs_truncate error correctly
f2fs: avoid unneeded initializing when converting inline dentry
f2fs: atomically set inode->i_flags
f2fs: fix wrong pointer access during try_to_free_nids
f2fs: use __GFP_NOFAIL to avoid infinite loop
f2fs: lookup neighbor extent nodes for merging later
f2fs: split __insert_extent_tree_ret for readability
f2fs: kill dead code in __insert_extent_tree
f2fs: adjust showing of extent cache stat
f2fs: add largest/cached stat in extent cache
f2fs: fix incorrect mapping for bmap
f2fs: add annotation for space utilization of regular/inline dentry
f2fs: fix to update cached_en of extent tree properly
f2fs: fix typo
f2fs: check the node block address of newly allocated nid
f2fs: go out for insert_inode_locked failure
...
Pull ext3 removal, quota & udf fixes from Jan Kara:
"The biggest change in the pull is the removal of ext3 filesystem
driver (~28k lines removed). Ext4 driver is a full featured
replacement these days and both RH and SUSE use it for several years
without issues. Also there are some workarounds in VM & block layer
mainly for ext3 which we could eventually get rid of.
Other larger change is addition of proper error handling for
dquot_initialize(). The rest is small fixes and cleanups"
[ I wasn't convinced about the ext3 removal and worried about things
falling through the cracks for legacy users, but ext4 maintainers
piped up and were all unanimously in favor of removal, and maintaining
all legacy ext3 support inside ext4. - Linus ]
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Don't modify filesystem for read-only mounts
quota: remove an unneeded condition
ext4: memory leak on error in ext4_symlink()
mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
ext4: Improve ext4 Kconfig test
block: Remove forced page bouncing under IO
fs: Remove ext3 filesystem driver
doc: Update doc about journalling layer
jfs: Handle error from dquot_initialize()
reiserfs: Handle error from dquot_initialize()
ocfs2: Handle error from dquot_initialize()
ext4: Handle error from dquot_initialize()
ext2: Handle error from dquot_initalize()
quota: Propagate error from ->acquire_dquot()
including:
- Support for reproducible document builds, from Ben Hutchings and
company.
- The ability to automatically generate cross-reference links within a
single DocBook book and embedded descriptions for large structures.
From Danilo Cesar Lemes de Paula.
- A new document on how to add a system call from David Drysdale.
- Chameleon bus documentation from Johannes Thumshirn.
...plus the usual collection of improvements, typo fixes, and more.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV4/B0AAoJEI3ONVYwIuV6Y1UQAIpU1TSPCuRcgPLgKhuEty9w
nMKA/Rn2Wye0608HbZ1FhpUXTS914kYOF0zA6f4xS1kOGoqqgSOfkP/bXfdZP67P
aH1onug5xFIfMUTGT2tAHabPymbGCZARVe/1YYKPTTh7hu4ZnPd7HULelp/KHa7Y
9yl/VQzggu4ASWXTKU89vZmoNSvUf+e73sCB+ZQ69QgY2JAnK9HaWDjOpWesev21
ZSZneWrvVWngupWAsw8Wy+QVbqEEIMd5+XC7hN+GEPZInuRGr5oAuyUgmO90JiER
WmFH5D4vRi3KG5XLmvHdUA5lhOqGM3cZC8W9Xw7byf86NVdWoN9rVs4IhBJjC7PC
v0foIuORKbuxqXJ/bn2pMXuWEq9EU80mvs+Hot7cMTP06syXY2/ROcSaReJ9TjcU
yw9uY8tOiR5Pq7tfqDylBbsJKlgYSatdsZKacLG5rCuUwLhsxlTaxG+yyg4zIfeb
EBvsRZBUTElzBELHxTwsOmUJf98QvUEuq8EHfGhUIR0IAnzzSwe2rccCtzohI2je
Sk0R3W9kKZdrgOr8vBehPEXxUYFDoWx+6MgpJhD7eSMhuL6510Gp5AV7Fj/+tTgE
8aTfubk2CGES5Aiwnyi8+Jf+LelPcWpKU4p3rsbsjnrbREV7YzxFHwtlCGtFYBhG
xWCMG47K4Wx8ynKyUE5m
=pQ1H
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"There's been a fair amount going on in the docs tree this time around,
including:
- Support for reproducible document builds, from Ben Hutchings and
company.
- The ability to automatically generate cross-reference links within
a single DocBook book and embedded descriptions for large
structures. From Danilo Cesar Lemes de Paula.
- A new document on how to add a system call from David Drysdale.
- Chameleon bus documentation from Johannes Thumshirn.
...plus the usual collection of improvements, typo fixes, and more"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (39 commits)
Documentation, add kernel-parameters.txt entry for dis_ucode_ldr
Documentation/x86: Rename IRQSTACKSIZE to IRQ_STACK_SIZE
Documentation/Intel-IOMMU.txt: Modify definition of DRHD
docs: update HOWTO for 3.x -> 4.x versioning
kernel-doc: ignore unneeded attribute information
scripts/kernel-doc: Adding cross-reference links to html documentation.
DocBook: Fix non-determinstic installation of duplicate man pages
Documentation: minor typo fix in mailbox.txt
Documentation: describe how to add a system call
doc: Add more workqueue functions to the documentation
ARM: keystone: add documentation for SoCs and EVMs
scripts/kernel-doc Allow struct arguments documentation in struct body
SubmittingPatches: remove stray quote character
Revert "DocBook: Avoid building man pages repeatedly and inconsistently"
Documentation: Minor changes to men-chameleon-bus.txt
Doc: fix trivial typo in SubmittingPatches
MAINTAINERS: Direct Documentation/DocBook/media properly
Documentation: installed man pages don't need to be executable
fix Evolution submenu name in email-clients.txt
Documentation: Add MCB documentation
...
Update the annotation for the kaddr pointer returned by direct_access()
so that it is a __pmem pointer. This is consistent with the PMEM driver
and with how this direct_access() pointer is used in the DAX code.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Changed the documentation to allow sprintf() when the buffer
provided by sysfs cannot be overflowed. Explicitly say
snprintf() must never be used in a show function to format
data to be returned to user space.
Change based on a discussion about the patch
st: convert DRIVER_ATTR macros to DRIVER_ATTR_RO
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shane Seymour <shane.seymour@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch update the Documentation/filesystems/debugfs.txt
file. The main work is to add the description of the following
functions:
debugfs_create_atomic_t
debugfs_create_u32_array
debugfs_create_devm_seqfile
debugfs_create_file_size
Signed-off-by: Wang Long <long.wanglong@huawei.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The functionality of ext3 is fully supported by ext4 driver. Major
distributions (SUSE, RedHat) already use ext4 driver to handle ext3
filesystems for quite some time. There is some ugliness in mm resulting
from jbd cleaning buffers in a dirty page without cleaning page dirty
bit and also support for buffer bouncing in the block layer when stable
pages are required is there only because of jbd. So let's remove the
ext3 driver. This saves us some 28k lines of duplicated code.
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
This reverts commit 731d5cca82.
Commit ffe1f0df58 ("rpcrdma: Merge svcrdma and xprtrdma modules into
one") forgot to update the corresponding documentation.
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I am a high school student trying to become familiar with
Linux kernel development. The btrfs documentation in
Documentation/filesystems had a few typos and errors in
whitespace. This patch corrects both of these.
Signed-off-by: Daniel Grimshaw <grimshaw@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
For block devices which are small enough, mkfs will default to creating
a filesystem with block sizes smaller than page size.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mateusz Guzik reported :
Currently obtaining a new file descriptor results in locking fdtable
twice - once in order to reserve a slot and second time to fill it.
Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.
Mateusz provided an RFC patch and a micro benchmark :
http://people.redhat.com/~mguzik/pipebench.c
A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.
We can use RCU instead of the spinlock.
__fd_install() must wait if a resize is in progress.
The resize must block new __fd_install() callers from starting,
and wait that ongoing install are finished (synchronize_sched())
resize should be attempted by a single thread to not waste resources.
rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.
It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This update contains:
o A new sparse on-disk inode record format to allow small extents to
be used for inode allocation when free space is fragmented.
o DAX support. This includes minor changes to the DAX core code to
fix problems with lock ordering and bufferhead mapping abuse.
o transaction commit interface cleanup
o removal of various unnecessary XFS specific type definitions
o cleanup and optimisation of freelist preparation before allocation
o various minor cleanups
o bug fixes for
- transaction reservation leaks
- incorrect inode logging in unwritten extent conversion
- mmap lock vs freeze ordering
- remote symlink mishandling
- attribute fork removal issues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJVkhI0AAoJEK3oKUf0dfod45MQAJCOEkNduBdlfPvTCMPjj/7z
vzcfDdzgKwhpPTMXSDRvw4zDPt3C2FLMBJqxtPpC4sKGKG/8G0kFvw8bDtBag1m9
ru5nI5LaQ6LC5RcU40zxBx1s/L8qYvyfUlxeoOT5lSwN9c6ENGOCQ3bUk4pSKaee
pWDplag9LbfQomW2GHtxd8agMUZEYx0R1vgfv88V8xgPka8CvQo81XUgkb4PcDZV
ugR+wDUsvwMS01aLYBmRFkMXuExNuCJVwtvdTJS+ZWGHzyTpulFoANUW6QT24gAM
eP4yRXN4bv9vXrXpg8JkF25DHsfw4HBwNEL17ZvoB8t3oJp1/NYaH8ce1jS0+I8i
NCtaO+qUqDSTGQZKgmeDPwCciQp54ra9LEdmIJFxpZxiBof9g/tIYEFgRklyFLwR
GZU6Io6VpBa1oTGlC4D1cmG6bdcnhMB9MGVVCbqnB5mRRDKCmVgCyJwusd1pi7Re
G4O6KkFt21O7+fP13VsjP57KoaJzsIgZ/+H3Ff/fJOJ33AKYTRCmwi8+IMi2n5JI
zz+V0AIBQZAx9dlVyENnxufh9eJYcnwta0lUSLCCo91fZKxbo3ktK1kVHNZP5EGs
IMFM1Ka6hibY20rWlR3GH0dfyP5/yNcvNgTMYPKjj9SVjTar1aSfF2rGpkqYXYyH
D4FICbtDgtOc2ClfpI2k
=3x+W
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pul xfs updates from Dave Chinner:
"There's a couple of small API changes to the core DAX code which
required small changes to the ext2 and ext4 code bases, but otherwise
everything is within the XFS codebase.
This update contains:
- A new sparse on-disk inode record format to allow small extents to
be used for inode allocation when free space is fragmented.
- DAX support. This includes minor changes to the DAX core code to
fix problems with lock ordering and bufferhead mapping abuse.
- transaction commit interface cleanup
- removal of various unnecessary XFS specific type definitions
- cleanup and optimisation of freelist preparation before allocation
- various minor cleanups
- bug fixes for
- transaction reservation leaks
- incorrect inode logging in unwritten extent conversion
- mmap lock vs freeze ordering
- remote symlink mishandling
- attribute fork removal issues"
* tag 'xfs-for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
xfs: don't truncate attribute extents if no extents exist
xfs: clean up XFS_MIN_FREELIST macros
xfs: sanitise error handling in xfs_alloc_fix_freelist
xfs: factor out free space extent length check
xfs: xfs_alloc_fix_freelist() can use incore perag structures
xfs: remove xfs_caddr_t
xfs: use void pointers in log validation helpers
xfs: return a void pointer from xfs_buf_offset
xfs: remove inst_t
xfs: remove __psint_t and __psunsigned_t
xfs: fix remote symlinks on V5/CRC filesystems
xfs: fix xfs_log_done interface
xfs: saner xfs_trans_commit interface
xfs: remove the flags argument to xfs_trans_cancel
xfs: pass a boolean flag to xfs_trans_free_items
xfs: switch remaining xfs_trans_dup users to xfs_trans_roll
xfs: check min blks for random debug mode sparse allocations
xfs: fix sparse inodes 32-bit compile failure
xfs: add initial DAX support
xfs: add DAX IO path support
...
Pull nfsd updates from Bruce Fields:
"A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"
* 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
nfsd: wrap too long lines in nfsd4_encode_read
nfsd: fput rd_file from XDR encode context
nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
nfsd: refactor nfs4_preprocess_stateid_op
nfsd: clean up raparams handling
nfsd: use swap() in sort_pacl_range()
rpcrdma: Merge svcrdma and xprtrdma modules into one
svcrdma: Add a separate "max data segs macro for svcrdma
svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
svcrdma: Keep rpcrdma_msg fields in network byte-order
svcrdma: Fix byte-swapping in svc_rdma_sendto.c
nfsd: Update callback sequnce id only CB_SEQUENCE success
nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
SUNRPC: Move EXPORT_SYMBOL for svc_process
uapi/nfs: Add NFSv4.1 ACL definitions
nfsd: Remove dead declarations
nfsd: work around a gcc-5.1 warning
nfsd: Checking for acl support does not require fetching any acls
...
Pull UDF fixes and cleanups from Jan Kara:
"The contains some small fixes and improvements in error handling for
UDF.
Bundled is also one ext3 coding style fix and a fix in quota
documentation"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: fix udf_load_pvoldesc()
udf: remove double err declaration in udf_file_write_iter()
UDF: support NFSv2 export
fs: ext3: super: fixed a space coding style issue
quota: Update documentation
udf: Return error from udf_find_entry()
udf: Make udf_get_filename() return error instead of 0 length file name
udf: bug on exotic flag in udf_get_filename()
udf: improve error management in udf_CS0toNLS()
udf: improve error management in udf_CS0toUTF8()
udf: unicode: update function name in comments
udf: remove unnecessary test in udf_build_ustr_exact()
udf: Return -ENOMEM when allocation fails in udf_get_filename()
The main thing here is Ingo's big subdirectory documenting feature support
for each architecture. Beyond that, it's the usual pile of fixes, tweaks,
and small additions.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVi0g2AAoJEI3ONVYwIuV6Me4QAIfa79z05ABSjlyWaKw46plH
lULR9cyHdR59JVPHKjSOfT9/c+GOdoz6kkXQoe/TgVyj5fRB8seUW5GJXCASndkk
aVd4c6yKFH1NISXsSdVQC0JbpgAURgcSR6x59It++fG3NINvXronFTWGMBHMLKcI
A2hM2jNP914Dy5r4ipWZKzF1KxIlqK9kmLxlNoE6/LoQfBhh1dMdnyfuM11sguAy
s5pr9JeCPbWC0RE7st/qEivXF4lpj6hd3XoYfM2Y+oukj5xEPQevLTLHOgtesnx9
guUAul5Sw27n+Dx8I0Qxf1n+5SkrijoAa72g5vAxTs+ilOey67qba012NaYSy7RK
s15XOIZ/1JTS9JjkO7GR5NbG6AiIIAH5P+Y501ivCIrsWciTOgKj7cOzakIEV8/P
NX4120Lh5lbBrWeYkl8WbgMO0Me8cThbALC+rncF/wjvGyREKyxNlZ9qvBqmHYjG
5Et2DT+rANaDmmblgMK3tX/zI1g3pN51e+CRF+Hzh1jZD3MZ/i+KS4qgfGFDzMIj
uoniO5VfyD4zRbyv4Grg7XMpXiP8xFxKDypglYiXzzwlkarUgbMGOoFE7AkiPOKB
t9gLPetbDsDyU/bSpzHlfObZp+q+pCxHPhyLS7hxEi3gBxYajIMbkpHHJugnE0+H
TfkIhy6QQm1vAPTpRXaE
=ODt8
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"The main thing here is Ingo's big subdirectory documenting feature
support for each architecture. Beyond that, it's the usual pile of
fixes, tweaks, and small additions"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
doc:md: fix typo in md.txt.
Documentation/mic/mpssd: don't build x86 userspace when cross compiling
Documentation/prctl: don't build tsc tests when cross compiling
Documentation/vDSO: don't build tests when cross compiling
Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
Doc: Change wikipedia's URL from http to https
Documentation/kernel-parameters: add missing pciserial to the earlyprintk
Doc:pps: Fix typo in pps.txt
kbuild : Fix documentation of INSTALL_HDR_PATH
Documentation: filesystems: updated struct file_operations documentation in vfs.txt
kbuild: edit explanation of clean-files variable
Doc: ja_JP: Fix typo in HOWTO
Move freefall program from Documentation/ to tools/
Documentation: ARM: EXYNOS: Describe boot loaders interface
Doc:nfc: Fix typo in nfc-hci.txt
vfs: Minor documentation fix
Doc: networking: txtimestamp: fix printf format warning
Documentation, intel_pstate: Improve legacy mode internal governors description
Documentation: extend use case for EXPORT_SYMBOL_GPL()
...
Updated struct file_operations documentation in vfs.txt to match
current implementation
Signed-off-by: Thomas de Beauchene <chauvo_t@epitech.eu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The check_acl inode operation and the IPERM_FLAG_RCU flag are long gone; update
the documentation.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Linux v3.20 was released as v4.0.
Signed-off-by: Fanael Linithien <fanael4@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The guidelines for adding automount support to a filesystem
in filesystems/automount-support.txt is out or date.
filesystems/autofs4.txt contains more current text, so replace
the out-of-date content with a reference to that.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks. Stored pointer
is ignored in all cases except the last one.
Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().
b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is. In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
commit dc6c9a35b6 ("mm: account pmd page tables to the process")
add VmPMD in /proc/PID/status.
This patch add a description in proc.txt for it.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Change the kernel version in table 1-2 from 3.20 to 4.1
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Reviewed-by: Nathan Scott <nathans@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The 'overloads-avoided' counter itself was removed several years ago by
commit 78c210e (Revert "knfsd: avoid overloading the CPU scheduler with
enormous load averages").
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This update contains:
o RENAME_WHITEOUT support
o conversion of per-cpu superblock accounting to use generic counters
o new inode mmap lock so that we can lock page faults out of truncate, hole
punch and other direct extent manipulation functions to avoid racing mmap
writes from causing data corruption
o rework of direct IO submission and completion to solve data corruption issue
when running concurrent extending DIO writes. Also solves problem of running
IO completion transactions in interrupt context during size extending AIO
writes.
o FALLOC_FL_INSERT_RANGE support for inserting holes into a file via direct
extent manipulation to avoid needing to copy data within the file
o attribute block header field overflow fix for 64k block size filesystems
o Lots of changes to log messaging to be more informative and concise when
errors occur. Also prevent a lot of unnecessary log spamming due to cascading
failures in error conditions.
o lots of cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJVOE8oAAoJEK3oKUf0dfodx1kQAIIH8CwqcBrIslOntfHlFPHz
P9aQl5uiI6JcnFqMiHG6mfnjWGpn+Z6XMDGIBwrSTzHj8IEnHTeXqYiS6SDPAnrH
+VmlJEvW01ucAv7vcXKPrfutcc8dxLpy4fs63HOWmXh4rmrTcpel5S+0JSQxyGd6
OriLg1nfD4Sid7R9CFEXAKLghJFK+gbao2CmT0wo6ZrTwiZl2p62Y187ou+d+u3k
BRol99pI/Sp9bKpWZpUv3q2RnfD1v/k4oDP/JG4Ohdt2dx+nDqCjLvL8B5hJu74B
ZI+R+N28sAkMmbtR61kk06F7MS9RZqzBNIZalugaSuspKoenDZzmURZX+i77ogPQ
Ii3XLUMUzdwmi55/tBhpI7VkpFxahaEbWzTT1sMBh/Ka3GXO56BMIYTPvntjoN4w
ElcbFAMAZl8O56ruGBnc/k72CfFbq8qp93KkOfBGIKwwiPN+eCK8bQYL4G3sIZzx
f6k/WLbbShyViX9qoWLiX7qUfvh0NU/EcmGcJBsTmn0NFNOP4WmuojAq6SrvTgEz
No6zYJtnJvEPDa/v5A0dZyYfLqz2cTkEyTM9uwSixcCa1qAS+8IBcCGgTKfQOYkV
hCUWugiHwj4OQ/6WgP6oYLtIYdw6gqXgUKZy1Iy+ThDRwLbg9emYWixQTi4GAuRO
2SEBbFGSk7KIpoPENDUC
=WE6f
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs update from Dave Chinner:
"This update contains:
- RENAME_WHITEOUT support
- conversion of per-cpu superblock accounting to use generic counters
- new inode mmap lock so that we can lock page faults out of
truncate, hole punch and other direct extent manipulation functions
to avoid racing mmap writes from causing data corruption
- rework of direct IO submission and completion to solve data
corruption issue when running concurrent extending DIO writes.
Also solves problem of running IO completion transactions in
interrupt context during size extending AIO writes.
- FALLOC_FL_INSERT_RANGE support for inserting holes into a file via
direct extent manipulation to avoid needing to copy data within the
file
- attribute block header field overflow fix for 64k block size
filesystems
- Lots of changes to log messaging to be more informative and concise
when errors occur. Also prevent a lot of unnecessary log spamming
due to cascading failures in error conditions.
- lots of cleanups and bug fixes
One thing of note is the direct IO fixes that we merged last week
after the window opened. Even though a little late, they fix a user
reported data corruption and have been pretty well tested. I figured
there was not much point waiting another 2 weeks for -rc1 to be
released just so I could send them to you..."
* tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
xfs: using generic_file_direct_write() is unnecessary
xfs: direct IO EOF zeroing needs to drain AIO
xfs: DIO write completion size updates race
xfs: DIO writes within EOF don't need an ioend
xfs: handle DIO overwrite EOF update completion correctly
xfs: DIO needs an ioend for writes
xfs: move DIO mapping size calculation
xfs: factor DIO write mapping from get_blocks
xfs: unlock i_mutex in xfs_break_layouts
xfs: kill unnecessary firstused overflow check on attr3 leaf removal
xfs: use larger in-core attr firstused field and detect overflow
xfs: pass attr geometry to attr leaf header conversion functions
xfs: disallow ro->rw remount on norecovery mount
xfs: xfs_shift_file_space can be static
xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
xfs: Fix incorrect positive ENOMEM return
xfs: xfs_mru_cache_insert() should use GFP_NOFS
xfs: %pF is only for function pointers
xfs: fix shadow warning in xfs_da3_root_split()
...
Pull f2fs updates from Jaegeuk Kim:
"New features:
- in-memory extent_cache
- fs_shutdown to test power-off-recovery
- use inline_data to store symlink path
- show f2fs as a non-misc filesystem
Major fixes:
- avoid CPU stalls on sync_dirty_dir_inodes
- fix some power-off-recovery procedure
- fix handling of broken symlink correctly
- fix missing dot and dotdot made by sudden power cuts
- handle wrong data index during roll-forward recovery
- preallocate data blocks for direct_io
... and a bunch of minor bug fixes and cleanups"
* tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
f2fs: pass checkpoint reason on roll-forward recovery
f2fs: avoid abnormal behavior on broken symlink
f2fs: flush symlink path to avoid broken symlink after POR
f2fs: change 0 to false for bool type
f2fs: do not recover wrong data index
f2fs: do not increase link count during recovery
f2fs: assign parent's i_mode for empty dir
f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
f2fs: fix mismatching lock and unlock pages for roll-forward recovery
f2fs: fix sparse warnings
f2fs: limit b_size of mapped bh in f2fs_map_bh
f2fs: persist system.advise into on-disk inode
f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
f2fs: preallocate fallocated blocks for direct IO
f2fs: enable inline data by default
f2fs: preserve extent info for extent cache
f2fs: initialize extent tree with on-disk extent info of inode
f2fs: introduce __{find,grab}_extent_tree
f2fs: split set_data_blkaddr from f2fs_update_extent_cache
f2fs: enable fast symlink by utilizing inline data
...
of the i2o docs, some new Chinese translations, and, hopefully, the README
fix that will end the flow of identical patches to that file.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVMOeZAAoJEI3ONVYwIuV6/1YQAJwcVvd3ow6cMKuf8eRKMgjd
crJUdRF9FTwyY21SRHBaonyiKOthnVedOUYnFQ5Z7jbII0EohJ//72nW0pQrHoGi
0avkbM+ZAZzXfd/paOiZ5HtYkc8Bdc70mPU1fzfexnPm/JACOGznxQsob05r6/sT
W1GyJcrLlp4uPrba9rhAdGtaa+mEFrt4SVCI+odXOnxQ/KOZSIu1n1F3bSSvL9zV
YMBgRrCso+Cdtuhe4N3O1jsVy/hyOnvtqcUgwlD4VzElsshKvxdxHn47yeeWK1qI
zZfGTv+q5QI/eZgIhBIrBpdCafgLipAmNZhX+M76xeNydhfhp60VizRgKfb6JiW1
M+nLSnv2vxh4wgkJs7yWgW+8TJ0eCF/w2P/mBi6hDXuIul9gwgmNk+EZ7LONSh+2
l3PV/dyswNs04qbYgFt2rygsxcg79RRbD54zi+S/3NU38/gh7nlidASKmNu1boG/
KdZx3F0rmX/xQu4aQ5nIQl2N7sVLkEec+oN+ukQGyBTLVHkfAK06Z4EWUeYmmBKh
M6gqRY5XJMtCm8D5bons/yZmwmpdZFZMxFGJ4enUwrfsJ8FQ8qy/KmFqF8SojGWQ
HYs4ZUptz6SYa7K0Txe/q0pkrW+doy7t/Bz+JBNNdG7eLeHIpKhSqSlLIdB7MRKw
NFA8a4PdgdMpf+Zr4bRy
=LDbk
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"Numerous fixes, the overdue removal of the i2o docs, some new Chinese
translations, and, hopefully, the README fix that will end the flow of
identical patches to that file"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (34 commits)
Documentation/memcg: update memcg/kmem status
Documentation: blackfin: Makefile: Typo building issue
Documentation/vm/pagemap.txt: correct location of page-types tool
Documentation/memory-barriers.txt: typo fix
doc: Add guest_nice column to example output of `cat /proc/stat'
Documentation/kernel-parameters: Move "eagerfpu" to its right place
Documentation: gpio: Update ACPI part of the document to mention _DSD
docs/completion.txt: Various tweaks and corrections
doc: completion: context, scope and language fixes
Documentation:Update Documentation/zh_CN/arm64/memory.txt
Documentation:Update Documentation/zh_CN/arm64/booting.txt
Documentation: Chinese translation of arm64/legacy_instructions.txt
DocBook media: fix broken EIA hyperlink
Documentation: tweak the maintainers entry
README: Change gzip/bzip2 to xz compression format
README: Update version number reference
doc:pci: Fix typo in Documentation/PCI
Documentation: drm: Use '->' when describing access through pointers.
Documentation: Remove mentioning of block barriers
Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB
...
Merge third patchbomb from Andrew Morton:
- various misc things
- a couple of lib/ optimisations
- provide DIV_ROUND_CLOSEST_ULL()
- checkpatch updates
- rtc tree
- befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs
- ptrace fixes
- fork() fixes
- seccomp cleanups
- more mmap_sem hold time reductions from Davidlohr
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits)
proc: show locks in /proc/pid/fdinfo/X
docs: add missing and new /proc/PID/status file entries, fix typos
drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic
Documentation/spi/spidev_test.c: fix warning
drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
.gitignore: ignore *.tar
MAINTAINERS: add Mediatek SoC mailing list
tomoyo: reduce mmap_sem hold for mm->exe_file
powerpc/oprofile: reduce mmap_sem hold for exe_file
oprofile: reduce mmap_sem hold for mm->exe_file
mips: ip32: add platform data hooks to use DS1685 driver
lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
x86: switch to using asm-generic for seccomp.h
sparc: switch to using asm-generic for seccomp.h
powerpc: switch to using asm-generic for seccomp.h
parisc: switch to using asm-generic for seccomp.h
mips: switch to using asm-generic for seccomp.h
microblaze: use asm-generic for seccomp.h
arm: use asm-generic for seccomp.h
seccomp: allow COMPAT sigreturn overrides
...
Let's show locks which are associated with a file descriptor in
its fdinfo file.
Currently we don't have a reliable way to determine who holds a lock. We
can find some information in /proc/locks, but PID which is reported there
can be wrong. For example, a process takes a lock, then forks a child and
dies. In this case /proc/locks contains the parent pid, which can be
reused by another process.
$ cat /proc/locks
...
6: FLOCK ADVISORY WRITE 324 00:13:13431 0 EOF
...
$ ps -C rpcbind
PID TTY TIME CMD
332 ? 00:00:00 rpcbind
$ cat /proc/332/fdinfo/4
pos: 0
flags: 0100000
mnt_id: 22
lock: 1: FLOCK ADVISORY WRITE 324 00:13:13431 0 EOF
$ ls -l /proc/332/fd/4
lr-x------ 1 root root 64 Mar 5 14:43 /proc/332/fd/4 -> /run/rpcbind.lock
$ ls -l /proc/324/fd/
total 0
lrwx------ 1 root root 64 Feb 27 14:50 0 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 27 14:50 1 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 27 14:49 2 -> /dev/pts/0
You can see that the process with the 324 pid doesn't hold the lock.
This information is required for proper dumping and restoring file
locks.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull third hunk of vfs changes from Al Viro:
"This contains the ->direct_IO() changes from Omar + saner
generic_write_checks() + dealing with fcntl()/{read,write}() races
(mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
repeatedly looking at ->f_flags, which can be changed by fcntl(2),
check ->ki_flags - which cannot) + infrastructure bits for dhowells'
d_inode annotations + Christophs switch of /dev/loop to
vfs_iter_write()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
block: loop: switch to VFS ITER_BVEC
configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
NFS: Don't use d_inode as a variable name
VFS: Impose ordering on accesses of d_inode and d_flags
VFS: Add owner-filesystem positive/negative dentry checks
nfs: generic_write_checks() shouldn't be done on swapout...
ocfs2: use __generic_file_write_iter()
mirror O_APPEND and O_DIRECT into iocb->ki_flags
switch generic_write_checks() to iocb and iter
ocfs2: move generic_write_checks() before the alignment checks
ocfs2_file_write_iter: stop messing with ppos
udf_file_write_iter: reorder and simplify
fuse: ->direct_IO() doesn't need generic_write_checks()
ext4_file_write_iter: move generic_write_checks() up
xfs_file_aio_write_checks: switch to iocb/iov_iter
generic_write_checks(): drop isblk argument
blkdev_write_iter: expand generic_file_checks() call in there
...
Merge second patchbomb from Andrew Morton:
- the rest of MM
- various misc bits
- add ability to run /sbin/reboot at reboot time
- printk/vsprintf changes
- fiddle with seq_printf() return value
* akpm: (114 commits)
parisc: remove use of seq_printf return value
lru_cache: remove use of seq_printf return value
tracing: remove use of seq_printf return value
cgroup: remove use of seq_printf return value
proc: remove use of seq_printf return value
s390: remove use of seq_printf return value
cris fasttimer: remove use of seq_printf return value
cris: remove use of seq_printf return value
openrisc: remove use of seq_printf return value
ARM: plat-pxa: remove use of seq_printf return value
nios2: cpuinfo: remove use of seq_printf return value
microblaze: mb: remove use of seq_printf return value
ipc: remove use of seq_printf return value
rtc: remove use of seq_printf return value
power: wakeup: remove use of seq_printf return value
x86: mtrr: if: remove use of seq_printf return value
linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
MAINTAINERS: CREDITS: remove Stefano Brivio from B43
.mailmap: add Ricardo Ribalda
CREDITS: add Ricardo Ribalda Delgado
...
This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to
get notified when access is a write to a read-only PFN.
This can happen if we mmap() a file then first mmap-read from it to
page-in a read-only PFN, than we mmap-write to the same page.
We need this functionality to fix a DAX bug, where in the scenario above
we fail to set ctime/mtime though we modified the file. An xfstest is
attached to this patchset that shows the failure and the fix. (A DAX
patch will follow)
This functionality is extra important for us, because upon dirtying of a
pmem page we also want to RDMA the page to a remote cluster node.
We define a new pfn_mkwrite and do not reuse page_mkwrite because
1 - The name ;-)
2 - But mainly because it would take a very long and tedious
audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
users. To make sure they do not now CRASH. For example current
DAX code (which this is for) would crash.
If we would want to reuse page_mkwrite, We will need to first
patch all users, so to not-crash-on-no-page. Then enable this
patch. But even if I did that I would not sleep so well at night.
Adding a new vector is the safest thing to do, and is not that
expensive. an extra pointer at a static function vector per driver.
Also the new vector is better for performance, because else we
Will call all current Kernel vectors, so to:
check-ha-no-page-do-nothing and return.
No need to call it from do_shared_fault because do_wp_page is called to
change pte permissions anyway.
Signed-off-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ifconfig command has been deprecated for many years.
To encourage new users not to continue using it and learning
iproute2; the ifconfig should not be used in examples.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Enable inline_data feature by default since it brings us better
performance and space utilization and now has already stable.
Add another option noinline_data to disable it during mount.
Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Suggested-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit ce0e7b28fb ("sched, cpuacct: Fix niced guest time
accounting") added the guest_nice column to /proc/stat, but the example
output of `cat /proc/stat' in Documentation/filesystems/proc.txt wasn't
updated accordingly. Do so now.
Cc: Ryota Ozaki <ozaki.ryota@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Count and display through /proc/fs/fscache/stats the number of initialised
operations.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
This patch adds a mount option 'extent_cache' in f2fs.
It is try to use a rb-tree based extent cache to cache more mapping information
with less memory if this option is set, otherwise we will use the original one
extent info cache.
Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We (the Ocfs2 project) recently moved the location of our ocfs2-tools
git tree and project web page. The pertinent discussion can be seen
here:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-February/010579.html
The following patch updates the Ocfs2 documentation in MAINTAINERS,
ocfs2.txt, and dlmfs.txt. I added our new official web page, changed
the location of our tools git tree and removed the link to Joel's
ancient kernel git tree - Andrew has handled our patches for a while
now.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Count the number of objects that get culled by the cache backend and the
number of objects that the cache backend declines to instantiate due to lack
of space in the cache.
These numbers are made available through /proc/fs/fscache/stats
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
We recently removed deprecated sysctls; may as well
remove deprecated mount options as well, we've stated
that they'd be gone by now in the docs.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The DAX code accesses the underlying storage through the kernel's linear
mapping, which may not be cache-coherent with user mappings on ARM, MIPS
or SPARC. Temporarily disable the DAX code until this problem is
resolved.
The original XIP code also had this problem, but it was never noticed.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a port of the DAX functionality found in the current version of
ext2.
[matthew.r.wilcox@intel.com: heavily tweaked]
[akpm@linux-foundation.org: remap_pages went away]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This new function allows us to support hole-punch for DAX files by zeroing
a partial page, as opposed to the dax_truncate_page() function which can
only truncate to the end of the page. Reimplement dax_truncate_page() to
call dax_zero_page_range().
[ross.zwisler@linux.intel.com: ported to 3.13-rc2]
[akpm@linux-foundation.org: fix typos in comments]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To help people transition, accept the 'xip' mount option (and report it in
/proc/mounts), but print a message encouraging people to switch over to
the 'dax' option.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All callers of get_xip_mem() are now gone. Remove checks for it,
initialisers of it, documentation of it and the only implementation of it.
Also remove mm/filemap_xip.c as it is now empty. Also remove
documentation of the long-gone get_xip_page().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on the original XIP documentation, this documents the current state
of affairs, and includes instructions on how users can enable DAX if their
devices and kernel support it.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull f2fs updates from Jaegeuk Kim:
"Major changes are to:
- add f2fs_io_tracer and F2FS_IOC_GETVERSION
- fix wrong acl assignment from parent
- fix accessing wrong data blocks
- fix wrong condition check for f2fs_sync_fs
- align start block address for direct_io
- add and refactor the readahead flows of FS metadata
- refactor atomic and volatile write policies
But most of patches are for clean-ups and minor bug fixes. Some of
them refactor old code too"
* tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits)
f2fs: use spinlock for segmap_lock instead of rwlock
f2fs: fix accessing wrong indexed data blocks
f2fs: avoid variable length array
f2fs: fix sparse warnings
f2fs: allocate data blocks in advance for f2fs_direct_IO
f2fs: introduce macros to convert bytes and blocks in f2fs
f2fs: call set_buffer_new for get_block
f2fs: check node page contents all the time
f2fs: avoid data offset overflow when lseeking huge file
f2fs: fix to use highmem for pages of newly created directory
f2fs: introduce a batched trim
f2fs: merge {invalidate,release}page for meta/node/data pages
f2fs: show the number of writeback pages in stat
f2fs: keep PagePrivate during releasepage
f2fs: should fail mount when trying to recover data on read-only dev
f2fs: split UMOUNT and FASTBOOT flags
f2fs: avoid write_checkpoint if f2fs is mounted readonly
f2fs: support norecovery mount option
f2fs: fix not to drop mount options when retrying fill_super
f2fs: merge flags in struct f2fs_sb_info
...
Merge third set of updates from Andrew Morton:
- the rest of MM
[ This includes getting rid of the numa hinting bits, in favor of
just generic protnone logic. Yay. - Linus ]
- core kernel
- procfs
- some of lib/ (lots of lib/ material this time)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits)
lib/lcm.c: replace include
lib/percpu_ida.c: remove redundant includes
lib/strncpy_from_user.c: replace module.h include
lib/stmp_device.c: replace module.h include
lib/sort.c: move include inside #if 0
lib/show_mem.c: remove redundant include
lib/radix-tree.c: change to simpler include
lib/plist.c: remove redundant include
lib/nlattr.c: remove redundant include
lib/kobject_uevent.c: remove redundant include
lib/llist.c: remove redundant include
lib/md5.c: simplify include
lib/list_sort.c: rearrange includes
lib/genalloc.c: remove redundant include
lib/idr.c: remove redundant include
lib/halfmd4.c: simplify includes
lib/dynamic_queue_limits.c: simplify includes
lib/sort.c: use simpler includes
lib/interval_tree.c: simplify includes
hexdump: make it return number of bytes placed in buffer
...
The output of /proc/$pid/numa_maps is in terms of number of pages like
anon=22 or dirty=54. Here's some output:
7f4680000000 default file=/hugetlb/bigfile anon=50 dirty=50 N0=50
7f7659600000 default file=/anon_hugepage\040(deleted) anon=50 dirty=50 N0=50
7fff8d425000 default stack anon=50 dirty=50 N0=50
Looks like we have a stack and a couple of anonymous hugetlbfs
areas page which both use the same amount of memory. They don't.
The 'bigfile' uses 1GB pages and takes up ~50GB of space. The
anon_hugepage uses 2MB pages and takes up ~100MB of space while the stack
uses normal 4k pages. You can go over to smaps to figure out what the
page size _really_ is with KernelPageSize or MMUPageSize. But, I think
this is a pretty nasty and counterintuitive interface as it stands.
This patch introduces 'kernelpagesize_kB' line element to
/proc/<pid>/numa_maps report file in order to help identifying the size of
pages that are backing memory areas mapped by a given task. This is
specially useful to help differentiating between HUGE and GIGANTIC page
backed VMAs.
This patch is based on Dave Hansen's proposal and reviewer's follow-ups
taken from the following dicussion threads:
* https://lkml.org/lkml/2011/9/21/454
* https://lkml.org/lkml/2014/12/20/66
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a small section to proc.txt doc in order to document its
/proc/pid/numa_maps interface. It does not introduce any functional
changes, just documentation.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peak resident size of a process can be reset back to the process's
current rss value by writing "5" to /proc/pid/clear_refs. The driving
use-case for this would be getting the peak RSS value, which can be
retrieved from the VmHWM field in /proc/pid/status, per benchmark
iteration or test scenario.
[akpm@linux-foundation.org: clarify behaviour in documentation]
Signed-off-by: Petr Cermak <petrcermak@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Primiano Tucci <primiano@chromium.org>
Cc: Petr Cermak <petrcermak@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core block IO changes from Jens Axboe:
"This contains:
- A series from Christoph that cleans up and refactors various parts
of the REQ_BLOCK_PC handling. Contributions in that series from
Dongsu Park and Kent Overstreet as well.
- CFQ:
- A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
- A stable patch fixing a potential crash in CFQ in OOM
situations. From Konstantin Khlebnikov.
- blk-mq:
- Add support for tag allocation policies, from Shaohua. This is
a prep patch enabling libata (and other SCSI parts) to use the
blk-mq tagging, instead of rolling their own.
- Various little tweaks from Keith and Mike, in preparation for
DM blk-mq support.
- Minor little fixes or tweaks from me.
- A double free error fix from Tony Battersby.
- The partition 4k issue fixes from Matthew and Boaz.
- Add support for zero+unprovision for blkdev_issue_zeroout() from
Martin"
* 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
block: remove unused function blk_bio_map_sg
block: handle the null_mapped flag correctly in blk_rq_map_user_iov
blk-mq: fix double-free in error path
block: prevent request-to-request merging with gaps if not allowed
blk-mq: make blk_mq_run_queues() static
dm: fix multipath regression due to initializing wrong request
cfq-iosched: handle failure of cfq group allocation
block: Quiesce zeroout wrapper
block: rewrite and split __bio_copy_iov()
block: merge __bio_map_user_iov into bio_map_user_iov
block: merge __bio_map_kern into bio_map_kern
block: pass iov_iter to the BLOCK_PC mapping functions
block: add a helper to free bio bounce buffer pages
block: use blk_rq_map_user_iov to implement blk_rq_map_user
block: simplify bio_map_kern
block: mark blk-mq devices as stackable
block: keep established cmd_flags when cloning into a blk-mq request
block: add blk-mq support to blk_insert_cloned_request()
block: require blk_rq_prep_clone() be given an initialized clone request
blk-mq: add tag allocation policy
...
Pull nfsd updates from Bruce Fields:
"The main change is the pNFS block server support from Christoph, which
allows an NFS client connected to shared disk to do block IO to the
shared disk in place of NFS reads and writes. This also requires xfs
patches, which should arrive soon through the xfs tree, barring
unexpected problems. Support for other filesystems is also possible
if there's interest.
Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
shape"
* 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: default NFSv4.2 to on
nfsd: pNFS block layout driver
exportfs: add methods for block layout exports
nfsd: add trace events
nfsd: update documentation for pNFS support
nfsd: implement pNFS layout recalls
nfsd: implement pNFS operations
nfsd: make find_any_file available outside nfs4state.c
nfsd: make find/get/put file available outside nfs4state.c
nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
nfsd: add fh_fsid_match helper
nfsd: move nfsd_fh_match to nfsfh.h
fs: add FL_LAYOUT lease type
fs: track fl_owner for leases
nfs: add LAYOUT_TYPE_MAX enum value
nfsd: factor out a helper to decode nfstime4 values
sunrpc/lockd: fix references to the BKL
nfsd: fix year-2038 nfs4 state problem
svcrdma: Handle additional inline content
svcrdma: Move read list XDR round-up logic
...
Merge second set of updates from Andrew Morton:
"More of MM"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
vmstat: Reduce time interval to stat update on idle cpu
mm/page_owner.c: remove unnecessary stack_trace field
Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
mm: incorporate read-only pages into transparent huge pages
vmstat: do not use deferrable delayed work for vmstat_update
mm: more aggressive page stealing for UNMOVABLE allocations
mm: always steal split buddies in fallback allocations
mm: when stealing freepages, also take pages created by splitting buddy page
mincore: apply page table walker on do_mincore()
mm: /proc/pid/clear_refs: avoid split_huge_page()
mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
mempolicy: apply page table walker on queue_pages_range()
arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
memcg: cleanup preparation for page table walk
numa_maps: remove numa_maps->vma
numa_maps: fix typo in gather_hugetbl_stats
pagemap: use walk->vma instead of calling find_vma()
clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
...
Highlights incluse:
Features:
- Removing the forced serialisation of open()/close() calls in NFSv4.x (x>0)
makes for a significant performance improvement in metadata intensive
workloads.
- Full support for the pNFS "flexible files" layout type
- Further RPC/RDMA client improvements from Chuck
Bugfixes:
- Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
- Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
- Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called as part
of the namespace cleanup,
- Stable fix: Ensure we reference the inode for return-on-close in delegreturn
- Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to the
same source address/port combination during a disconnect/reconnect event.
This is a requirement imposed by most NFSv3 server duplicate reply cache
implementations.
Optimisations:
- Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT
Other:
- Add Anna Schumaker as co-maintainer for the NFS client
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2swgAAoJEGcL54qWCgDyCWoP/1bxN8PesqaiwsBm3fsEqcra
WZtMirDIpJYpHwgysdv9t5otBQrb7GrLlNyGZ9NBOVNakifoyj2tHe+/XGDx7Qny
iYxXam0QdyjLU+bi4QoG4bdFncwQ/NmC6fqoG0rc25Il96Oggnc6LeSwL6Koc3CD
QitRLLi/PaU5qtuaV80+tYMJiqZbpBdVjB+xfSpu7rhyWzm1QNdEeQYor5CozzMi
6cRJuvHgjoZ1xriCWdxQHjqOiEaKNLwfm3uZ3XVaaUAIDhStXugdhIihj3J6Wi7k
MKNuY+AKJiy3yOdFfhYALyq+TPundDbYoM9x1foigjgP8zxXVfIU3VS6l33TSlzX
zH+/lcnXAHFWjFYoAijG1gv1H+OYcTuDlKaYAShQ/cOkTfWFrmlWv+pZs3SSkmPY
4Aeu97YYOkB5ZZ7wTWKksQMeAu/LYNRSA3h+ANvEIR+NLlTSQTcaChlvBmS0IY5D
qMmko1Xgmsxv+B8UeIY7PLfGBGrUdFho1JiDTfL8Xk7fGOfM7iBtMeaMAfdyOSUq
AMqH9EDUUOWaFDggO2iisLtMCY6kJ0iFGKRTwzR38jAqm3bjWaIDitUqshNrNbC+
mbwvAVxn0IFSCJGFsVd3kD2rTLGDElZ25GLFW9JMalarE6nlLG7e4p65g209Q9bT
HYKiyinJJM2Zji07kmG/
=c47U
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights incluse:
Features:
- Removing the forced serialisation of open()/close() calls in
NFSv4.x (x>0) makes for a significant performance improvement in
metadata intensive workloads.
- Full support for the pNFS "flexible files" layout type
- Further RPC/RDMA client improvements from Chuck
Bugfixes:
- Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
- Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
- Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
as part of the namespace cleanup,
- Stable fix: Ensure we reference the inode for return-on-close in
delegreturn
- Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
the same source address/port combination during a disconnect/
reconnect event. This is a requirement imposed by most NFSv3
server duplicate reply cache implementations.
Optimisations:
- Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT
Other:
- Add Anna Schumaker as co-maintainer for the NFS client"
* tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
SUNRPC: Cleanup to remove xs_tcp_close()
pnfs: delete an unintended goto
pnfs/flexfiles: Do not dprintk after the free
SUNRPC: Fix stupid typo in xs_sock_set_reuseport
SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
SUNRPC: Handle connection reset more efficiently.
SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
SUNRPC: Remove TCP socket linger code
SUNRPC: Remove TCP client connection reset hack
SUNRPC: TCP/UDP always close the old socket before reconnecting
SUNRPC: Add helpers to prevent socket create from racing
SUNRPC: Ensure xs_reset_transport() resets the close connection flags
SUNRPC: Do not clear the source port in xs_reset_transport
SUNRPC: Handle EADDRINUSE on connect
SUNRPC: Set SO_REUSEPORT socket option for TCP connections
NFSv4.1: Fix pnfs_put_lseg races
NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
...
This patch introduces a batched trimming feature, which submits split discard
commands.
This is to avoid long latency due to huge trim commands.
If fstrim was triggered ranging from 0 to the end of device, we should lock
all the checkpoint-related mutexes, resulting in very long latency.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a mount option, norecovery, which is mostly same as
disable_roll_forward. The only difference is that norecovery should be activated
with read-only mount option.
This can be used when user wants to check whether f2fs is mountable or not
without any recovery process. (e.g., xfstests/200)
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Highlights this time around include:
- A thrashing of SubmittingPatches to bring it out of the "send everything
to Linus" era of kernel development.
- A new document on completions from Nicholas McGuire
- Lots of typo fixes, formatting improvements, corrections, build fixes,
and more.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2QvPAAoJEI3ONVYwIuV6UdAP/iFa3yK0jTMFJV49K2PhVPiW
9AxlcDT3mKCmGrCxS2ST84Kyvxi+bn5k6pQbOhXqPTLRzlWj7o+9zG76yCvhI1/0
mSUc/DoEZSlxQCAi4RH6lNFBtyopwOF1Fy2hqP8Wj7fOu2OxJ+DbTFJ7Cjy2Ybnq
REFT+28mD3GwBYv1r6mirbXsxBGUWK4avqUl9TPkC1AgVoksZ/aLrDffdmixL6Ul
cNBGGn3/mQhDCd6QgrHfdSe3BmL/vpiO5nr7BnSGe/hg65tUgi5d5s9tyP1eRkYo
d+3Ityz24ISCW6kAGH9udNzdtw2QA7vxdVIcz01RgxHQcYdcTyE7ZH28+A+aEtxm
ANkOO5PvaQJrCdPATOuVvwTPmK+xjdW7TmmPiJZ3QpVuPkFlUDP+amOgOajBDQ2l
UDHE2GfrUcjUG4FSUGXLVKXVAXuqLG8DG1rMSEf1utb80jxcuhK2kIrhjfRi4gle
gL3PKd7SfhMowm3QbaMxdiy0RpNK+IlJpiFsDFWUJwQCJvCtxlwL/RalfGxitjqs
RxJo+uPxFLrmsnM8fdw5a/82R0T/nclvnzniq5PoZROOJ6VzKvwn3oS9d63bliPI
JQTfNpbfYq8sRlrPlp+XETrrcbmmYyxgqvurLobNXb74cdJHEzHjqf9OG++NKtQQ
Jre067cSnkSrHAHtxNns
=r07E
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"Highlights this time around include:
- A thrashing of SubmittingPatches to bring it out of the "send
everything to Linus" era of kernel development.
- A new document on completions from Nicholas McGuire
- Lots of typo fixes, formatting improvements, corrections, build
fixes, and more"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
can-doc: Fixed a wrong filepath in can.txt
Documentation: Fix trivial typo in comment.
kgdb,docs: Fix typo and minor style issues
Documentation: add description for FTRACE probe status
doc: brief user documentation for completion
Documentation/misc-devices/mei: Fix indentation of embedded code.
Documentation/misc-devices/mei: Fix indentation of enumeration.
Documentation/misc-devices/mei: Fix spacing around parentheses.
Documentation/misc-devices/mei: Fix formatting of headings.
Documentation: devicetree: Fix double words in Doumentation/devicetree
Documentation: mm: Fix typo in vm.txt
lockstat: Add documentation on contention and contenting points
Documentation: fix blackfin gptimers-example build errors
Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
CodingStyle: enable emacs display of trailing whitespace
DocBook: Do not exceed argument list limit
gpio: board.txt: Fix the gpio name example
Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
MAINTAINERS: Add the docs-next git tree to the maintainer entry
...
Merge misc updates from Andrew Morton:
"Bite-sized chunks this time, to avoid the MTA ratelimiting woes.
- fs/notify updates
- ocfs2
- some of MM"
That laconic "some MM" is mainly the removal of remap_file_pages(),
which is a big simplification of the VM, and which gets rid of a *lot*
of random cruft and special cases because we no longer support the
non-linear mappings that it used.
From a user interface perspective, nothing has changed, because the
remap_file_pages() syscall still exists, it's just done by emulating the
old behavior by creating a lot of individual small mappings instead of
one non-linear one.
The emulation is slower than the old "native" non-linear mappings, but
nobody really uses or cares about remap_file_pages(), and simplifying
the VM is a big advantage.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
memcg: zap memcg_slab_caches and memcg_slab_mutex
memcg: zap memcg_name argument of memcg_create_kmem_cache
memcg: zap __memcg_{charge,uncharge}_slab
mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
mm: hugetlb: fix type of hugetlb_treat_as_movable variable
mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
mm: memory: merge shared-writable dirtying branches in do_wp_page()
mm: memory: remove ->vm_file check on shared writable vmas
xtensa: drop _PAGE_FILE and pte_file()-related helpers
x86: drop _PAGE_FILE and pte_file()-related helpers
unicore32: drop pte_file()-related helpers
um: drop _PAGE_FILE and pte_file()-related helpers
tile: drop pte_file()-related helpers
sparc: drop pte_file()-related helpers
sh: drop _PAGE_FILE and pte_file()-related helpers
score: drop _PAGE_FILE and pte_file()-related helpers
s390: drop pte_file()-related helpers
parisc: drop _PAGE_FILE and pte_file()-related helpers
openrisc: drop _PAGE_FILE and pte_file()-related helpers
nios2: drop _PAGE_FILE and pte_file()-related helpers
...
__generic_block_fiemap may spin very long time for large sparse files.
Without this patch an unprivileged user may abuse system resources simply
by spawning a vast number of unkilable busyloops (works on ext2/ext3):
truncate --size 1T test
for ((i=0;i<1024;i++))
do
filefrag test > /dev/null &
done
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a mount option to support JBD2 feature:
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is opened, journal
commit block can be written to disk without waiting for descriptor blocks,
which can improve journal commit performance. This option will enable
'journal_checksum' internally.
Using the fs_mark benchmark, using journal_async_commit shows a 50%
improvement, the files per second go up from 215.2 to 317.5.
test script:
fs_mark -d /mnt/ocfs2/ -s 10240 -n 1000
default:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 215.2 17878
with journal_async_commit option:
FSUse% Count Size Files/sec App Overhead
0 1000 10240 317.5 17881
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Weiwei Wang <wangww631@huawei.comm>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The inotify interface has changed a lot. The user interface was too
old, and the kernel interface was removed by Eric Paris in commit:
2dfc1ca inotify: remove inotify in kernel interface.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Wang Kai <morgan.wang@huawei.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Robert Love <robert.w.love@intel.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This bit of the docs didn't quite reflect reality.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Add a small shim between core nfsd and filesystems to translate the
somewhat cumbersome pNFS data structures and semantics to something
more palatable for Linux filesystems.
Thanks to Rick McNeal for the old prototype pNFS blocklayout server
code, which gave a lot of inspiration to this version even if no
code is left from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.
Add a new helper function, bdev_direct_access(), to handle common
functionality including partition handling, checking the length requested
is positive, checking for the sector being page-aligned, and checking
the length of the request does not pass the end of the partition.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
xfsbufd_centisecs and age_buffer_centisecs were due for removal in
3.14. We forgot to do that - it's now well past time to remove these
deprecated, unused sysctls.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Update descriptions of seq_path() and seq_path_root():
starting with commit v3.2-rc4-1-g02125a8, seq_path_root() no longer
changes the value of root;
starting with commit v3.2-rc7-104-g8c9379e, some arguments of seq_path()
and seq_path_root() are const.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
LZ4 is a lightweight compression algorithm which can be used
on embedded systems to reduce CPU and memory overhead (in comparison
to the standard zlib compression).
These patches add the wrapper code to allow Squashfs to use
the existing LZ4 decompression code, and the necessary configuration
option.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJUjgIJAAoJEJAch/D1fbHUArwP/iDiDSpqxdfTQHwUKxB57skO
0iBzg6bXPlqwmsnllegg0SV0vOvFjpWiXWpOOtAVCJlfbov8DgUndsigpvG3UhD/
qJpNXAJ+xSdOzBqj3bS7SOu2DwPY8Gz4rxGQcNN3PsuOVR/EUgAnNlv22ZHY10A5
XQVyPbkwZ73TrZ2uKA8leWArFtCbM4oYGpxP+ramEox8nVFEOtixn5IcX5WkbGEL
Yt0NRw8K8vDIIETWVariugUFE4C1olFk+YmqqAw7cmDGJ70cEg5jh9ocNkwDIZPj
I9BNtkggBRMaCPwGsH6IvahMFUyWLQUgGayfY/fgbRiB9ZuYIQ1lyPDhzbgWczoE
o34eXAIDdmfPrmYlDEBDkYnXXtwuYqdOVYOtcEnyFEYqpHfaeS2h2s9nTiM+rz21
v0UEaDRmPtlkK/ZdLKUsrOf+8y9ejkT0R67swFaguHshL6EHey7X5ghmOuwCoL9x
fzGWtPFR+Nbqga5T3dwf+apvyUVrPaw6gZu36NNim2779ZgpnIPzW6MEYUMhtXCn
ef2+NvS9AeGyo7kiqlNQrihQWZSN0W/AiVsEeulzk5h+adzSNQ5eipzAO9DAAp16
8muY4nq51bOGVaWzqJz/KacCmt7i0qUdmS1p4l2uqPp9gH/s/S91yrYn/iszf3AV
CpwU2i9g3nQu9ecDc1Os
=mr7J
-----END PGP SIGNATURE-----
Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next
Pull squashfs update from Phillip Lougher:
"These patches optionally add LZ4 compression support to Squashfs.
LZ4 is a lightweight compression algorithm which can be used on
embedded systems to reduce CPU and memory overhead (in comparison to
the standard zlib compression).
These patches add the wrapper code to allow Squashfs to use the
existing LZ4 decompression code, and the necessary configuration
option"
* tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
Squashfs: Add LZ4 compression configuration option
Squashfs: add LZ4 compression support
Allow "lowerdir=" option to contain multiple lower directories separated by
a colon (e.g. "lowerdir=/bin:/usr/bin"). Colon characters in filenames can
be escaped with a backslash.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUiaPRAAoJEI3ONVYwIuV68qQP/2j93CDahLEXlpbimdHCOyKO
xLKYwSIOxpzCTv8mQouNH4czVDheSSrq+xDUafioaMRlM80fb1EdhC9Nhr/xT/O+
fCsgfLETlpxAS4j6gjn60H8QYdnBgcz0p2hOgvysf8WHZ0PHzSnfE78BGS1Mnugs
gjF97y5pa3p+uTNNxHA7aB+Rg3/JbsuDXH0CtdzjpsS0bM/MBnGV0AQLLZ3qdzvf
6bSwb8LyV54cRjYrs0SZT0OWrPXlUc0AxjrQjq2IcyL+s5Uwu/Eb1RoZ69C14ohC
h5A1C87v4asqrzsC0VCqG48wi36+2k7zfVT7wSkHfERXWGtjpd7Uy3lccMMPLg/H
Mnt6EKDVn4L2b8VOM1wl8pj0kIbln/g2whJ6x40Q3R4uO8o94Vy0HpeTTeNxdwyY
iERVkHPuRAZ2SLrz4BheiGRgcePgocMrofyGcJ3/ezMGcq3sUZnAJYmJYF+S5jkZ
Q94ee8jgypgEvuQHX52n3AiqJspNpqAGqXE5ZY4PgLt4KlwbT1yQ5hXUX34ou+3u
sLkFO8vNUQa4o1TTfwCzkPun/Aoy+Rb8BCzgbml/FlPgidXpNcofO/g5BWRjXwOX
mpUCNPiMhryx0aTrEgl1GChholqToakS6RTvqdxUKC3honKWBQ93zsWSDkEzS+3h
+qxzXDCN3u3Py9r/XvPK
=Pq+D
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation update from Jonathan Corbet:
"Here's my set of accumulated documentation changes for 3.19.
It includes a couple of additions to the coding style document, some
fixes for minor build problems within the documentation tree, the
relocation of the kselftest docs, and various tweaks and additions.
A couple of changes reach outside of Documentation/; they only make
trivial comment changes and I did my best to get the required acks.
Complete with a shiny signed tag this time around"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6:
kobject: grammar fix
Input: xpad - update docs to reflect current state
Documentation: Build mic/mpssd only for x86_64
cgroups: Documentation: fix wrong cgroupfs paths
Documentation/email-clients.txt: add info about Claws Mail
CodingStyle: add some more error handling guidelines
kselftest: Move the docs to the Documentation dir
Documentation: fix formatting to make 's' happy
Documentation: power: Fix typo in Documentation/power
Documentation: vm: Add 1GB large page support information
ipv4: add kernel parameter tcpmhash_entries
Documentation: Fix a typo in mailbox.txt
treewide: Fix typo in Documentation/DocBook/device-drivers
CodingStyle: Add a chapter on conditional compilation
Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:
- unification of d_splice_alias()/d_materialize_unique()
- iov_iter rewrite
- killing a bunch of ->f_path.dentry users (and f_dentry macro).
Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.
Still not complete, but much closer now.
- crapectomy in lustre (dead code removal, mostly)
- "let's make seq_printf return nothing" preparations
- assorted cleanups and fixes
There _definitely_ will be more piles"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...
Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes lots of bug fixes based on clean-ups and
refactored codes. And inline_dir was introduced and two minor mount
options were added. Details from signed tag:
This series includes the following enhancement with refactored flows.
- fix inmemory page operations
- fix wrong inline_data & inline_dir logics
- enhance memory and IO control under memory pressure
- consider preemption on radix_tree operation
- fix memory leaks and deadlocks
But also, there are a couple of new features:
- support inline_dir to store dentries inside inode page
- add -o fastboot to reduce booting time
- implement -o dirsync
And a lot of clean-ups and minor bug fixes as well"
* tag 'for-f2fs-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (88 commits)
f2fs: avoid to ra unneeded blocks in recover flow
f2fs: introduce is_valid_blkaddr to cleanup codes in ra_meta_pages
f2fs: fix to enable readahead for SSA/CP blocks
f2fs: use atomic for counting inode with inline_{dir,inode} flag
f2fs: cleanup path to need cp at fsync
f2fs: check if inode state is dirty at fsync
f2fs: count the number of inmemory pages
f2fs: release inmemory pages when the file was closed
f2fs: set page private for inmemory pages for truncation
f2fs: count inline_xx in do_read_inode
f2fs: do retry operations with cond_resched
f2fs: call radix_tree_preload before radix_tree_insert
f2fs: use rw_semaphore for nat entry lock
f2fs: fix missing kmem_cache_free
f2fs: more fast lookup for gc_inode list
f2fs: cleanup redundant macro
f2fs: fix to return correct error number in f2fs_write_begin
f2fs: cleanup if-statement of phase in gc_data_segment
f2fs: fix to recover converted inline_data
f2fs: make clean the page before writing
...
"That letter [the last s] is sad because all the others
have those things [=] below them and it does not."
This patch fixes the tragedy so all the letters can
be happy again.
Signed-off-by: Maisa Roponen <maisa.roponen@gmail.com>
[The author being 4 years old needed some assistance]
Signed-off-by: Tero Roponen <tero.roponen@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Some distributions carry an "old" format of overlayfs while mainline has a
"new" format.
The distros will possibly want to keep the old overlayfs alongside the new
for compatibility reasons.
To make it possible to differentiate the two versions change the name of
the new one from "overlayfs" to "overlay".
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Andy Whitcroft <apw@canonical.com>
Pull the beginning of seq_file cleanup from Steven:
"I'm looking to clean up the seq_file code and to eventually merge the
trace_seq code with seq_file as well, since they basically do the same thing.
Part of this process is to remove the return code of seq_printf() and friends
as they are rather inconsistent. It is better to use the new function
seq_has_overflowed() if you want to stop processing when the buffer
is full. Note, if the buffer is full, the seq_file code will throw away
the contents, allocate a bigger buffer, and then call your code again
to fill in the data. The only thing that breaking out of the function
early does is to save a little time which is probably never noticed.
I started with patches from Joe Perches and modified them as well.
There's many more places that need to be updated before we can convert
seq_printf() and friends to return void. But this patch set introduces
the seq_has_overflowed() and does some initial updates."