linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-02 03:56:39 +07:00

Author	SHA1	Message	Date
Lars Ellenberg	84d34f2f07	drbd: improve network timeout detection Don't blame the peer for being unresponsive, if we did not even ask the question yet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	142207f782	drbd: drbd_panic_after_delayed_completion_of_aborted_request() The only way to make DRBD intentionally call panic is to set a disk timeout, have that trigger, "abort" some request and complete to upper layers, then have the backend IO subsystem later complete these requests successfully regardless. As the attached IO pages have been recycled for other purposes meanwhile, this will cause unexpected random memory changes. To prevent corruption, we rather panic in that case. Make it obvious from stack traces that this was the case by introducing drbd_panic_after_delayed_completion_of_aborted_request(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	dc99562a48	drbd: add comment why we want to first call local-io-error, then send state Even though we really want to get the state information about our bad disk to the peer as soon as possible, it is useful to first call the local-io-error handler. People may chose to hard-reset the box from there. If that looks and behaves exactly like a "regular node crash", without bumping the data generation UUIDs on the peer in between, it makes it easier to deal with. If you intend to return from the local-io-error handler, then better return as quickly as possible to avoid triggering other timeouts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	9bd2eb2c98	drbd: also bump UUIDs if a diskless primary connects If for some reason the primary lost its disk and the replication link before it is able to communicate the disk loss, probably blocked IO, then later is able to re-establish the connection, the peer needs to bump its UUIDs just like it does when peer only loses the disk and is able to communicate this in time. Otherwise, a later re-attach of the disk on the primary may start a resync in the "wrong" direction. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	05a72772fc	drbd: drbdsetup detach of an unresponsive local disk should not block IO "forever" When detaching, we make sure no application IO is in-flight by internally suspending IO, then trigger the state change, wait for the result, and finally internally resume IO again. Once we triggered the stat change to "Failed", we expect it to change from Failed to Diskless. (To avoid races, we actually wait for it to leave "Failed"). On an unresponsive local IO backend, this may not happen, ever. Don't have a "hung" detach block IO "forever", but resume IO before waiting for the state change to Diskless. We may well be able to continue IO to and from a healthy peer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Lars Ellenberg	05cbbb395f	drbd: Fix spurious disk-timeout (You should not use disk-timeout anyways, see the man page for why...) We add incoming requests to the tail of some ring list. On local completion, requests are removed from that list. The timer looks only at the head of that ring list, so is supposed to only see the oldest request. All protected by a spinlock. The request object is created with timestamps zeroed out. The timestamp was only filled in just before the actual submit. But to actually submit the request, we need to give up the spinlock. If you are unlucky, there is no older still pending request, the timer looks at a new request with timestamp still zero (before it even was submitted), and 0 + timeout is most likely older than "now". Better assign the timestamp right when we put the request object on said ring list. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:01 -07:00
Philipp Reisner	d38f861229	drbd: Replace 0 with the more meaningful GFP_NOWAIT GFP_NOWAIT has a value of 0. I.e. functionality not changed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Markus Elfring	d01efceeea	drbd: Deletion of an unnecessary check before the function call "lc_destroy" The lc_destroy() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	a55bbd375d	drbd: Backport the "status" command The status command originates the drbd9 code base. While for now we keep the status information in /proc/drbd available, this commit allows the user base to gracefully migrate their monitoring infrastructure to the new status reporting interface. In drbd9 no status information is exposed through /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	a29728463b	drbd: Backport the "events2" command The events2 command originates from drbd-9 development. It features more information but requires a incompatible change in output format. Therefore the previous events command continues to exist, the new improved events2 command becomes available now. This prepares the user-base for a later switch to the complete drbd9 code base. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	28bc3b8c71	drbd: Fix locking across all resources Instead of using a rwlock for synchronizing state changes across resources, take the request locks of all resources for global state changes. Use resources_mutex to serialize global state changes. This means that taking the request lock of a resource is now enough to prevent changes of that resource. (Previously, a read lock on the global state lock was needed as well.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	1ec317d3d1	drbd: drbd_adm_attach(): Add missing drbd_resync_after_changed() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	f6ba863639	drbd: Move enum write_ordering_e to drbd.h Also change the enum values to all-capital letters. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	5dd2ca1912	drbd: Get rid of some first_peer_device() calls Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Andreas Gruenbacher	2e9ffde6f0	drbd: De-inline drbd_should_do_remote() and drbd_should_send_out_of_sync() There is no need to have these two as inline functions. In addition, drbd_should_send_out_of_sync() is only used in a single place, anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:22:00 -07:00
Philipp Reisner	3b8a44f8ed	drbd: Remove pointless check In drbd-8.4 there is always a single connection per resource, and there is always exactly one peer_device for a device. peer_device can not be NULL here. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-25 09:21:59 -07:00
Rasmus Villemoes	8aeea03195	mtip32xx: use formatting capability of kthread_create_on_node kthread_create_on_node takes format+args, so there's no need to do the pretty-printing in advance. Moreover, "mtip_svc_thd_99" (including its '\0') only just fits in 16 bytes, so if index could ever go above 99 we'd have a stack buffer overflow. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-20 08:46:50 -07:00
Matias Bjørling	54514aa465	null_blk: do not del gendisk with lightnvm The gendisk structure has not been initialized when using lightnvm. Make sure to not delete it upon exit. Also make sure that we use the appropriate disk_name at unregistration. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-19 15:15:56 -07:00
Matias Bjørling	5b40db9909	null_blk: use device addressing mode The linear addressing mode was removed in `7386af2`. Make null_blk instead expose the ppa format geometry and support the generic addressing mode. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-19 15:15:54 -07:00
Matias Bjørling	6bb9535bc3	null_blk: use ppa_cache pool Instead of using a page pool, we can save memory by only allocating room for 64 entries for the ppa command. Introduce a ppa_cache to allocate only the required memory for the ppa list. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-19 15:15:53 -07:00
Matias Bjørling	b2b7e00148	null_blk: register as a LightNVM device Add support for registering as a LightNVM device. This allows us to evaluate the performance of the LightNVM subsystem. In /drivers/Makefile, LightNVM is moved above block device drivers to make sure that the LightNVM media managers have been initialized before drivers under /drivers/block are initialized. Signed-off-by: Matias Bjørling <m@bjorling.me> Fix by Jens Axboe to remove unneeded slab cache and the following memory leak. Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-16 15:22:28 -07:00
Linus Torvalds	ca4ba96e02	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There are several patches from Ilya fixing RBD allocation lifecycle issues, a series adding a nocephx_sign_messages option (and associated bug fixes/cleanups), several patches from Zheng improving the (directory) fsync behavior, a big improvement in IO for direct-io requests when striping is enabled from Caifeng, and several other small fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: clear msg->con in ceph_msg_release() only libceph: add nocephx_sign_messages option libceph: stop duplicating client fields in messenger libceph: drop authorizer check from cephx msg signing routines libceph: msg signing callouts don't need con argument libceph: evaluate osd_req_op_data() arguments only once ceph: make fsync() wait unsafe requests that created/modified inode ceph: add request to i_unsafe_dirops when getting unsafe reply libceph: introduce ceph_x_authorizer_cleanup() ceph: don't invalidate page cache when inode is no longer used rbd: remove duplicate calls to rbd_dev_mapping_clear() rbd: set device_type::release instead of device::release rbd: don't free rbd_dev outside of the release callback rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails libceph: use local variable cursor instead of &msg->cursor libceph: remove con argument in handle_reply() ceph: combine as many iovec as possile into one OSD request ceph: fix message length computation ceph: fix a comment typo rbd: drop null test before destroy functions	2015-11-13 09:24:40 -08:00
Jan Kara	2dbe549576	brd: Refuse improperly aligned discard requests Currently when improperly aligned discard request is submitted, we just silently discard more / less data which results in filesystem corruption in some cases. Refuse such misaligned requests. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-11 09:36:56 -07:00
Linus Torvalds	3419b45039	Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block Pull block IO poll support from Jens Axboe: "Various groups have been doing experimentation around IO polling for (really) fast devices. The code has been reviewed and has been sitting on the side for a few releases, but this is now good enough for coordinated benchmarking and further experimentation. Currently O_DIRECT sync read/write are supported. A framework is in the works that allows scalable stats tracking so we can auto-tune this. And we'll add libaio support as well soon. Fow now, it's an opt-in feature for test purposes" * 'for-4.4/io-poll' of git://git.kernel.dk/linux-block: direct-io: be sure to assign dio->bio_bdev for both paths directio: add block polling support NVMe: add blk polling support block: add block polling support blk-mq: return tag/queue combo in the make_request_fn handlers block: change ->make_request_fn() and users to return a queue cookie	2015-11-10 17:23:49 -08:00
Linus Torvalds	ad804a0b2a	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...	2015-11-07 14:32:45 -08:00
Linus Torvalds	75021d2859	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions	2015-11-07 13:05:44 -08:00
Jens Axboe	dece16353e	block: change ->make_request_fn() and users to return a queue cookie No functional changes in this patch, but it prepares us for returning a more useful cookie related to the IO that was queued up. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>	2015-11-07 10:40:46 -07:00
Oleg Nesterov	be0e6f290f	signal: turn dequeue_signal_lock() into kernel_dequeue_signal() 1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This matches another "for kthreads only" kernel_sigaction() helper. 2. Remove the "tsk" and "mask" arguments, they are always current and current->blocked. And it is simply wrong if tsk != current. 3. We could also remove the 3rd "siginfo_t *info" arg but it looks potentially useful. However we can simplify the callers if we change kernel_dequeue_signal() to accept info => NULL. 4. Remove _irqsave, it is never called from atomic context. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Geliang Tang	1c53e0d273	zram: make is_partial_io/valid_io_request/page_zero_filled return boolean Make is_partial_io()/valid_io_request()/page_zero_filled() return boolean, since each function only uses either one or zero as its return value. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Sergey SENOZHATSKY	1237275580	zram: keep the exact overcommited value in mem_used_max `mem_used_max' is designed to store the max amount of memory zram consumed to store the data. However, it does not represent the actual 'overcommited' (max) value. The existing code goes to -ENOMEM overcommited case before it updates `->stats.max_used_pages', which hides the reason we went to -ENOMEM in the first place -- we actually used more memory than `->limit_pages': alloced_pages = zs_get_total_pages(meta->mem_pool); if (zram->limit_pages && alloced_pages > zram->limit_pages) { zs_free(meta->mem_pool, handle); ret = -ENOMEM; goto out; } update_used_max(zram, alloced_pages); Which is misleading. User will see -ENOMEM, check `->limit_pages', check `->stats.max_used_pages', which will keep the value BEFORE zram passed `->limit_pages', and see: `->stats.max_used_pages' < `->limit_pages' Move update_used_max() before we do `->limit_pages' check, so that user will see: `->stats.max_used_pages' > `->limit_pages' should the overcommit and -ENOMEM happen. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Luis Henriques	1d5b43bfb6	zram: introduce comp algorithm fallback functionality When the user supplies an unsupported compression algorithm, keep the previously selected one (knowingly supported) or the default one (if the compression algorithm hasn't been changed yet). Note that previously this operation (i.e. setting an invalid algorithm) would result in no algorithm being selected, which means that this represents a small change in the default behaviour. Minchan said: For initializing zram, we need to set up 3 optional parameters in advance. 1. the number of compression streams 2. memory limitation 3. compression algorithm Although user pass completely wrong value to set up for 1 and 2 parameters, it's okay because they have default value so zram will be initialized with the default value (of course, when user passes a wrong value via echo, sysfs returns -EINVAL so the user can notice it). But 3 is not consistent with other optional parameters. IOW, if the user passes a wrong value to set up 3 parameter, zram's initialization would fail unlike other optional parameters. So this patch makes them consistent. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Mel Gorman	71baba4b92	mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM __GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Mel Gorman	d0164adc89	mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Linus Torvalds	9cf5c095b6	asm-generic cleanups The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5 nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA 4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3 uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01 KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu vTUkB/I66Hk= =dQKG -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: temporarily add back asm-generic/io-64-nonatomic.h asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations gpio-mxc: stop including <asm-generic/bug> n_tracesink: stop including <asm-generic/bug> n_tracerouter: stop including <asm-generic/bug> mlx5: stop including <asm-generic/kmap_types.h> hifn_795x: stop including <asm-generic/kmap_types.h> drbd: stop including <asm-generic/kmap_types.h> move count_zeroes.h out of asm-generic move io-64-nonatomic.h out of asm-generic	2015-11-06 14:22:15 -08:00
Linus Torvalds	a9aa31cdc2	Merge branch 'for-4.4/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the block driver changes for 4.4. This pull request contains: - NVMe: - Refactor and moving of code to prepare for proper target support. From Christoph and Jay. - 32-bit nvme warning fix from Arnd. - Error initialization fix from me. - Proper namespace removal and reference counting support from Keith. - Device resume fix on IO failure, also from Keith. - Dependency fix from Keith, now that nvme isn't under the umbrella of the block anymore. - Target location and maintainers update from Jay. - From Ming Lei, the long awaited DIO/AIO support for loop. - Enable BD-RE writeable opens, from Georgios" * 'for-4.4/drivers' of git://git.kernel.dk/linux-block: (24 commits) Update target repo for nvme patch contributions NVMe: initialize error to '0' nvme: use an integer value to Linux errno values nvme: fix 32-bit build warning NVMe: Add explicit block config dependency nvme: include <linux/types.ĥ> in <linux/nvme.h> nvme: move to a new drivers/nvme/host directory nvme.h: add missing nvme_id_ctrl endianess annotations nvme: move hardware structures out of the uapi version of nvme.h nvme: add a local nvme.h header nvme: properly handle partially initialized queues in nvme_create_io_queues nvme: merge nvme_dev_start, nvme_dev_resume and nvme_async_probe nvme: factor reset code into a common helper nvme: merge nvme_dev_reset into nvme_reset_failed_dev nvme: delete dev from dev_list in nvme_reset NVMe: Simplify device resume on io queue failure NVMe: Namespace removal simplifications NVMe: Reference count open namespaces cdrom: Random writing support for BD-RE media block: loop: support DIO & AIO ...	2015-11-04 20:37:27 -08:00
Linus Torvalds	41ecf1404b	xen: features for 4.4-rc0 - Improve balloon driver memory hotplug placement. - Use unpopulated hotplugged memory for foreign pages (if supported/enabled). - Support 64 KiB guest pages on arm64. - CPU hotplug support on arm/arm64. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWOeSkAAoJEFxbo/MsZsTRph0H/0nE8Tx0GyGtOyCYfBdInTvI WgjvL8VR1XrweZMVis3668MzhLSYg6b5lvJsoi+L3jlzYRyze43iHXsKfvp+8p0o TVUhFnlHEHF8ASEtPydAi6HgS7Dn9OQ9LaZ45R1Gk0rHnwJjIQonhTn2jB0yS9Am Hf4aZXP2NVZphjYcloqNsLH0G6mGLtgq8cS0uKcVO2YIrR4Dr3sfj9qfq9mflf8n sA/5ifoHRfOUD1vJzYs4YmIBUv270jSsprWK/Mi2oXIxUTBpKRAV1RVCAPW6GFci HIZjIJkjEPWLsvxWEs0dUFJQGp3jel5h8vFPkDWBYs3+9rILU2DnLWpKGNDHx3k= =vUfa -----END PGP SIGNATURE----- Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: - Improve balloon driver memory hotplug placement. - Use unpopulated hotplugged memory for foreign pages (if supported/enabled). - Support 64 KiB guest pages on arm64. - CPU hotplug support on arm/arm64. * tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits) xen: fix the check of e_pfn in xen_find_pfn_range x86/xen: add reschedule point when mapping foreign GFNs xen/arm: don't try to re-register vcpu_info on cpu_hotplug. xen, cpu_hotplug: call device_offline instead of cpu_down xen/arm: Enable cpu_hotplug.c xenbus: Support multiple grants ring with 64KB xen/grant-table: Add an helper to iterate over a specific number of grants xen/xenbus: Rename RING_PAGE to RING_GRANT xen/arm: correct comment in enlighten.c xen/gntdev: use types from linux/types.h in userspace headers xen/gntalloc: use types from linux/types.h in userspace headers xen/balloon: Use the correct sizeof when declaring frame_list xen/swiotlb: Add support for 64KB page granularity xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb arm/xen: Add support for 64KB page granularity xen/privcmd: Add support for Linux 64KB page granularity net/xen-netback: Make it running on 64KB page granularity net/xen-netfront: Make it running on 64KB page granularity block/xen-blkback: Make it running on 64KB page granularity block/xen-blkfront: Make it running on 64KB page granularity ...	2015-11-04 17:32:42 -08:00
Ilya Dryomov	4afb04c0c8	rbd: remove duplicate calls to rbd_dev_mapping_clear() Commit `d1cf578845` ("rbd: set mapping info earlier") defined rbd_dev_mapping_clear(), but, just a few days after, commit `f35a4dee14` ("rbd: set the mapping size and features later") moved rbd_dev_mapping_set() calls and added another rbd_dev_mapping_clear() call instead of moving the old one. Around the same time, another duplicate was introduced in rbd_dev_device_release() - kill both. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-11-02 23:36:48 +01:00
Ilya Dryomov	6cac4695f2	rbd: set device_type::release instead of device::release No point in providing an empty device_type::release callback and then setting device::release for each rbd_dev dynamically. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-11-02 23:36:48 +01:00
Ilya Dryomov	dd5ac32d42	rbd: don't free rbd_dev outside of the release callback struct rbd_device has struct device embedded in it, which means it's part of kobject universe and has an unpredictable life cycle. Freeing its memory outside of the release callback is flawed, yet commits `200a6a8be5` ("rbd: don't destroy rbd_dev in device release function") and `8ad42cd0c0` ("rbd: don't have device release destroy rbd_dev") moved rbd_dev_destroy() out to rbd_dev_image_release(). This commit reverts most of that, the key points are: - rbd_dev->dev is initialized in rbd_dev_create(), making it possible to use rbd_dev_destroy() - which is just a put_device() - both before we register with device core and after. - rbd_dev_release() (the release callback) is the only place we kfree(rbd_dev). It's also where we do module_put(), keeping the module unload race window as small as possible. - We pin the module in rbd_dev_create(), but only for mapping rbd_dev-s. Moving image related stuff out of struct rbd_device into another struct which isn't tied with sysfs and device core is long overdue, but until that happens, this will keep rbd module refcount (which users can observe with lsmod) sane. Fixes: http://tracker.ceph.com/issues/12697 Cc: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-11-02 23:36:48 +01:00
Ilya Dryomov	b51c83c241	rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails Returning pool id (i.e. >= 0) from a sysfs ->store() callback makes userspace think it needs to retry the write. Fix it - it's a leftover from the times when the equivalent of rbd_dev_create() was the first action in rbd_add(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-11-02 23:36:48 +01:00
Julia Lawall	13bf283408	rbd: drop null test before destroy functions Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) { \(kmem_cache_destroy\\|mempool_destroy\\|dma_pool_destroy\)(x); x = NULL; -} // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-11-02 23:36:47 +01:00
Ronny Hegewald	bae818ee15	rbd: require stable pages if message data CRCs are enabled rbd requires stable pages, as it performs a crc of the page data before they are send to the OSDs. But since kernel 3.9 (patch `1d1d1a7672` "mm: only enforce stable page writes if the backing device requires it") it is not assumed anymore that block devices require stable pages. This patch sets the necessary flag to get stable pages back for rbd. In a ceph installation that provides multiple ext4 formatted rbd devices "bad crc" messages appeared regularly (ca 1 message every 1-2 minutes on every OSD that provided the data for the rbd) in the OSD-logs before this patch. After this patch this messages are pretty much gone (only ca 1-2 / month / OSD). Cc: stable@vger.kernel.org # 3.9+, needs backporting Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de> [idryomov@gmail.com: require stable pages only in crc case, changelog] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-10-30 19:25:02 +01:00
Linus Torvalds	ea1ee5ff1b	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A final set of fixes for 4.3. It is (again) bigger than I would have liked, but it's all been through the testing mill and has been carefully reviewed by multiple parties. Each fix is either a regression fix for this cycle, or is marked stable. You can scold me at KS. The pull request contains: - Three simple fixes for NVMe, fixing regressions since 4.3. From Arnd, Christoph, and Keith. - A single xen-blkfront fix from Cathy, fixing a NULL dereference if an error is returned through the staste change callback. - Fixup for some bad/sloppy code in nbd that got introduced earlier in this cycle. From Markus Pargmann. - A blk-mq tagset use-after-free fix from Junichi. - A backing device lifetime fix from Tejun, fixing a crash. - And finally, a set of regression/stable fixes for cgroup writeback from Tejun" * 'for-linus' of git://git.kernel.dk/linux-block: writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy() NVMe: Fix memory leak on retried commands block: don't release bdi while request_queue has live references nvme: use an integer value to Linux errno values blk-mq: fix use-after-free in blk_mq_free_tag_set() nvme: fix 32-bit build warning writeback: fix incorrect calculation of available memory for memcg domains writeback: memcg dirty_throttle_control should be initialized with wb->memcg_completions writeback: bdi_writeback iteration must not skip dying ones writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback() writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration nbd: Add locking for tasks xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)	2015-10-24 07:20:57 +09:00
Ilya Dryomov	6d69bb536b	rbd: prevent kernel stack blow up on rbd map Mapping an image with a long parent chain (e.g. image foo, whose parent is bar, whose parent is baz, etc) currently leads to a kernel stack overflow, due to the following recursion in the reply path: rbd_osd_req_callback() rbd_obj_request_complete() rbd_img_obj_callback() rbd_img_parent_read_callback() rbd_obj_request_complete() ... Limit the parent chain to 16 images, which is ~5K worth of stack. When the above recursion is eliminated, this limit can be lifted. Fixes: http://tracker.ceph.com/issues/12538 Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>	2015-10-23 18:37:24 +02:00
Ilya Dryomov	1f2c6651f6	rbd: don't leak parent_spec in rbd_dev_probe_parent() Currently we leak parent_spec and trigger a "parent reference underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. The problem is we take the !parent out_err branch and that only drops refcounts; parent_spec that would've been freed had we called rbd_dev_unparent() remains and triggers rbd_warn() in rbd_dev_parent_put() - at that point we have parent_spec != NULL and parent_ref == 0, so counter ends up being -1 after the decrement. Redo rbd_dev_probe_parent() to fix this. Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-10-23 18:36:03 +02:00
Julien Grall	9cce2914e2	xen/xenbus: Rename RING_PAGE to RING_GRANT Linux may use a different page size than the size of grant. So make clear that the order is actually in number of grant. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:46 +01:00
Julien Grall	67de5dfbc1	block/xen-blkback: Make it running on 64KB page granularity The PV block protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity behaving as a block backend on a non-modified Xen. It's only necessary to adapt the ring size and the number of request per indirect frames. The rest of the code is relying on the grant table code. Note that the grant table code is allocating a Linux page per grant which will result to waste 6OKB for every grant when Linux is using 64KB page granularity. This could be improved by sharing the page between multiple grants. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:40 +01:00
Julien Grall	c004a6fe0c	block/xen-blkfront: Make it running on 64KB page granularity The PV block protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity using block device on a non-modified Xen. The block API is using segment which should at least be the size of a Linux page. Therefore, the driver will have to break the page in chunk of 4K before giving the page to the backend. When breaking a 64KB segment in 4KB chunks, it is possible that some chunks are empty. As the PV protocol always require to have data in the chunk, we have to count the number of Xen page which will be in use and avoid sending empty chunks. Note that, a pre-defined number of grants are reserved before preparing the request. This pre-defined number is based on the number and the maximum size of the segments. If each segment contains a very small amount of data, the driver may reserve too many grants (16 grants is reserved per segment with 64KB page granularity). Furthermore, in the case of persistent grants we allocate one Linux page per grant although only the first 4KB of the page will be effectively in use. This could be improved by sharing the page with multiple grants. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:39 +01:00
Julien Grall	4f503fbdf3	block/xen-blkfront: split get_grant in 2 Prepare the code to support 64KB page granularity. The first implementation will use a full Linux page per indirect and persistent grant. When non-persistent grant is used, each page of a bio request may be split in multiple grant. Furthermore, the field page of the grant structure is only used to copy data from persistent grant or indirect grant. Avoid to set it for other use case as it will have no meaning given the page will be split in multiple grant. Provide 2 functions, to setup indirect grant, the other for bio page. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:36 +01:00
Julien Grall	a7a6df2223	block/xen-blkfront: Store a page rather a pfn in the grant structure All the usage of the field pfn are done using the same idiom: pfn_to_page(grant->pfn) This will return always the same page. Store directly the page in the grant to clean up the code. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-10-23 14:20:35 +01:00

1 2 3 4 5 ...

4983 Commits