linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-02 14:06:45 +07:00

Author	SHA1	Message	Date
Matias Bjørling	5fdb7e1b97	null_blk: fix wrong capacity when bs is not 512 bytes set_capacity() sets device's capacity using 512 bytes sectors. null_blk calculates the number of sectors by size / bs, which set_capacity is called with. This led to null_blk exposing the wrong number of sectors when bs is not 512 bytes. Signed-off-by: Matias Bjørling <m@bjorling.me> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-09-02 16:49:42 -06:00
Matias Bjørling	de65d2d26f	null_blk: fix memory leak on cleanup Driver was not freeing the memory allocated for internal nullb queues. This patch frees the memory during driver unload. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-09-02 16:49:41 -06:00
Linus Torvalds	52b084d31c	Merge branch 'for-4.3/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "On top of the 4.3 core block IO changes, here are the driver related changes for 4.3. Basically just NVMe and nbd this time around: - NVMe: - PRACT PI improvement from Alok Pandey. - Cleanups and improvements on submission queue doorbell and writing, using CMB if available. From Jon Derrick. - From Keith, support for setting queue maximum segments, and reset support. - Also from Jon, fixup of u64 division issue on 32-bit archs and wiring up of the reset support through and ioctl. - Two small cleanups from Matias and Sunad - Various code cleanups and fixes from Markus Pargmann" * 'for-4.3/drivers' of git://git.kernel.dk/linux-block: NVMe: Using PRACT bit to generate and verify PI by controller NVMe:Remove unreachable code in nvme_abort_req NVMe: Add nvme subsystem reset IOCTL NVMe: Add nvme subsystem reset support NVMe: removed unused nn var from nvme_dev_add NVMe: Set queue max segments nbd: flags is a u32 variable nbd: Rename functions for clearness of recv/send path nbd: Change 'disconnect' to be boolean nbd: Add debugfs entries nbd: Remove variable 'pid' nbd: Move clear queue debug message nbd: Remove 'harderror' and propagate error properly nbd: restructure sock_shutdown nbd: sock_shutdown, remove conditional lock nbd: Fix timeout detection nvme: Fixes u64 division which breaks i386 builds NVMe: Use CMB for the IO SQes if available NVMe: Unify SQ entry writing and doorbell ringing	2015-09-02 13:14:58 -07:00
Linus Torvalds	1081230b74	Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: "This first core part of the block IO changes contains: - Cleanup of the bio IO error signaling from Christoph. We used to rely on the uptodate bit and passing around of an error, now we store the error in the bio itself. - Improvement of the above from myself, by shrinking the bio size down again to fit in two cachelines on x86-64. - Revert of the max_hw_sectors cap removal from a revision again, from Jeff Moyer. This caused performance regressions in various tests. Reinstate the limit, bump it to a more reasonable size instead. - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me. Most devices have huge trim limits, which can cause nasty latencies when deleting files. Enable the admin to configure the size down. We will look into having a more sane default instead of UINT_MAX sectors. - Improvement of the SGP gaps logic from Keith Busch. - Enable the block core to handle arbitrarily sized bios, which enables a nice simplification of bio_add_page() (which is an IO hot path). From Kent. - Improvements to the partition io stats accounting, making it faster. From Ming Lei. - Also from Ming Lei, a basic fixup for overflow of the sysfs pending file in blk-mq, as well as a fix for a blk-mq timeout race condition. - Ming Lin has been carrying Kents above mentioned patches forward for a while, and testing them. Ming also did a few fixes around that. - Sasha Levin found and fixed a use-after-free problem introduced by the bio->bi_error changes from Christoph. - Small blk cgroup cleanup from Viresh Kumar" * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits) blk: Fix bio_io_vec index when checking bvec gaps block: Replace SG_GAPS with new queue limits mask block: bump BLK_DEF_MAX_SECTORS to 2560 Revert "block: remove artifical max_hw_sectors cap" blk-mq: fix race between timeout and freeing request blk-mq: fix buffer overflow when reading sysfs file of 'pending' Documentation: update notes in biovecs about arbitrarily sized bios block: remove bio_get_nr_vecs() fs: use helper bio_add_page() instead of open coding on bi_io_vec block: kill merge_bvec_fn() completely md/raid5: get rid of bio_fits_rdev() md/raid5: split bio for chunk_aligned_read block: remove split code in blkdev_issue_{discard,write_same} btrfs: remove bio splitting and merge_bvec_fn() calls bcache: remove driver private bio splitting code block: simplify bio_add_page() block: make generic_make_request handle arbitrarily sized bios blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL) block: don't access bio->bi_error after bio_put() block: shrink struct bio down to 2 cache lines again ...	2015-09-02 13:10:25 -07:00
Alok Pandey	e19b127f5b	NVMe: Using PRACT bit to generate and verify PI by controller This patch enables the PRCHK and reftag support when PRACT bit is set, and block layer integrity is disabled. Signed-off-by: Alok Pandey <pandey.alok@samsung.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-26 08:58:09 -06:00
Jeff Moyer	74c9c9134b	mtip32x: fix regression introduced by blk-mq per-hctx flush Hi, After commit `f70ced0917` (blk-mq: support per-distpatch_queue flush machinery), the mtip32xx driver may oops upon module load due to walking off the end of an array in mtip_init_cmd. On initialization of the flush_rq, init_request is called with request_index >= the maximum queue depth the driver supports. For mtip32xx, this value is used to index into an array. What this means is that the driver will walk off the end of the array, and either oops or cause random memory corruption. The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx driver in a loop. I can typically reproduce the problem in about 30 seconds. Now, in the case of mtip32xx, it actually doesn't support flush/fua, so I think we can simply return without doing anything. In addition, no other mq-enabled driver does anything with the request_index passed into init_request(), so no other driver is affected. However, I'm not really sure what is expected of drivers. Ming, what did you envision drivers would do when initializing the flush requests? Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-25 14:35:51 -06:00
Sunad Bhandary	e3f879bf1e	NVMe:Remove unreachable code in nvme_abort_req Removing unreachable code from nvme_abort_req as nvme_submit_cmd has no failure status to return. Signed-off-by: Sunad Bhandary <sunad.s@samsung.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-19 14:28:24 -07:00
Keith Busch	03100aada9	block: Replace SG_GAPS with new queue limits mask The SG_GAPS queue flag caused checks for bio vector alignment against PAGE_SIZE, but the device may have different constraints. This patch adds a queue limits so a driver with such constraints can set to allow requests that would have been unnecessarily split. The new gaps check takes the request_queue as a parameter to simplify the logic around invoking this function. This new limit makes the queue flag redundant, so removing it and all usage. Device-mappers will inherit the correct settings through blk_stack_limits(). Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-19 14:26:02 -07:00
Jeff Moyer	30e2bc08b2	Revert "block: remove artifical max_hw_sectors cap" This reverts commit `34b48db66e`. That commit caused performance regressions for streaming I/O workloads on a number of different storage devices, from SATA disks to external RAID arrays. It also managed to trip up some buggy firmware in at least one drive, causing data corruption. The next patch will bump the default max_sectors_kb value to 1280, which will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress, a value of 1280 does not show a big performance difference from 512. This will hopefully still help the software RAID setup that Christoph saw the original performance gains with while still not regressing other storage configurations. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 13:21:13 -07:00
Jon Derrick	81f03fedcc	NVMe: Add nvme subsystem reset IOCTL Controllers can perform optional subsystem resets as introduced in NVMe 1.1. This patch adds an IOCTL to trigger the subsystem reset by writing "NVMe" to the NSSR register. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 11:56:13 -06:00
Keith Busch	dfbac8c7ac	NVMe: Add nvme subsystem reset support Controllers part of an NVMe subsystem may be reset by any other controller in the subsystem. If the device is capable of subsystem resets, this patch adds detection for such events and performs appropriate controller initialization upon subsystem reset detection. The register bit is a RW1C type, so the driver needs to write a 1 to the status bit to clear the subsystem reset occured bit during initialization. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 11:56:11 -06:00
Matias Bjørling	b2b1ec9b55	NVMe: removed unused nn var from nvme_dev_add The logic in nvme_dev_add to enumerate namespaces was moved to nvme_dev_scan. When moved, the nn variable is no longer used. This patch removes it. Fixes: a5768aai ("NVMe: Automatic namespace rescan") Signed-off-by: Matias Bjørling <m@bjorling.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 10:13:41 -06:00
Keith Busch	e824410ffc	NVMe: Set queue max segments This sets the queue's max segment size to match the device's capabilities. The default of 128 is usable until a device's transfer capability exceeds 512k, assuming a device page size of 4k. Many nvme devices exceed that transfer limit, so this lets the block layer know what kind of commands it to allow to form rather than unnecessarily split them. One additional segment is added to account for a transfer that may start in the middle of a page. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 15:52:23 -06:00
Markus Pargmann	22d109c1bb	nbd: flags is a u32 variable The flags variable is used as u32 variable. This patch changes the type to be u32. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:23:01 -06:00
Markus Pargmann	cad73b2703	nbd: Rename functions for clearness of recv/send path This patch renames functions so that it is clear what the function does. Otherwise it is not directly understandable what for example 'do_it' means. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:59 -06:00
Markus Pargmann	696697cb50	nbd: Change 'disconnect' to be boolean Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:58 -06:00
Markus Pargmann	30d53d9c11	nbd: Add debugfs entries Add some debugfs files that help to understand the internal state of NBD. This exports the different sizes, flags, tasks and so on. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:56 -06:00
Markus Pargmann	6521d39a64	nbd: Remove variable 'pid' This patch uses nbd->task_recv to determine the value of the previously used variable 'pid' for sysfs. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:55 -06:00
Markus Pargmann	e78273c80b	nbd: Move clear queue debug message This message was a warning without a reason. This patch moves it into nbd_clear_que and transforms it to a debug message. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:53 -06:00
Markus Pargmann	193918307f	nbd: Remove 'harderror' and propagate error properly Instead of a variable 'harderror' we can simply try to correctly propagate errors to the userspace. This patch removes the harderror variable and passes errors through error pointers and nbd_do_it back to the userspace. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:52 -06:00
Markus Pargmann	260bbce403	nbd: restructure sock_shutdown This patch restructures sock_shutdown to avoid having the main code path in an if block. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:50 -06:00
Markus Pargmann	36e47bee7c	nbd: sock_shutdown, remove conditional lock Move the conditional lock from sock_shutdown into the surrounding code. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:49 -06:00
Markus Pargmann	7e2893a16d	nbd: Fix timeout detection At the moment the nbd timeout just detects hanging tcp operations. This is not enough to detect a hanging or bad connection as expected of a timeout. This patch redesigns the timeout detection to include some more cases. The timeout is now in relation to replies from the server. If the server does not send replies within the timeout the connection will be shut down. The patch adds a continous timer 'timeout_timer' that is setup in one of two cases: - The request list is empty and we are sending the first request out to the server. We want to have a reply within the given timeout, otherwise we consider the connection to be dead. - A server response was received. This means the server is still communicating with us. The timer is reset to the timeout value. The timer is not stopped if the list becomes empty. It will just trigger a timeout which will directly leave the handling routine again as the request list is empty. The whole patch does not use any additional explicit locking. The list_empty() calls are safe to be used concurrently. The timer is locked internally as we just use mod_timer and del_timer_sync(). The patch is based on the idea of Michal Belczyk with a previous different implementation. Cc: Michal Belczyk <belczyk@bsd.krakow.pl> Cc: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de> Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Tested-by: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-17 08:22:47 -06:00
Sergey Senozhatsky	4ce321f574	zram: fix pool name truncation zram_meta_alloc() constructs a pool name for zs_create_pool() call as snprintf(pool_name, sizeof(pool_name), "zram%d", device_id); However, it defines pool name buffer to be only 8 bytes long (minus trailing zero), which means that we can have only 1000 pool names: zram0 -- zram999. With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000 can fail if device zram100 already exists, because snprintf() will truncate new pool name to zram100 and pass it debugfs_create_dir(), causing: debugfs dir <zram100> creation failed zram: Error creating memory pool ... and so on. Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of divice_id. We construct zram%d name earlier and keep it as a ->disk_name, no need to snprintf() it again. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-08-14 15:56:32 -07:00
Linus Torvalds	ebcbf1664c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull xen block driver fixes from Jens Axboe: "A few small bug fixes for xen-blk{front,back} that have been sitting over my vacation" * 'for-linus' of git://git.kernel.dk/linux-block: xen-blkback: replace work_pending with work_busy in purge_persistent_gnt() xen-blkfront: don't add indirect pages to list when !feature_persistent xen-blkfront: introduce blkfront_gather_backend_features()	2015-08-13 13:44:32 -07:00
Kent Overstreet	8ae126660f	block: kill merge_bvec_fn() completely As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-13 12:31:57 -06:00
Kent Overstreet	54efd50bfd	block: make generic_make_request handle arbitrarily sized bios The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-13 12:31:33 -06:00
Ilya Dryomov	2761713d35	rbd: fix copyup completion race For write/discard obj_requests that involved a copyup method call, the opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is rbd_img_obj_copyup_callback(). The latter frees copyup pages, sets ->xferred and delegates to rbd_img_obj_callback(), the "normal" image object callback, for reporting to block layer and putting refs. rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op, which means obj_request is marked done in rbd_osd_trivial_callback(), before ->callback is invoked and rbd_img_obj_copyup_callback() has a chance to run. Marking obj_request done essentially means giving rbd_img_obj_callback() a license to end it at any moment, so if another obj_request from the same img_request is being completed concurrently, rbd_img_obj_end_request() may very well be called on such prematurally marked done request: <obj_request-1/2 reply> handle_reply() rbd_osd_req_callback() rbd_osd_trivial_callback() rbd_obj_request_complete() rbd_img_obj_copyup_callback() rbd_img_obj_callback() <obj_request-2/2 reply> handle_reply() rbd_osd_req_callback() rbd_osd_trivial_callback() for_each_obj_request(obj_request->img_request) { rbd_img_obj_end_request(obj_request-1/2) rbd_img_obj_end_request(obj_request-2/2) <-- } Calling rbd_img_obj_end_request() on such a request leads to trouble, in particular because its ->xfferred is 0. We report 0 to the block layer with blk_update_request(), get back 1 for "this request has more data in flight" and then trip on rbd_assert(more ^ (which == img_request->obj_request_count)); with rhs (which == ...) being 1 because rbd_img_obj_end_request() has been called for both requests and lhs (more) being 1 because we haven't got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet. To fix this, leverage that rbd wants to call class methods in only two cases: one is a generic method call wrapper (obj_request is standalone) and the other is a copyup (obj_request is part of an img_request). So make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke rbd_img_obj_copyup_callback() from it if obj_request is part of an img_request, similar to how CEPH_OSD_OP_READ handler invokes rbd_img_obj_request_read_callback(). Since rbd_img_obj_copyup_callback() is now being called from the OSD request callback (only), it is renamed to rbd_osd_copyup_callback(). Cc: Alex Elder <elder@linaro.org> Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-07-31 11:38:57 +03:00
Christoph Hellwig	4246a0b63b	block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-29 08:55:15 -06:00
Jens Axboe	e162b219ae	Merge branch 'stable/for-jens-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus Konrad writes: "There are three bugs that have been found in the xen-blkfront (and backend). Two of them have the stable tree CC-ed. They have been found where an guest is migrating to a host that is missing 'feature-persistent' support (from one that has it enabled). We end up hitting an BUG() in the driver code."	2015-07-27 11:58:41 -06:00
Bob Liu	53bc7dc004	xen-blkback: replace work_pending with work_busy in purge_persistent_gnt() The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge work haven't finished. There is a work_pending() before this BUG_ON, but it doesn't account if the work is still currently running. CC: stable@vger.kernel.org Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2015-07-24 09:09:49 -04:00
Bob Liu	7b0767502b	xen-blkfront: don't add indirect pages to list when !feature_persistent We should consider info->feature_persistent when adding indirect page to list info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered. When we are using persistent grants the indirect_pages list should always be empty because blkfront has pre-allocated enough persistent pages to fill all requests on the ring. CC: stable@vger.kernel.org Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2015-07-24 09:09:16 -04:00
Bob Liu	d50babbe30	xen-blkfront: introduce blkfront_gather_backend_features() There is a bug when migrate from !feature-persistent host to feature-persistent host, because domU still thinks new host/backend doesn't support persistent. Dmesg like: backed has not unmapped grant: 839 backed has not unmapped grant: 773 backed has not unmapped grant: 773 backed has not unmapped grant: 773 backed has not unmapped grant: 839 The fix is to recheck feature-persistent of new backend in blkif_recover(). See: https://lkml.org/lkml/2015/5/25/469 As Roger suggested, we can split the part of blkfront_connect that checks for optional features, like persistent grants, indirect descriptors and flush/barrier features to a separate function and call it from both blkfront_connect and blkif_recover Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2015-07-24 09:07:10 -04:00
Mike Krinkin	21974061cf	null_blk: fix use-after-free problem end_cmd finishes request associated with nullb_cmd struct, so we should save pointer to request_queue in a local variable before calling end_cmd. The problem was causes general protection fault with slab poisoning enabled. Fixes: `8b70f45e2e` ("null_blk: restart request processing on completion handler") Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-22 13:30:20 -06:00
Jon Derrick	c45f5c9943	nvme: Fixes u64 division which breaks i386 builds Uses div_u64 for u64 division and round_down, a bitwise operation, instead of rounddown, which uses a modulus. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-21 15:36:24 -06:00
Jon Derrick	8ffaadf742	NVMe: Use CMB for the IO SQes if available Some controllers have a controller-side memory buffer available for use for submissions, completions, lists, or data. If a CMB is available, the entire CMB will be ioremapped and it will attempt to map the IO SQes onto the CMB. The queues will be shrunk as needed. The CMB will not be used if the queue depth is shrunk below some threshold where it may have reduced performance over a larger queue in system memory. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-21 09:40:11 -06:00
Jon Derrick	498c43949c	NVMe: Unify SQ entry writing and doorbell ringing This patch changes sq_cmd writers to instead create their command on the stack. __nvme_submit_cmd copies the sq entry to the queue and writes the doorbell. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-21 09:40:09 -06:00
Jens Axboe	2bb4cd5cc4	block: have drivers use blk_queue_max_discard_sectors() Some drivers use it now, others just set the limits field manually. But in preparation for splitting this into a hard and soft limit, ensure that they all call the proper function for setting the hw limit for discards. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-17 08:41:53 -06:00
Keith Busch	7bee607472	NVMe: Reread partitions on metadata formats This patch has the driver automatically reread partitions if a namespace has a separate metadata format. Previously revalidating a disk was sufficient to get the correct capacity set on such formatted drives, but partitions that may exist would not have been surfaced. Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Tested-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-15 15:36:47 -06:00
Linus Torvalds	1dc51b8288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...	2015-07-04 19:36:06 -07:00
Linus Torvalds	1e512b08da	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Mainly sending this off now for the writeback fixes, since they fix a real regression introduced with the cgroup writeback changes. The NVMe fix could wait for next pull for this series, but it's simple enough that we might as well include it. This contains: - two cgroup writeback fixes from Tejun, fixing a user reported issue with luks crypt devices hanging when being closed. - NVMe error cleanup fix from Jon Derrick, fixing a case where we'd attempt to free an unregistered IRQ" * 'for-linus' of git://git.kernel.dk/linux-block: NVMe: Fix irq freeing when queue_request_irq fails writeback: don't drain bdi_writeback_congested on bdi destruction writeback: don't embed root bdi_writeback_congested in bdi_writeback	2015-07-03 12:12:16 -07:00
Linus Torvalds	0c76c6ba24	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "We have a pile of bug fixes from Ilya, including a few patches that sync up the CRUSH code with the latest from userspace. There is also a long series from Zheng that fixes various issues with snapshots, inline data, and directory fsync, some simplification and improvement in the cap release code, and a rework of the caching of directory contents. To top it off there are a few small fixes and cleanups from Benoit and Hong" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) rbd: use GFP_NOIO in rbd_obj_request_create() crush: fix a bug in tree bucket decode libceph: Fix ceph_tcp_sendpage()'s more boolean usage libceph: Remove spurious kunmap() of the zero page rbd: queue_depth map option rbd: store rbd_options in rbd_device rbd: terminate rbd_opts_tokens with Opt_err ceph: fix ceph_writepages_start() rbd: bump queue_max_segments ceph: rework dcache readdir crush: sync up with userspace crush: fix crash from invalid 'take' argument ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL ceph: pre-allocate data structure that tracks caps flushing ceph: re-send flushing caps (which are revoked) in reconnect stage ceph: send TID of the oldest pending caps flush to MDS ceph: track pending caps flushing globally ceph: track pending caps flushing accurately libceph: fix wrong name "Ceph filesystem for Linux" ceph: fix directory fsync ...	2015-07-02 11:35:00 -07:00
Jon Derrick	758dd7fdff	NVMe: Fix irq freeing when queue_request_irq fails Fixes an issue when queue_reuest_irq fails in nvme_setup_io_queues. This patch initializes all vectors to -1 and resets the vector to -1 in the case of a failure in queue_request_irq. This avoids the free_irq in nvme_suspend_queue if the queue did not get an irq. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-07-02 09:01:25 -06:00
Linus Torvalds	7adf12b87f	xen: features and cleanups for 4.2-rc0 - Add "make xenconfig" to assist in generating configs for Xen guests. - Preparatory cleanups necessary for supporting 64 KiB pages in ARM guests. - Automatically use hvc0 as the default console in ARM guests. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJVkpoqAAoJEFxbo/MsZsTRu3IH/2AMPx2i65hoSqfHtGf3sz/z XNfcidVmOElFVXGaW83m0tBWMemT5LpOGRfiq5sIo8xt/8xD2vozEkl/3kkf3RrX EmZDw3E8vmstBdBTjWdovVhNenRc0m0pB5daS7wUdo9cETq1ag1L3BHTB3fEBApO 74V6qAfnhnq+snqWhRD3XAk3LKI0nWuWaV+5HsmxDtnunGhuRLGVs7mwxZGg56sM mILA0eApGPdwyVVpuDe0SwV52V8E/iuVOWTcomGEN2+cRWffG5+QpHxQA8bOtF6O KfqldiNXOY/idM+5+oSm9hespmdWbyzsFqmTYz0LvQvxE8eEZtHHB3gIcHkE8QU= =danz -----END PGP SIGNATURE----- Merge tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Xen features and cleanups for 4.2-rc0: - add "make xenconfig" to assist in generating configs for Xen guests - preparatory cleanups necessary for supporting 64 KiB pages in ARM guests - automatically use hvc0 as the default console in ARM guests" * tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: block/xen-blkback: s/nr_pages/nr_segs/ block/xen-blkfront: Remove invalid comment block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS arm/xen: Drop duplicate define mfn_to_virt xen/grant-table: Remove unused macro SPP xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring xen: Include xen/page.h rather than asm/xen/page.h kconfig: add xenconfig defconfig helper kconfig: clarify kvmconfig is for kvm xen/pcifront: Remove usage of struct timeval xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON() hvc_xen: avoid uninitialized variable warning xenbus: avoid uninitialized variable warning xen/arm: allow console=hvc0 to be omitted for guests arm,arm64/xen: move Xen initialization earlier arm/xen: Correctly check if the event channel interrupt is present	2015-07-01 11:53:46 -07:00
Linus Torvalds	02201e3f1b	Minor merge needed, due to function move. Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek tLdh/a9GiCitqS0bT7GE =tWPQ -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits) modules: only use mod->param_lock if CONFIG_MODULES param: fix module param locks when !CONFIG_SYSFS. rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() module: add per-module param_lock module: make perm const params: suppress unused variable error, warn once just in case code changes. modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'. kernel/module.c: avoid ifdefs for sig_enforce declaration kernel/workqueue.c: remove ifdefs over wq_power_efficient kernel/params.c: export param_ops_bool_enable_only kernel/params.c: generalize bool_enable_only kernel/module.c: use generic module param operaters for sig_enforce kernel/params: constify struct kernel_param_ops uses sysfs: tightened sysfs permission checks module: Rework module_addr_{min,max} module: Use __module_address() for module_address_lookup() module: Make the mod_tree stuff conditional on PERF_EVENTS \|\| TRACING module: Optimize __module_address() using a latched RB-tree rbtree: Implement generic latch_tree seqlock: Introduce raw_read_seqcount_latch() ...	2015-07-01 10:49:25 -07:00
Linus Torvalds	43baed34bc	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull more block layer patches from Jens Axboe: "A few later arrivers that I didn't fold into the first pull request, so we had a chance to run some testing. This contains: - NVMe: - Set of fixes from Keith - 4.4 and earlier gcc build fix from Andrew - small set of xen-blk{back,front} fixes from Bob Liu. - warnings fix for bogus inline statement in I_BDEV() from Geert. - error code fixup for SG_IO ioctl from Paolo Bonzini" * 'for-linus' of git://git.kernel.dk/linux-block: drivers/block/nvme-core.c: fix build with gcc-4.4.4 bdi: Remove "inline" keyword from exported I_BDEV() implementation block: fix bogus EFAULT error from SG_IO ioctl NVMe: Fix filesystem deadlock on removal NVMe: Failed controller initialization fixes NVMe: Unify controller probe and resume NVMe: Don't use fake status on cancelled command NVMe: Fix device cleanup on initialization failure drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising xen/block: add multi-page ring support driver: xen-blkfront: move talk_to_blkback to a more suitable place drivers: xen-blkback: delay pending_req allocation to connect_ring	2015-06-30 19:46:34 -07:00
Ilya Dryomov	5a60e87603	rbd: use GFP_NOIO in rbd_obj_request_create() rbd_obj_request_create() is called on the main I/O path, so we need to use GFP_NOIO to make sure allocation doesn't blow back on us. Not all callers need this, but I'm still hardcoding the flag inside rather than making it a parameter because a) this is going to stable, and b) those callers shouldn't really use rbd_obj_request_create() and will be fixed in the future. More memory allocation fixes will follow. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-07-01 00:46:46 +03:00
Linus Torvalds	88793e5c77	The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVjZGBAAoJEB7SkWpmfYgC4fkP/j+k6HmSRNU/yRYPyo7CAWvj 3P5P1i6R6nMZZbjQrQArAXaIyLlFk4sEQDYsciR6dmslhhFZAkR2eFwVO5rBOyx3 QN0yxEpyjJbroRFUrV/BLaFK4cq2oyJAFFHs0u7/pLHBJ4MDMqfRKAMtlnBxEkTE LFcqXapSlvWitSbjMdIBWKFEvncaiJ2mdsFqT4aZqclBBTj00eWQvEG9WxleJLdv +tj7qR/vGcwOb12X5UrbQXgwtMYos7A6IzhHbqwQL8IrOcJ6YB8NopJUpLDd7ZVq KAzX6ZYMzNueN4uvv6aDfqDRLyVL7qoxM9XIjGF5R8SV9sF2LMspm1FBpfowo1GT h2QMr0ky1nHVT32yspBCpE9zW/mubRIDtXxEmZZ53DIc4N6Dy9jFaNVmhoWtTAqG b9pndFnjUzzieCjX5pCvo2M5U6N0AQwsnq76/CasiWyhSa9DNKOg8MVDRg0rbxb0 UvK0v8JwOCIRcfO3qiKcx+02nKPtjCtHSPqGkFKPySRvAdb+3g6YR26CxTb3VmnF etowLiKU7HHalLvqGFOlDoQG6viWes9Zl+ZeANBOCVa6rL2O7ZnXJtYgXf1wDQee fzgKB78BcDjXH4jHobbp/WBANQGN/GF34lse8yHa7Ym+28uEihDvSD1wyNLnefmo 7PJBbN5M5qP5tD0aO7SZ =VtWG -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm Pull libnvdimm subsystem from Dan Williams: "The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore" * tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits) arch, x86: pmem api for ensuring durability of persistent memory updates libnvdimm: Add sysfs numa_node to NVDIMM devices libnvdimm: Set numa_node to NVDIMM devices acpi: Add acpi_map_pxm_to_online_node() libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only pmem: flag pmem block devices as non-rotational libnvdimm: enable iostat pmem: make_request cleanups libnvdimm, pmem: fix up max_hw_sectors libnvdimm, blk: add support for blk integrity libnvdimm, btt: add support for blk integrity fs/block_dev.c: skip rw_page if bdev has integrity libnvdimm: Non-Volatile Devices tools/testing/nvdimm: libnvdimm unit test infrastructure libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory nd_btt: atomic sector updates libnvdimm: infrastructure for btt devices libnvdimm: write blk label set libnvdimm: write pmem label set libnvdimm: blk labels and namespace instantiation ...	2015-06-29 10:34:42 -07:00
Andrew Morton	e44ac588cd	drivers/block/nvme-core.c: fix build with gcc-4.4.4 gcc-4.4.4 (and possibly other versions) fail the compile when initializers are used with anonymous unions. Work around this. drivers/block/nvme-core.c: In function 'nvme_identify_ctrl': drivers/block/nvme-core.c:1163: error: unknown field 'identify' specified in initializer drivers/block/nvme-core.c:1163: warning: missing braces around initializer drivers/block/nvme-core.c:1163: warning: (near initialization for 'c.<anonymous>') drivers/block/nvme-core.c:1164: error: unknown field 'identify' specified in initializer drivers/block/nvme-core.c:1164: warning: excess elements in struct initializer drivers/block/nvme-core.c:1164: warning: (near initialization for 'c') ... This patch has no effect on text size with gcc-4.8.2. Fixes: `d29ec8241c` ("nvme: submit internal commands through the block layer") Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-06-27 12:20:34 -06:00
Jens Axboe	6443af9855	Merge branch 'stable/for-jens-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus	2015-06-27 11:47:07 -06:00

1 2 3 4 5 ...

4881 Commits