percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Up to now, applying the in-core activity-log to the on-disk
bitmap did not care for logical_block_size.
On logical_block_size != 512 byte, this very likely results
in misalligned block access and spurious "io errors".
We now simply always submit aligned whole 4k blocks, fixing this
for logical block sizes of 512, 1024, 2048 and 4096.
For even larger logical block sizes, this won't work.
But I'm not aware of devices with such properties being available.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Up to now this only worked for Outdated and Inconsistent disks, that
it did not worked for Consistent disks was an inconsistent omission.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This is a race condition that existed for ages.
The previous commit reduces the window, this one closes it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
cmdname() should map command number to its human readable
representation. The string table was incomplete, though.
Maybe rather do a switch() block, and let the compiler help us
to keep it complete?
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Fixes a race and potential kernel panic if e.g. the worker was just
about to send a few P_RS_IS_IN_SYNC via the meta socket for checksum
based resync, while the receiver destroys the sockets in
drbd_disconnect.
To make sure no-one is using the meta socket,
it is not enough to stop the asender...
Grab the meta socket mutex before destroying it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Depending on resync request size,
we need to account for more than one bit.
Impact: cosmetic
If SyncTarget reported correctly 100% equal checksums,
the SyncSource usually reported 12% equal checksums instead,
because it only counted requests, we typically do 32k resync requests,
and the bitmap granularity is still 4k.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Situation:
we have diverging data sets, i.e. we had a split brain somewhen,
but currently are connected, one node diskless.
Then we try to attach that disk, figure it is consistent,
but has a diverging data set, we refuse to attach.
This led to strange state changes:
22:18:35 bb drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> Connected) pdsk( DUnknown -> UpToDate )
22:19:30 bb drbd1: disk( Diskless -> Attaching )
22:19:30 bb drbd1: disk( Attaching -> Negotiating )
22:19:30 bb drbd1: drbd_sync_handshake:
22:19:30 bb drbd1: self 97BF25798B9D5222:F33D1F62ADE698DD:4269796F9D027C83:AC45D8B5C3C1BF93 bits:19449 flags:0
22:19:30 bb drbd1: peer 280DFB6E125465D3:F33D1F62ADE698DC:4269796F9D027C82:AC45D8B5C3C1BF93 bits:2575806 flags:0
22:19:30 bb drbd1: uuid_compare()=100 by rule 90
22:19:30 bb drbd1: Split-Brain detected, dropping connection!
22:19:30 bb drbd1: disk( Negotiating -> Diskless )
while the other side says:
22:19:30 aa drbd1: Split-Brain detected, dropping connection!
22:19:30 aa drbd1: Disk attach process on the peer node was aborted.
22:19:30 aa drbd1: conn( Connected -> TOO_LARGE ) pdsk( Diskless -> Consistent )
This should be fixed now.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
we still don't support 4k 'physical' sectors 'natively',
but use a read-modify-write workaround.
And we even tried to use the extra page before we allocated it :(
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The bm_change semaphore is semantically a mutex. Convert it to a real
mutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Except for SCSI no device drivers distinguish between physical and
hardware segment limits. Consolidate the two into a single segment
limit.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The block layer calling convention is blk_queue_<limit name>.
blk_queue_max_sectors predates this practice, leading to some confusion.
Rename the function to appropriately reflect that its intended use is to
set max_hw_sectors.
Also introduce a temporary wrapper for backwards compability. This can
be removed after the merge window is closed.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
blk_queue_make_request() internally calls blk_set_default_limits(),
so calling blk_queue_max_segment_size() before is useless.
Ergo: move the call to blk_queue_max_segment_size() down a few lines.
Impact:
If, after a fresh modprobe, you first connect a Diskless drbd,
then attach, this could result in a DRBD Protocol Error at first.
The next connection attempt would then succeeded.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
It is called LBDAF since 2.6.31.
impact:
without this change, on 32bit,
DRBD would wrongly claim to only support 2TiB devices.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Test the just-allocated value for NULL rather than some other value.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,y;
statement S;
@@
x = \(kmalloc\|kcalloc\|kzalloc\)(...);
(
if ((x) == NULL) S
|
if (
- y
+ x
== NULL)
S
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Immediately after changing the write ordering method, the epoch can already
be finished at this point.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
!CONFIG_OPT evalues to FALSE if CONFIG_OPT='m'. Do not display the
"DRBD disabled..." message if the dependencies are compiled as module.
Signed-off-by: Johannes Thoma <johannes.thoma@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
In D_DISKLESS we do not hand out any new references to ldev (local_cnt)
therefore waiting until all previously handed out refereces got returned
is sufficient before actually freeing mdev->ldev.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
rsp->count is unsigned so the test does not work.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Since 8.3.3 we fail to do the resync when a partial resynch is not
possible, but a full synch is necessary.
This regression was introduced with 7101539930c0a89146959e7a39c09ad9c3516434
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Otherwise the 'state fixup' in the receiver will change to Unconnected,
but the receiver will terminate itself, and any attempt at 'down'ing
that drbd later will block forever.
see also Bugz. #259
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
this is uncritical, as we still also serialize in userland,
but to correctly serialize on the CONFIG_PENDING bit,
it must be wait_event(state_wait, \!test_and_set_bit)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When there are many blocks on the fly (ua), and the AL gets into "starving"
mode (random IO, scattered all over the device), and the connections gets
interrupted, the receiver thread deadlocks in the drbd_disconnect() code path.
Affected are only nodes in Primary role.
The bug triggers most likely on system that mirror over "long distances"
Regression introduced shortly before 8.3.3
with git commit 31e0f1250f174ac1ee317f360943a0159e19edc8
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When IO gets frozen due to a broken fence-peer script, the user
should be able to thaw IO by the resume-io command.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
To check wether we are truncating a very large device due to limited
meta data space, we need to check the ll_dev size.
Also improve the printk to suggest "flexible" or "internal".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drbd_int.h uses __ratelimit(), so it needs to #include ratelimit.h:
drivers/block/drbd/drbd_int.h:1765: error: implicit declaration of function '__ratelimit'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: drbd-dev@lists.linbit.com
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Now we have the capabilities of the sending process available,
use them to enforce CAP_SYS_ADMIN.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It is force-included on the gcc command line since at least 2.6.15.
Explicit include lines seem to break compilation now in certain configurations.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>