By aligning n_buckets to power of 2 we can turn the "bucket = hashval %
n_buckets" into a less expensive bucket = hashval & (n_buckets - 1).
This removes the DIV instruction as shown below.
Before:
xor %edx,%edx
divl 0x8(%rbx)
mov %edx,%eax
add $0x1,%rax
shl $0x4,%rax
add %rbx,%rax
After:
lea -0x1(%rdi),%edx
and %edx,%eax
add $0x1,%rax
shl $0x4,%rax
add %rbx,%rax
With a microbenchmark, measuring the time to locate the bucket (i.e.
time_to_calculate_hashval + time_to_calculate_bucket_position) we have
the results below (time in clock cycles):
keylen before after
2-10 79.0 61.9 (-21.65%)
11-17 81.0 64.4 (-20.48%)
18-25 90.0 73.2 (-18.69%)
26-32 104.7 87.0 (-16.82%)
33-40 108.4 89.6 (-17.37%)
41-48 111.2 91.9 (-17.38%)
49-55 120.1 102.1 (-15.04%)
56-63 134.4 115.7 (-13.91%)
As expected the gain is constant, regardless of the key length.
The time to clculate the hashval varies with the key length, which
explains the bigger gains for short keys.
Although the hash table implementation allows passing a callback function
to free a value when it is removed from the hash table, hash_del() wasn't
freeing it if it was provided. Now it does.
As a bonus, it now checks if the callback is set in hash_add() as well.
Use a function to properly get an unsigned short from memory that is
possibly unaligned.
Note that it implicitly fixes a small bug in the hash function that
was introduced when modifying the eina code: the line "hash ^= key[2]
<< 18;" is supposed to be accessing the 3rd byte of the remainder of
the input, but when 'it' was introduced, 'key' ('data' in eina code)
was no longer incremented, so this ended up accessing the 3rd byte of
the input from the beginning. This is fixed by iterating over 'key',
like the eina code does.
Before this patch depmod was failing on ARMv5 and possibly others that
don't have unaligned access. They do not calculate correctly the
dependencies as shown below:
[root@alarm ~]# modinfo bridge
filename: /lib/modules/2.6.39.4/kernel/net/bridge/bridge.ko
version: 2.3
license: GPL
srcversion: 6B583530AE2B39C7E2317BF
depends: stp,llc
vermagic: 2.6.39.4 preempt mod_unload ARMv5
[root@alarm ~]# depmod
[root@alarm ~]# cat /lib/modules/2.6.39.4/modules.dep |grep bridge
kernel/net/bridge/bridge.ko:
[root@alarm ~]#
See how modinfo properly lists the dependencies, but modules.dep which
depmod generates does not contain them. As a result, most kernel
modules fail to load because their dependencies are not loaded by
modprobe.
Lines should not go over 80 chars with a few exceptions:
- headers
- function definitions with only 1 argument
- long strings, otherwise we break grep
This should go later in a coding-style file