Commit Graph

601627 Commits

Author SHA1 Message Date
Linus Torvalds
7e0fb73c52 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add <asm/hash.h>
  microblaze: Add <asm/hash.h>
  m68k: Add <asm/hash.h>
  <linux/hash.h>: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to <linux/stringhash.h>
2016-05-28 16:15:25 -07:00
George Spelvin
4684fe9530 h8300: Add <asm/hash.h>
This will improve the performance of hash_32() and hash_64(), but due
to complete lack of multi-bit shift instructions on H8, performance will
still be bad in surrounding code.

Designing H8-specific hash algorithms to work around that is a separate
project.  (But if the maintainers would like to get in touch...)

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
2016-05-28 15:48:58 -04:00
George Spelvin
7b13277b68 microblaze: Add <asm/hash.h>
Microblaze is an FPGA soft core that can be configured various ways.

If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.

Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
2016-05-28 15:48:58 -04:00
George Spelvin
14c44b95b3 m68k: Add <asm/hash.h>
This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647
for the original mc68000, which lacks a 32x32-bit multiply instruction.

Yes, the amount of optimization effort put in is excessive. :-)

Shift-add chain found by Yevgen Voronenko's Hcub algorithm at
http://spiral.ece.cmu.edu/mcm/gen.html

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
2016-05-28 15:48:57 -04:00
George Spelvin
468a942852 <linux/hash.h>: Add support for architecture-specific functions
This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macq.eu>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis <alistai@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
2016-05-28 15:48:31 -04:00
George Spelvin
2a18da7a9c fs/namei.c: Improve dcache hash function
Patch 0fed3ac866 improved the hash mixing, but the function is slower
than necessary; there's a 7-instruction dependency chain (10 on x86)
each loop iteration.

Word-at-a-time access is a very tight loop (which is good, because
link_path_walk() is one of the hottest code paths in the entire kernel),
and the hash mixing function must not have a longer latency to avoid
slowing it down.

There do not appear to be any published fast hash functions that:
1) Operate on the input a word at a time, and
2) Don't need to know the length of the input beforehand, and
3) Have a single iterated mixing function, not needing conditional
   branches or unrolling to distinguish different loop iterations.

One of the algorithms which comes closest is Yann Collet's xxHash, but
that's two dependent multiplies per word, which is too much.

The key insights in this design are:

1) Barring expensive ops like multiplies, to diffuse one input bit
   across 64 bits of hash state takes at least log2(64) = 6 sequentially
   dependent instructions.  That is more cycles than we'd like.
2) An operation like "hash ^= hash << 13" requires a second temporary
   register anyway, and on a 2-operand machine like x86, it's three
   instructions.
3) A better use of a second register is to hold a two-word hash state.
   With careful design, no temporaries are needed at all, so it doesn't
   increase register pressure.  And this gets rid of register copying
   on 2-operand machines, so the code is smaller and faster.
4) Using two words of state weakens the requirement for one-round mixing;
   we now have two rounds of mixing before cancellation is possible.
5) A two-word hash state also allows operations on both halves to be
   done in parallel, so on a superscalar processor we get more mixing
   in fewer cycles.

I ended up using a mixing function inspired by the ChaCha and Speck
round functions.  It is 6 simple instructions and 3 cycles per iteration
(assuming multiply by 9 can be done by an "lea" instruction):

		x ^= *input++;
	y ^= x;	x = ROL(x, K1);
	x += y;	y = ROL(y, K2);
	y *= 9;

Not only is this reversible, two consecutive rounds are reversible:
if you are given the initial and final states, but not the intermediate
state, it is possible to compute both input words.  This means that at
least 3 words of input are required to create a collision.

(It also has the property, used by hash_name() to avoid a branch, that
it hashes all-zero to all-zero.)

The rotate constants K1 and K2 were found by experiment.  The search took
a sample of random initial states (I used 1023) and considered the effect
of flipping each of the 64 input bits on each of the 128 output bits two
rounds later.  Each of the 8192 pairs can be considered a biased coin, and
adding up the Shannon entropy of all of them produces a score.

The best-scoring shifts also did well in other tests (flipping bits in y,
trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
so the choice was made with the additional constraint that the sum of the
shifts is odd and not too close to the word size.

The final state is then folded into a 32-bit hash value by a less carefully
optimized multiply-based scheme.  This also has to be fast, as pathname
components tend to be short (the most common case is one iteration!), but
there's some room for latency, as there is a fair bit of intervening logic
before the hash value is used for anything.

(Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs.  I need
a better benchmark; the numbers seem to show a slight dip in performance
between 4.6.0 and this patch, but they're too noisy to quote.)

Special thanks to Bruce fields for diligent testing which uncovered a
nasty fencepost error in an earlier version of this patch.

[checkpatch.pl formatting complaints noted and respectfully disagreed with.]

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
2016-05-28 15:45:29 -04:00
George Spelvin
ef703f49a6 Eliminate bad hash multipliers from hash_32() and hash_64()
The "simplified" prime multipliers made very bad hash functions, so get rid
of them.  This completes the work of 689de1d6ca.

To avoid the inefficiency which was the motivation for the "simplified"
multipliers, hash_64() on 32-bit systems is changed to use a different
algorithm.  It makes two calls to hash_32() instead.

drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
for some horrible reason, so it inherits a copy of the old definition.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
2016-05-28 15:42:51 -04:00
George Spelvin
92d567740f Change hash_64() return value to 32 bits
That's all that's ever asked for, and it makes the return
type of hash_long() consistent.

It also allows (upcoming patch) an optimized implementation
of hash_64 on 32-bit machines.

I tried adding a BUILD_BUG_ON to ensure the number of bits requested
was never more than 32 (most callers use a compile-time constant), but
adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler
unless tools/perf/MANIFEST is updated, and understanding that code base
well enough to update it is too much trouble.  I did the rest of an
allyesconfig build with such a check, and nothing tripped.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
2016-05-28 15:42:51 -04:00
George Spelvin
917ea166f4 <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
Finally, the first use of previous two patches: eliminate the
separate ad-hoc string hash functions in the sunrpc code.

Now hash_str() is a wrapper around hash_string(), and hash_mem() is
likewise a wrapper around full_name_hash().

Note that sunrpc code *does* call hash_mem() with a zero length, which
is why the previous patch needed to handle that in full_name_hash().
(Thanks, Bruce, for finding that!)

This also eliminates the only caller of hash_long which asks for
more than 32 bits of output.

The comment about the quality of hashlen_string() and full_name_hash()
is jumping the gun by a few patches; they aren't very impressive now,
but will be improved greatly later in the series.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org
2016-05-28 15:42:50 -04:00
George Spelvin
fcfd2fbf22 fs/namei.c: Add hashlen_string() function
We'd like to make more use of the highly-optimized dcache hash functions
throughout the kernel, rather than have every subsystem create its own,
and a function that hashes basic null-terminated strings is required
for that.

(The name is to emphasize that it returns both hash and length.)

It's actually useful in the dcache itself, specifically d_alloc_name().
Other uses in the next patch.

full_name_hash() is also tweaked to make it more generally useful:
1) Take a "char *" rather than "unsigned char *" argument, to
   be consistent with hash_name().
2) Handle zero-length inputs.  If we want more callers, we don't want
   to make them worry about corner cases.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
2016-05-28 15:42:50 -04:00
George Spelvin
f4bcbe792b Pull out string hash to <linux/stringhash.h>
... so they can be used without the rest of <linux/dcache.h>

The hashlen_* macros will make sense next patch.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
2016-05-28 15:42:40 -04:00
Linus Torvalds
4e8440b3b6 Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "A fix for a regression introduced yesterday.

  The regression didn't show up here locally because I did not have
  PAGE_POISONING enabled.  And buildbots discovered this only after it
  hit your tree.  Thanks to Dan for the quick response"

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: dev: use after free in detach
2016-05-28 12:38:50 -07:00
Linus Torvalds
a1842b2b6f platform/chrome: Driver and binding changes for 4.7
A handful of changes this merge window:
 
  - A few patches to fix probing and configuration of pstore
  - A few patches adding Elan touchpad registration on a few devices
  - EC changes: a security fix dealing with max message sizes and addition
    of compat_ioctl support.
  - Keyboard backlight control support
 
 There was also an accidential duplicate registration of trackpads on 'Leon',
 which was reverted just recently.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXSb5tAAoJEIwa5zzehBx38BsQAI+p6urzboC2+NSsZAq72d4/
 u2SF+ziEdgj4f7n38TjiDaFTqig9nqoQeB2nDCRicyEcHsCiK/ghsdq4ULfkvHIa
 VALn/rBkQWQ7hC+2tfrQn7Rd2VqVnh28+EyhGezhVtrAGRaE2jIbgKnnfD3MS+b5
 lab014HBppP1pDtJQQYCcoCpOUF1a9WEb8jKPWf4DCeCMaSrwFLKxV5VeqDkNtEZ
 a3GFzB+WypN1nwBQ7PEOUxutzsUakSNy+RFoCqGwVppJ+kA/E3MO2BGoEJuQ6LXP
 u7nyabJq6zBQAlrYie4OoiijBWQ+Q759jbOKkmxpBfL/mtMMepV05DIx/16Yi9NY
 j+4A9HsuY+EYUhr0mH7B57CH0RClebixdZ62DtUK0Fe6H66SsMZPYtTKLIzOTzgO
 j+gypQyW8BvcM/XjPYTA/vlYK45IJYB5OUDQzFW0qWQv/nInoYOS3G8IVooKHMDy
 +s2wih/OAxBCDJ+KfDeJUj3U5Avz1YCojRraZWmBsOPpRpsQHVQ02r3uoHeHHZFv
 bI8sTJ7BGxoYX5Kb6+xbtW7SnXoWZN8vlqdCpdNnLWKcwIqHKFirkoZh2N2AqyZB
 7VpQW1HD2WxYBpxMv98KVyPiR+5xCnZnpqcIIvpGrX00KhmJkMgIT4FbWHMf6vVt
 T55Sz5V9z/dsgMSDT3LY
 =bcc4
 -----END PGP SIGNATURE-----

Merge tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform

Pull chrome platform updates from Olof Johansson
 "A handful of Chrome driver and binding changes this merge window:

   - a few patches to fix probing and configuration of pstore

   - a few patches adding Elan touchpad registration on a few devices

   - EC changes: a security fix dealing with max message sizes and
     addition of compat_ioctl support.

   - keyboard backlight control support

  There was also an accidential duplicate registration of trackpads on
  'Leon', which was reverted just recently"

* tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
  Revert "platform/chrome: chromeos_laptop: Add Leon Touch"
  platform/chrome: chromeos_laptop - Add Elan touchpad for Wolf
  platform/chrome: chromeos_laptop - Add elan trackpad option for C720
  platform/chrome: cros_ec_dev - Populate compat_ioctl
  platform/chrome: cros_ec_lightbar - use name instead of ID to hide lightbar attributes
  platform/chrome: cros_ec_dev - Fix security issue
  platform/chrome: Add Chrome OS keyboard backlight LEDs support
  platform/chrome: use to_platform_device()
  platform/chrome: pstore: Move to larger record size.
  platform/chrome: pstore: probe for ramoops buffer using acpi
  platform/chrome: chromeos_laptop: Add Leon Touch
2016-05-28 12:32:01 -07:00
Linus Torvalds
0723ab4a97 sound updates #2 for 4.7-rc1
This is the second update round for 4.7-rc1.  Most of changes are
 about the pending ASoC updates and fixes, including a few new
 drivers.  Below are some highlights:
 
 ASoC:
 - New drivers for MAX98371 and TAS5720
 - SPI support for TLV320AIC32x4, along with the module split
 - TDM support for STI Uniperf IPs
 - Remaining topology API fixes / updates
 
 HDA:
 - A couple of Dell quirks and new Realtek codec support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXSLGwAAoJEGwxgFQ9KSmkKz4P/2xpAXcwjT/g/WgeNVsZLnxd
 vs1KlMPSWXLHY7ESZB+oDYtw5pAQWta2gKnG3T0QpkEtyqcyvEAUch55SfPbkDWz
 bRwboK91NF9Cfrso+QnUG1HdpaeDsNydiAR5u2sdemQG+rh8TmWXNmFsuqPptjbm
 LP6Spf8Ia2TYAvagZOB+2UTl7Jq8jMXiYP3aGWMHm7P/kREMQkSWcQ9U8F8UK92G
 5D0qKGvChsd23ybGUL1nBM7wBvErFoKd4Xa1zMudQt8EkTjistdgm24v3PO+lKDv
 JYiEugOZctzqtQVUlQMXcIqrlsafXwJN7ttKGst9gj32bM+a7EW0TGG0KyhxXI5w
 fRgGU7AJwncs9hBzEPBfc6Jms85THN2HpusU61ZYpyFAhLnHAOL7iIZnNKY8Pyyg
 tOPY2lTwHD9ic9EiC33/IypT50n0lBOi7X+YE7lGWdm2jXNvxtFv1jUw99kx25fj
 UaFNQaDYXXDKO1POCFrHpIq+jJ71Jmk7mXktI75wfuLyX3PSPyFg8OBbYVbTWkbL
 xdSqBs6LCESZ1iV9mauxwPSex44BpaMB3E0TA+7iN3+0Uwdfxe0OoLnX6dGiLSZJ
 QenFu/EDdCsA8rlrDy7AS1e5ulpYsTY1KGSZbfNzMdNsmD2XC3FdZHEF5PAC4wZQ
 EnNaik6InJgHlMF/6MXo
 =uNKQ
 -----END PGP SIGNATURE-----

Merge tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull more sound updates from Takashi Iwai:
 "This is the second update round for 4.7-rc1.  Most of changes are
  about the pending ASoC updates and fixes, including a few new drivers.
  Below are some highlights:

  ASoC:
   - New drivers for MAX98371 and TAS5720
   - SPI support for TLV320AIC32x4, along with the module split
   - TDM support for STI Uniperf IPs
   - Remaining topology API fixes / updates

  HDA:
   - A couple of Dell quirks and new Realtek codec support"

* tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (63 commits)
  ALSA: hda - Fix headset mic detection problem for one Dell machine
  spi: spi-ep93xx: Fix the PTR_ERR() argument
  ALSA: hda/realtek - Add support for ALC295/ALC3254
  ASoC: kirkwood: fix build failure
  ALSA: hda - Fix headphone noise on Dell XPS 13 9360
  ASoC: ak4642: Enable cache usage to fix crashes on resume
  ASoC: twl6040: Disconnect AUX output pads on digital mute
  ASoC: tlv320aic32x4: Properly implement the positive and negative pins into the mixers
  rcar: src: skip disabled-SRC nodes
  ASoC: max98371 Remove duplicate entry in max98371_reg
  ASoC: twl6040: Select LPPLL during standby
  ASoC: rsnd: don't use prohibited number to PDMACHCRn.SRS
  ASoC: simple-card: Add pm callbacks to platform driver
  ASoC: pxa: Fix module autoload for platform drivers
  ASoC: topology: Fix memory leak in widget creation
  ASoC: Add max98371 codec driver
  ASoC: rsnd: count .probe/.remove for rsnd_mod_call()
  ASoC: topology: Check size mismatch of ABI objects before parsing
  ASoC: topology: Check failure to create a widget
  ASoC: add support for TAS5720 digital amplifier
  ...
2016-05-28 12:23:12 -07:00
Linus Torvalds
9ba55cf7cf Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the outstanding target pending updates for v4.7-rc1.

  The highlights this round include:

   - Allow external PR/ALUA metadata path be defined at runtime via top
     level configfs attribute (Lee)
   - Fix target session shutdown bug for ib_srpt multi-channel (hch)
   - Make TFO close_session() and shutdown_session() optional (hch)
   - Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref
     (hch)
   - Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence)
   - Refactor iscsi-target RX/TX PDU encode/decode into common code
     (Varun)
   - Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu,
     validate_parameters, and get_r2t_ttt for generic ISO offload
     (Varun)
   - Initial merge of cxgb iscsi-segment offload target driver (Varun)

  The bulk of the changes are Chelsio's new driver, along with a number
  of iscsi-target common code improvements made by Varun + Co along the
  way"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits)
  iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
  cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute
  iscsi-target: Convert transport drivers to signal rdma_shutdown
  iscsi-target: Make iscsi_tpg_np driver show/store use generic code
  tcm_qla2xxx Add SCSI command jammer/discard capability
  iscsi-target: graceful disconnect on invalid mapping to iovec
  target: need_to_release is always false, remove redundant check and kfree
  target: remove sess_kref and ->shutdown_session
  iscsi-target: remove usage of ->shutdown_session
  tcm_qla2xxx: introduce a private sess_kref
  target: make close_session optional
  target: make ->shutdown_session optional
  target: remove acl_stop
  target: consolidate and fix session shutdown
  cxgbit: add files for cxgbit.ko
  iscsi-target: export symbols
  iscsi-target: call complete on conn_logout_comp
  iscsi-target: clear tx_thread_active
  iscsi-target: add new offload transport type
  iscsi-target: use conn_transport->transport_type in text rsp
  ...
2016-05-28 12:04:17 -07:00
Linus Torvalds
1cbe06c3cf Round two of 4.7 merge window patches
- Dynamic counter infrastructure in the IB drivers
   This is a sysfs based code to allow free form access to the hardware
   counters RDMA devices might support so drivers don't need to code
   this up repeatedly themselves
 - SendOnlyFullMember multicast support
 - IB router support
 - A couple misc fixes
 - The big item on the list: hfi1 driver updates, plus moving the hfi1
   driver out of staging
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXRyupAAoJELgmozMOVy/deHAQAJp87yn8hntHXh81Kjrvezch
 McJ0v7FTsfsLVe/Y+YAbaAREjanhnUDzZ36WUqS9rHadOl+Ia0fD+M53aTk9hDKS
 R9/oWQDsUb8k7imdhNrSS72Q2kbwr++8RqBoEEFcl4fuDCVsrRhUaf+perJTEnoK
 I9APDQCNU3FCTPFRllK5al6gi5UGEW9vO9JQZb+VnTVW9+LjlYobzcZfttAw8oRN
 rtQLP3xGb7ZwV0ngH0dt2TthN2ih97TavnISojTdJ5I+WHFxLV0YbHSK/E2JLdoi
 hVknhrflPIKr+mRVv97yCCz92kGAdTKG3sT9vNUVUQ0Y8kEdrW/LrnrmxsBEYzkl
 GcQeqg7//ffOlQ0dGZZjlkPg/8yMPo9CBsfPDcOvXeenK0lkao9qg2yrIz+CBC/9
 YlpUKfL0JLRal9O6y3Mh++ZQ9R1kvQN8/CgHLUlRQdWLfxf7VPA4rIcaWT7iGfvg
 z6jIJKjsK+hqKhY9oKh3YfNmXh65IQ5OF3+tBzHKwNiGXNSsMYAVYP3vc/HYTJvz
 4EpsenndKb5l0AOvDch8rMmw7YZPf2NOdi90TW6Bq1VgTIDRxh0WzR2QTnWBVRSf
 XAcC3DLXTHqC/UUpHrdXgfvU/0nQXaVUvPWIAM/t3hy91GLscByDw4973+tISs/B
 7AHSptbVQaBUpTjWHfKf
 =5+cf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "This is the second group of code for the 4.7 merge window.  It looks
  large, but only in one sense.  I'll get to that in a minute.  The list
  of changes here breaks down as follows:

   - Dynamic counter infrastructure in the IB drivers

     This is a sysfs based code to allow free form access to the
     hardware counters RDMA devices might support so drivers don't need
     to code this up repeatedly themselves

   - SendOnlyFullMember multicast support

   - IB router support

   - A couple misc fixes

   - The big item on the list: hfi1 driver updates, plus moving the hfi1
     driver out of staging

  There was a group of 15 patches in the hfi1 list that I thought I had
  in the first pull request but they weren't.  So that added to the
  length of the hfi1 section here.

  As far as these go, everything but the hfi1 is pretty straight
  forward.

  The hfi1 is, if you recall, the driver that Al had complaints about
  how it used the write/writev interfaces in an overloaded fashion.  The
  write portion of their interface behaved like the write handler in the
  IB stack proper and did bi-directional communications.  The writev
  interface, on the other hand, only accepts SDMA request structures.
  The completions for those structures are sent back via an entirely
  different event mechanism.

  With the security patch, we put security checks on the write
  interface, however, we also knew they would be going away soon.  Now,
  we've converted the write handler in the hfi1 driver to use ioctls
  from the IB reserved magic area for its bidirectional communications.
  With that change, Intel has addressed all of the items originally on
  their TODO when they went into staging (as well as many items added to
  the list later).

  As such, I moved them out, and since they were the last item in the
  staging/rdma directory, and I don't have immediate plans to use the
  staging area again, I removed the staging/rdma area.

  Because of the move out of staging, as well as a series of 5 patches
  in the hfi1 driver that removed code people thought should be done in
  a different way and was optional to begin with (a snoop debug
  interface, an eeprom driver for an eeprom connected directory to their
  hfi1 chip and not via an i2c bus, and a few other things like that),
  the line count, especially the removal count, is high"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits)
  staging/rdma: Remove the entire rdma subdirectory of staging
  IB/core: Make device counter infrastructure dynamic
  IB/hfi1: Fix pio map initialization
  IB/hfi1: Correct 8051 link parameter settings
  IB/hfi1: Update pkey table properly after link down or FM start
  IB/rdamvt: Fix rdmavt s_ack_queue sizing
  IB/rdmavt: Max atomic value should be a u8
  IB/hfi1: Fix hard lockup due to not using save/restore spin lock
  IB/hfi1: Add tracing support for send with invalidate opcode
  IB/hfi1, qib: Add ieth to the packet header definitions
  IB/hfi1: Move driver out of staging
  IB/hfi1: Do not free hfi1 cdev parent structure early
  IB/hfi1: Add trace message in user IOCTL handling
  IB/hfi1: Remove write(), use ioctl() for user cmds
  IB/hfi1: Add ioctl() interface for user commands
  IB/hfi1: Remove unused user command
  IB/hfi1: Remove snoop/diag interface
  IB/hfi1: Remove EPROM functionality from data device
  IB/hfi1: Remove UI char device
  IB/hfi1: Remove multiple device cdev
  ...
2016-05-28 11:04:16 -07:00
Benson Leung
8d057e3a18 Revert "platform/chrome: chromeos_laptop: Add Leon Touch"
This reverts commit bff3c624dc.

Board "Leon" is otherwise known as "Toshiba CB35" and we already have
the entry that supports that board as of this commit :
963cb6f platform/chrome: chromeos_laptop - Add Toshiba CB35 Touch

Remove this duplicate.

Signed-off-by: Benson Leung <bleung@chromium.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2016-05-28 08:47:48 -07:00
Dan Carpenter
e6be18f6d6 i2c: dev: use after free in detach
The call to put_i2c_dev() frees "i2c_dev" so there is a use after
free when we call cdev_del(&i2c_dev->cdev).

Fixes: d6760b14d4 ('i2c: dev: switch from register_chrdev to cdev API')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-05-28 17:37:42 +02:00
Linus Torvalds
ed2608faa0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull more input subsystem updates from Dmitry Torokhov:
 "Just a few more driver fixes; new drivers will be coming in the next
  merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: pwm-beeper - fix - scheduling while atomic
  Input: xpad - xbox one elite controller support
  Input: xpad - add more third-party controllers
  Input: xpad - prevent spurious input from wired Xbox 360 controllers
  Input: xpad - move pending clear to the correct location
  Input: uinput - handle compat ioctl for UI_SET_PHYS
2016-05-27 19:14:35 -07:00
Linus Torvalds
06d2e7812e Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
 "Here is the second pull request from I2C for this merge window:

   - one new feature (which nearly fell through the cracks): i2c-dev
     does now use the cdev API so it can handle >256 minors.  Seems
     people do need that.

   - two fixes for the just added DMA feature for i2c-rcar

   - some typo fixes"

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: dev: don't start function name with 'return'
  i2c: dev: switch from register_chrdev to cdev API
  i2c: xlr: rename ARCH_TANGOX to ARCH_TANGO
  i2c: at91: change log when dma configuration fails
  misc: at24: Fix typo in at24 header file
  i2c: rcar: should depend on HAS_DMA
  i2c: rcar: use dma_request_chan()
2016-05-27 19:07:10 -07:00
Linus Torvalds
7d8eb50290 Merge branch 'for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 "This contains a nice FPU fixup from Eli Cooper for UML"

* 'for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: add extended processor state save/restore support
  um: extend fpstate to _xstate to support YMM registers
  um: fix FPU state preservation around signal handlers
2016-05-27 18:54:59 -07:00
Linus Torvalds
23a3e178b9 This pull request contains mostly cleanups and minor
improvements of UBI and UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXSJl6AAoJEEtJtSqsAOnWbbAP/2ls5KpGsRSoOP4NsKo5+8k9
 GsZ8hb0iN+UGDxMB5Jlj6O1ISNPz/8o7iGBuWa5OWFdh4Nn38sA1Qv996Rg3Ca3O
 8KYEGAD7POWANfxCTmyQX/L8AsOP62B3diFktPyGrtYmLsVzx/AFN9/nM27ticdF
 PQpUtLwvL/m7/oE6ymFCB4x34+DkTa7Jx4e+chVlF61CUipGc5I60VVX1Do+/DWt
 S37ajjIxMJx0BO7Zs6vrcH9uGFSbvSpVBr9/sC16t0Fa1/vAa9pXieg3y5XZmtOl
 +6NlwUxXVu5IjHkYYGZP8jsrA7bKJlflgCo/w6WKC6V0KECnBx+oSG9uMjDVPBSM
 rc4VF4nevnNLNqFGBRWhxiYrRUCdB1TcWVDOzM16peNeUNJiWj/+4PbsiJQBwECH
 ZnE/dBrjU7v+J8UmHsUJSveW6/5PluJf1loDzrip7gk5QD13wdpPo3VOycOj1vMD
 iS6H/TBjtcEzWWh7qcBbtgkbHQH0g+w9YSzVy7ODFweuoq+4qCZ1davqzzZaRv3q
 hylor1gea3b4GnH1EDjmXyQOOuIf2rLdJhou8WoZhmy2q3FrnE+89/CCOCMkeQMQ
 qQr1lAbxD7A+8bYwSZ7Nvv6lrBUrat59DV6FHFc69MdkGT7J6z85ZOyd3lxRvJc/
 XiZZkKhrYSKgV/iLFhHW
 =L16y
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This contains mostly cleanups and minor improvements of UBI and UBIFS"

* tag 'upstream-4.7-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: ubifs_dump_inode: Fix dumping field bulk_read
  UBI: Fix static volume checks when Fastmap is used
  UBI: Set free_count to zero before walking through erase list
  UBI: Silence an unintialized variable warning
  UBI: Clean up return in ubi_remove_volume()
  UBI: Modify wrong comment in ubi_leb_map function.
  UBI: Don't read back all data in ubi_eba_copy_leb()
  UBI: Add ro-mode sysfs attribute
2016-05-27 18:49:29 -07:00
Linus Torvalds
e0714ec4f9 nfs: fix anonymous member initializer build failure with older compilers
Older versions of gcc don't understand named initializers inside a
anonymous structure or union member.  It can be worked around by adding
the bracin gin the initializer for the anonymous member.

Without this, gcc 4.4.4 will fail the build with

    CC      fs/nfs/nfs4state.o
  fs/nfs/nfs4state.c:69: error: unknown field ‘data’ specified in initializer
  fs/nfs/nfs4state.c:69: warning: missing braces around initializer
  fs/nfs/nfs4state.c:69: warning: (near initialization for ‘zero_stateid.<anonymous>.data’)
  make[2]: *** [fs/nfs/nfs4state.o] Error 1

introduced in commit 93b717fd81 ("NFSv4: Label stateids with the type")

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 17:20:27 -07:00
Linus Torvalds
d102a56edb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Followups to the parallel lookup work:

   - update docs

   - restore killability of the places that used to take ->i_mutex
     killably now that we have down_write_killable() merged

   - Additionally, it turns out that I missed a prerequisite for
     security_d_instantiate() stuff - ->getxattr() wasn't the only thing
     that could be called before dentry is attached to inode; with smack
     we needed the same treatment applied to ->setxattr() as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch ->setxattr() to passing dentry and inode separately
  switch xattr_handler->set() to passing dentry and inode separately
  restore killability of old mutex_lock_killable(&inode->i_mutex) users
  add down_write_killable_nested()
  update D/f/directory-locking
2016-05-27 17:14:05 -07:00
Al Viro
3767e255b3 switch ->setxattr() to passing dentry and inode separately
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.

Similar change for ->getxattr() had been done in commit ce23e64.  Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.

Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-27 20:09:16 -04:00
Linus Torvalds
0121a32201 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
 "The meat of this is a change to use the mounter's credentials for
  operations that require elevated privileges (such as whiteout
  creation).  This fixes behavior under user namespaces as well as being
  a nice cleanup"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: Do d_type check only if work dir creation was successful
  ovl: update documentation
  ovl: override creds with the ones from the superblock mounter
2016-05-27 16:44:39 -07:00
Manfred Schlaegl
f49cf3b8b4 Input: pwm-beeper - fix - scheduling while atomic
Pwm config may sleep so defer it using a worker.

On a Freescale i.MX53 based board we ran into "BUG: scheduling while
atomic" because input_inject_event locks interrupts, but
imx_pwm_config_v2 sleeps.

Tested on Freescale i.MX53 SoC with 4.6.0.

Signed-off-by: Manfred Schlaegl <manfred.schlaegl@gmx.at>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-05-27 16:40:30 -07:00
Linus Torvalds
559b6d90a0 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs cleanups and fixes from Chris Mason:
 "We have another round of fixes and a few cleanups.

  I have a fix for short returns from btrfs_copy_from_user, which
  finally nails down a very hard to find regression we added in v4.6.

  Dave is pushing around gfp parameters, mostly to cleanup internal apis
  and make it a little more consistent.

  The rest are smaller fixes, and one speelling fixup patch"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (22 commits)
  Btrfs: fix handling of faults from btrfs_copy_from_user
  btrfs: fix string and comment grammatical issues and typos
  btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
  Btrfs: fix unexpected return value of fiemap
  Btrfs: free sys_array eb as soon as possible
  btrfs: sink gfp parameter to convert_extent_bit
  btrfs: make state preallocation more speculative in __set_extent_bit
  btrfs: untangle gotos a bit in convert_extent_bit
  btrfs: untangle gotos a bit in __clear_extent_bit
  btrfs: untangle gotos a bit in __set_extent_bit
  btrfs: sink gfp parameter to set_record_extent_bits
  btrfs: sink gfp parameter to set_extent_new
  btrfs: sink gfp parameter to set_extent_defrag
  btrfs: sink gfp parameter to set_extent_delalloc
  btrfs: sink gfp parameter to clear_extent_dirty
  btrfs: sink gfp parameter to clear_record_extent_bits
  btrfs: sink gfp parameter to clear_extent_bits
  btrfs: sink gfp parameter to set_extent_bits
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-27 16:37:36 -07:00
Pavel Rojtberg
6f49a398b2 Input: xpad - xbox one elite controller support
added the according id and incresed XPAD_PKT_LEN to 64 as the elite
controller sends at least 33 byte messages [1].
Verified to be working by [2].

[1]: https://franticrain.github.io/sniffs/XboxOneSniff.html
[2]: https://github.com/paroj/xpad/issues/23

Signed-off-by: Pierre-Loup A. Griffais <eduke32@plagman.net>
Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-05-27 16:32:49 -07:00
Pavel Rojtberg
6538c3b2d2 Input: xpad - add more third-party controllers
Signed-off-by: Pierre-Loup A. Griffais <eduke32@plagman.net>
Signed-off-by: Thomas Debesse <dev@illwieckz.net>
Signed-off-by: aronschatz <aronschatz@aselabs.com>
Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-05-27 16:32:48 -07:00
Cameron Gutman
1ff5fa3c67 Input: xpad - prevent spurious input from wired Xbox 360 controllers
After initially connecting a wired Xbox 360 controller or sending it
a command to change LEDs, a status/response packet is interpreted as
controller input. This causes the state of buttons represented in
byte 2 of the controller data packet to be incorrect until the next
valid input packet. Wireless Xbox 360 controllers are not affected.

Writing a new value to the LED device while holding the Start button
and running jstest is sufficient to reproduce this bug. An event will
come through with the Start button released.

Xboxdrv also won't attempt to read controller input from a packet
where byte 0 is non-zero. It also checks that byte 1 is 0x14, but
that value differs between wired and wireless controllers and this
code is shared by both. I think just checking byte 0 is enough to
eliminate unwanted packets.

The following are some examples of 3-byte status packets I saw:
01 03 02
02 03 00
03 03 03
08 03 00

Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-05-27 16:32:47 -07:00
Pavel Rojtberg
4efc6939a8 Input: xpad - move pending clear to the correct location
otherwise we lose ff commands: https://github.com/paroj/xpad/issues/27

Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-05-27 16:32:46 -07:00
Linus Torvalds
aa00edc128 make IS_ERR_VALUE() complain about non-pointer-sized arguments
Now that the allmodconfig x86-64 build is clean wrt IS_ERR_VALUE() uses
on integers, add a cast to a pointer and back to the argument, so that
any new mis-uses of IS_ERR_VALUE() will cause warnings like

   warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

so that we don't re-introduce any bogus uses.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 16:03:22 -07:00
Linus Torvalds
5d22fc25d4 mm: remove more IS_ERR_VALUE abuses
The do_brk() and vm_brk() return value was "unsigned long" and returned
the starting address on success, and an error value on failure.  The
reasons are entirely historical, and go back to it basically behaving
like the mmap() interface does.

However, nobody actually wanted that interface, and it causes totally
pointless IS_ERR_VALUE() confusion.

What every single caller actually wants is just the simpler integer
return of zero for success and negative error number on failure.

So just convert to that much clearer and more common calling convention,
and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:57:31 -07:00
Arnd Bergmann
287980e49f remove lots of IS_ERR_VALUE abuses
Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

Andrzej Hajda has already fixed a lot of the worst abusers that
were causing actual bugs, but it would be nice to prevent any
users that are not passing 'unsigned long' arguments.

This patch changes all users of IS_ERR_VALUE() that I could find
on 32-bit ARM randconfig builds and x86 allmodconfig. For the
moment, this doesn't change the definition of IS_ERR_VALUE()
because there are probably still architecture specific users
elsewhere.

Almost all the warnings I got are for files that are better off
using 'if (err)' or 'if (err < 0)'.
The only legitimate user I could find that we get a warning for
is the (32-bit only) freescale fman driver, so I did not remove
the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
For 9pfs, I just worked around one user whose calling conventions
are so obscure that I did not dare change the behavior.

I was using this definition for testing:

 #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
       unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))

which ends up making all 16-bit or wider types work correctly with
the most plausible interpretation of what IS_ERR_VALUE() was supposed
to return according to its users, but also causes a compile-time
warning for any users that do not pass an 'unsigned long' argument.

I suggested this approach earlier this year, but back then we ended
up deciding to just fix the users that are obviously broken. After
the initial warning that caused me to get involved in the discussion
(fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
asked me to send the whole thing again.

[ Updated the 9p parts as per Al Viro  - Linus ]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.org/lkml/2016/1/7/363
Link: https://lkml.org/lkml/2016/5/27/486
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:26:11 -07:00
Linus Torvalds
7ded384a12 mm: fix section mismatch warning
The register_page_bootmem_info_node() function needs to be marked __init
in order to avoid a new warning introduced by commit f65e91df25 ("mm:
use early_pfn_to_nid in register_page_bootmem_info_node").

Otherwise you'll get a warning about how a non-init function calls
early_pfn_to_nid (which is __meminit)

Cc: Yang Shi <yang.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:23:32 -07:00
Linus Torvalds
af7d93729c Merge branch 'akpm' (patches from Andrew)
Merge misc updates and fixes from Andrew Morton:

 - late-breaking ocfs2 updates

 - random bunch of fixes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM
  mm/memcontrol.c: move comments for get_mctgt_type() to proper position
  mm/memcontrol.c: fix the margin computation in mem_cgroup_margin()
  mm/cma: silence warnings due to max() usage
  mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()
  oom_reaper: close race with exiting task
  mm: use early_pfn_to_nid in register_page_bootmem_info_node
  mm: use early_pfn_to_nid in page_ext_init
  MAINTAINERS: Kdump maintainers update
  MAINTAINERS: add kexec_core.c and kexec_file.c
  mm: oom: do not reap task if there are live threads in threadgroup
  direct-io: fix direct write stale data exposure from concurrent buffered read
  ocfs2: bump up o2cb network protocol version
  ocfs2: o2hb: fix hb hung time
  ocfs2: o2hb: don't negotiate if last hb fail
  ocfs2: o2hb: add some user/debug log
  ocfs2: o2hb: add NEGOTIATE_APPROVE message
  ocfs2: o2hb: add NEGO_TIMEOUT message
  ocfs2: o2hb: add negotiate timer
2016-05-27 14:56:59 -07:00
Gavin Shan
11e685672a mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM
When we have !NO_BOOTMEM, the deferred page struct initialization
doesn't work well because the pages reserved in bootmem are released to
the page allocator uncoditionally.  It causes memory corruption and
system crash eventually.

As Mel suggested, the bootmem is retiring slowly.  We fix the issue by
simply hiding DEFERRED_STRUCT_PAGE_INIT when bootmem is enabled.

Link: http://lkml.kernel.org/r/1460602170-5821-1-git-send-email-gwshan@linux.vnet.ibm.com
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Li RongQing
7cf7806ce1 mm/memcontrol.c: move comments for get_mctgt_type() to proper position
Move the comments for get_mctgt_type() to be before get_mctgt_type()
implementation.

Link: http://lkml.kernel.org/r/1463644638-7446-1-git-send-email-roy.qing.li@gmail.com
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Li RongQing
cbedbac3e6 mm/memcontrol.c: fix the margin computation in mem_cgroup_margin()
mem_cgroup_margin() might return (memory.limit - memory_count) when the
memsw.limit is in excess.  This doesn't happen usually because we do not
allow excess on hard limits and (memory.limit <= memsw.limit), but
__GFP_NOFAIL charges can force the charge and cause the excess when no
memory is really swappable (swap is full or no anonymous memory is
left).

[mhocko@suse.com: rewrote changelog]
  Link: http://lkml.kernel.org/r/20160525155122.GK20132@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1464068266-27736-1-git-send-email-roy.qing.li@gmail.com
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Stephen Rothwell
badbda53e5 mm/cma: silence warnings due to max() usage
pageblock_order can be (at least) an unsigned int or an unsigned long
depending on the kernel config and architecture, so use max_t(unsigned
long, ...) when comparing it.

fixes these warnings:

In file included from include/asm-generic/bug.h:13:0,
                 from arch/powerpc/include/asm/bug.h:127,
                 from include/linux/bug.h:4,
                 from include/linux/mmdebug.h:4,
                 from include/linux/mm.h:8,
                 from include/linux/memblock.h:18,
                 from mm/cma.c:28:
mm/cma.c: In function 'cma_init_reserved_mem':
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
  (void) (&_max1 == &_max2);                   ^
mm/cma.c:186:27: note: in expansion of macro 'max'
  alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
                           ^
mm/cma.c: In function 'cma_declare_contiguous':
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
  (void) (&_max1 == &_max2);                   ^
include/linux/kernel.h:747:9: note: in definition of macro 'max'
  typeof(y) _max2 = (y);            ^
mm/cma.c:270:29: note: in expansion of macro 'max'
   (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
                             ^
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
  (void) (&_max1 == &_max2);                   ^
include/linux/kernel.h:747:21: note: in definition of macro 'max'
  typeof(y) _max2 = (y);                        ^
mm/cma.c:270:29: note: in expansion of macro 'max'
   (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
                             ^

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20160526150748.5be38a4f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Kirill A. Shutemov
0798d3c022 mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()
If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail
page from a pte, the "address" must be THP aligned in order for the
page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds.

Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com
Fixes: 6d0a07edd1 ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>        [4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Michal Hocko
e2fe14564d oom_reaper: close race with exiting task
Tetsuo has reported:
  Out of memory: Kill process 443 (oleg's-test) score 855 or sacrifice child
  Killed process 443 (oleg's-test) total-vm:493248kB, anon-rss:423880kB, file-rss:4kB, shmem-rss:0kB
  sh invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0
  sh cpuset=/ mems_allowed=0
  CPU: 2 PID: 1 Comm: sh Not tainted 4.6.0-rc7+ #51
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
  Call Trace:
    dump_stack+0x85/0xc8
    dump_header+0x5b/0x394
  oom_reaper: reaped process 443 (oleg's-test), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

In other words:

  __oom_reap_task		exit_mm
    atomic_inc_not_zero
				  tsk->mm = NULL
				  mmput
				    atomic_dec_and_test # > 0
				  exit_oom_victim # New victim will be
						  # selected
				<OOM killer invoked>
				  # no TIF_MEMDIE task so we can select a new one
    unmap_page_range # to release the memory

The race exists even without the oom_reaper because anybody who pins the
address space and gets preempted might race with exit_mm but oom_reaper
made this race more probable.

We can address the oom_reaper part by using oom_lock for __oom_reap_task
because this would guarantee that a new oom victim will not be selected
if the oom reaper might race with the exit path.  This doesn't solve the
original issue, though, because somebody else still might be pinning
mm_users and so __mmput won't be called to release the memory but that
is not really realiably solvable because the task will get away from the
oom sight as soon as it is unhashed from the task_list and so we cannot
guarantee a new victim won't be selected.

[akpm@linux-foundation.org: fix use of unused `mm', Per Stephen]
[akpm@linux-foundation.org: coding-style fixes]
Fixes: aac4536355 ("mm, oom: introduce oom reaper")
Link: http://lkml.kernel.org/r/1464271493-20008-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Yang Shi
f65e91df25 mm: use early_pfn_to_nid in register_page_bootmem_info_node
register_page_bootmem_info_node() is invoked in mem_init(), so it will
be called before page_alloc_init_late() if DEFERRED_STRUCT_PAGE_INIT is
enabled.  But, pfn_to_nid() depends on memmap which won't be fully setup
until page_alloc_init_late() is done, so replace pfn_to_nid() by
early_pfn_to_nid().

Link: http://lkml.kernel.org/r/1464210007-30930-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Yang Shi
fe53ca5427 mm: use early_pfn_to_nid in page_ext_init
page_ext_init() checks suitable pages with pfn_to_nid(), but
pfn_to_nid() depends on memmap which will not be setup fully until
page_alloc_init_late() is done.  Use early_pfn_to_nid() instead of
pfn_to_nid() so that page extension could be still used early even
though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page
allocation call sites.

Suggested by Joonsoo Kim [1], this fix basically undoes the change
introduced by commit b8f1a75d61 ("mm: call page_ext_init() after all
struct pages are initialized") and fixes the same problem with a better
approach.

[1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com

Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Vivek Goyal
f871f19135 MAINTAINERS: Kdump maintainers update
I am proposing following updates to kdump maintainership.  I have got
busy in other things and not getting time to spend on kdump.

Remove Haren Myneni as he has not participated in kdump development for
a long time now.

Add the names of Dave and Baoquan as kdump maintainers as they have been
contributing to kdump for a long time now and they are in a much better
position to spend time on this than me.

Mark myself as a reviewer.

Link: http://lkml.kernel.org/r/20160525131616.GB27291@redhat.com
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Minfei Huang
10540a6998 MAINTAINERS: add kexec_core.c and kexec_file.c
In the below commits kexec.c was split to kexec.c, kexec_file.c and
kexec_core.c.

commit a43cac0d9d ("kexec: split kexec_file syscall code to kexec_file.c")
commit 2965faa5e0 ("kexec: split kexec_load syscall from kexec core code")

Both kexec_file.c and kexec_core.c still belong to the kexec component.
In order to get correct mail lists by using the script get_maintainer.pl,
add these files to MAINTAINERS.

Link: http://lkml.kernel.org/r/1464189735-59113-1-git-send-email-mnghuan@gmail.com
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Vladimir Davydov
edd9f7230f mm: oom: do not reap task if there are live threads in threadgroup
If the current process is exiting, we don't invoke oom killer, instead
we give it access to memory reserves and try to reap its mm in case
nobody is going to use it.  There's a mistake in the code performing
this check - we just ignore any process of the same thread group no
matter if it is exiting or not - see try_oom_reaper.  Fix it.

Link: http://lkml.kernel.org/r/1464087628-7318-1-git-send-email-vdavydov@virtuozzo.com
Fixes: 3ef22dfff2 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Eryu Guan
9ecd10b7a0 direct-io: fix direct write stale data exposure from concurrent buffered read
Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
not allowed to allocate blocks(get_more_blocks() sets 'create' to 0
before calling get_block() callback), if it's a sparse file, direct
writes fall back to buffered writes to avoid stale data exposure from
concurrent buffered read.  But there're two cases that can result in
stale data exposure are not correctly detected.

1. The detection for "writing inside i_size" is not sufficient,
   writes can be treated as "extending writes" wrongly.  For example,
   direct write 1FSB (file system block) to a 1FSB sparse file on
   ext2/3/4, starting from offset 0, in this case it's writing inside
   i_size, but 'create' is non-zero, because 'block_in_file' and
   '(i_size_read(inode) >> blkbits' are both zero.

2. Direct writes starting from or beyong i_size (not inside i_size)
   also could trigger block allocation and expose stale data.  For
   example, consider a sparse file with i_size of 2k, and a write to
   offset 2k or 3k into the file, with a filesystem block size of 4k.
   (Thanks to Jeff Moyer for pointing this case out in his review.)

The first problem can be demostrated by running ltp-aiodio test ADSP045
many times.  When testing on extN filesystems, I see test failures
occasionally, buffered read could read non-zero (stale) data.

ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1

dio_sparse    0  TINFO  :  Dirtying free blocks
dio_sparse    0  TINFO  :  Starting I/O tests
non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
non-zero read at offset 0
dio_sparse    0  TINFO  :  Killing childrens(s)
dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally

The second problem can also be reproduced easily by a hacked dio_sparse
program, which accepts an option to specify the write offset.

What we should really do is to disable block allocation for writes that
could result in filling holes inside i_size.

Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
Junxiao Bi
38b52efd21 ocfs2: bump up o2cb network protocol version
Two new messages are added to support negotiating hb timeout.  Stop
nodes frmo talking an old version to mount as they will cause the
negotiation to fail.

Link: http://lkml.kernel.org/r/1464231615-27939-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00