Commit Graph

878712 Commits

Author SHA1 Message Date
Linus Torvalds
7e5192b93c for-5.5/disk-revalidate-20191122
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3YA5sQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplFxEACM7CwrWsullPX6b3j62NW6VepU5JQdzwVW
 S+bmLpb8Z2I4wzEnaVuWAY5hEhGaS9NFtQLdBG0W0YOzH7sweNmL38dZfCE4+oFj
 ZwytpXQQhAQUwkgANJCpNfzDymHduPsTz7RYqRr1plmhna1KC/dnhuMwg8lVOBf5
 myWjqcCHxxoQn6KFqcX9/Azz29ZrgzV28lOnZdiw9yoTjraBmS/ymx4woaa3pc2v
 UNw0Cgx53vHENJzEL9FNSxc0ENZq/bQhpDolnc2AlPGy9+vPg4afMitJb60KTT7r
 HpDcLGkYAIKLrfk8DUmFW8lZhWsxTchXvK2+zwQV7nXMcdUgGN/G3HTIdvWEHFv8
 oGbPB8cfdA2vNC9QAybwWEum/S0H/GfYsBVplNCUCdFXE7yj1cbKD5dPfCyIvmPz
 BjgMae5vH/KoH+vNdZ8NL5oFz2eFC3rLxa/Ss78pcEoBdiiV3WQHPv9MBmn/OQ/v
 CeUAM7omyWpbv3lcByNzIOkeeO3m6Ne28EpEMc2pzLnDPu2btvSyetdO488DE+7O
 MNfApZULVX91W7jWnhM5GR+1SJTdEXZnoxnFV+J/j4deog5vUR7Dt1VkujpUILfL
 7jMl3erF6C53wNrc465z8iLRp1ZM+aTpwatXXRfucNXeomExKK9zF+/+O1ACckUB
 jWDCR9NTcw==
 =e5Lx
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block

Pull disk revalidation updates from Jens Axboe:
 "This continues the work that Jan Kara started to thoroughly cleanup
  and consolidate how we handle rescans and revalidations"

* tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block:
  block: move clearing bd_invalidated into check_disk_size_change
  block: remove (__)blkdev_reread_part as an exported API
  block: fix bdev_disk_changed for non-partitioned devices
  block: move rescan_partitions to fs/block_dev.c
  block: merge invalidate_partitions into rescan_partitions
  block: refactor rescan_partitions
2019-11-25 11:37:01 -08:00
Linus Torvalds
464a47f45d for-5.5/zoned-20191122
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3YAiAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsJRD/wNfUGWVdIckw7iiFNuuipKBEy0Nd2VLt0B
 I+pVW/YjDsG2oxWXWPs5Nxc7ca2A8EzRXcWP0xEjBfOCcBh/9mULi1flkLRoWKcq
 v/OuTVif3ATvgJcwNkbMcoi0bYA/VwKi2dWC6ALhDDmZhyMTLeE362oIeOUNNnl6
 GM8CGZHaRfmBzcH5t+WnxiS6rBlt5iwFJ35EvZo3GMXGGiLGlryxEXPAwZrf4haA
 Z4atNinKcNXhb80LWHo23aK3bpnaumwKP4BPuLEyvnjS4iU8SeYTXy+w5yq1BE+h
 HBP5s3no/mPiBAG8b6EZXqOJUGlN596AQfNLu7vCR78tmImZF0jKRFsHEAaKXf+B
 1yRgZi7J+gV0qzK/Ufulg43vItk5/sTzEuV9YLfCpKTr14MFcWw908BAqaI5Kk1K
 e8uGqnb2KbZOLTW4QdPvpWg3eYtqEoluSoZUQ5elHxqQZ4MSZ1lK78FF1TeaW/pw
 sYH+v6rsWoVjEcFSwGoaaOMravzU4MKtavNAZrTJwKZx7qCqkwmi3R1k8WF6KsSV
 rTRAzUC1wpTdSOm1MYPMMKM/h5+BJRSJ/RjljOF4fXLnvpD5q0lequCWjrrEzc6c
 HPRKIgSBq7S620A19QD8UxwvZJ8bOivESqr0bux29v1Vpf7vJBrRMng8nLUrXfJs
 jdma5mK1UA==
 =/G9l
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block

Pull zoned block device update from Jens Axboe:
 "Enhancements and improvements to the zoned device support"

* tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block:
  scsi: sd_zbc: Remove set but not used variable 'buflen'
  block: rework zone reporting
  scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
  null_blk: Add zone_nr_conv to features
  null_blk: clean up report zones
  null_blk: clean up the block device operations
  block: Remove partition support for zoned block devices
  block: Simplify report zones execution
  block: cleanup the !zoned case in blk_revalidate_disk_zones
  block: Enhance blk_revalidate_disk_zones()
2019-11-25 11:22:37 -08:00
Linus Torvalds
323264eefb for-5.5/drivers-post-20191122
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3X/7gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgponxD/9mb/H9LD6/flEqoPv7n1dv7Y8Oe+AKpGLb
 a2Jh8ycpU6WtzZdlMYbQkxAqgJCupLTlih3WY3NuI1fwSsxwyMziQEnVnKgPNf7s
 PLWt+Qo5ryooyVkPi4KCHKjx2CFDUL4B1BqtSLm9n3eN72FRa9HCsCiMugjCbV+K
 aF7snzZ0ss+m7SnKIpdtXJjcdIFC2hXwCAGWAOv1vOPwhqQZFBsxjHKnEJtumTp2
 +wzLBPItLBbzHtAyopbiNlfsuHL9CF9L1QFaCTZE7N6eVWnlYpIhMCMrfEJ/e3jK
 27ct3PWAa4Qr2S/85//AMOyL9mxK96g9FjuGjxnKQZ+qWh89RPLq+oGs1zWSv2O0
 08BynWvxYHkoOR2+baPx89SHrlwyN0HCsvFBxVKrMsVHpy81sIYLFlytYaQUMx2/
 RkgkIxiAo5R5vYHPZYX8cU4c1rASjG0tYD9OA6e78MJFIBfkTl70XHHVX2Tgf5by
 fuwon/g+iVvN94QHb81ulZcA0hXRz8jM2RNIAPhqfoJXX6wNzD5MooNxTs9m88IP
 6/HaM1l6AJUtOMNA1aZbKAq+ARYuIA0/qHoSS0UVHoG8D4YkYaFpvYsAaA+TBzeO
 J8IcwSu6eVR1NvgJ9b4cwXFWSf75a/o4UeDP/1fYcQU5Gn/KQEdQVFSMvND1nnHW
 hUTP5AKFcQ==
 =oyE3
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block

Pull additional block driver updates from Jens Axboe:
 "Here's another block driver update, done to avoid conflicts with the
  zoned changes coming next.

  This contains:

   - Prepare SCSI sd for zone open/close/finish support

   - Small NVMe pull request
        - hwmon support (Akinobu)
        - add new co-maintainer (Christoph)
        - work-around for a discard issue on non-conformant drives
          (Eduard)

   - Small nbd leak fix"

* tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
  nbd: prevent memory leak
  nvme: hwmon: add quirk to avoid changing temperature threshold
  nvme: hwmon: provide temperature min and max values for each sensor
  nvmet: add another maintainer
  nvme: Discard workaround for non-conformant devices
  nvme: Add hardware monitoring support
  scsi: sd_zbc: add zone open, close, and finish support
2019-11-25 11:18:03 -08:00
Linus Torvalds
2d53943090 for-5.5/drivers-20191121
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WyEAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplgbD/4jNeqT0q2IkNcUUEWkZWsBOlfi0SiclS5v
 X8JY1IxTlL0kaBWm83mw06JewucQ97Fh7xblPE8/iDHJqpgEX4vvSQY1b8hcDulZ
 YOKUnLkFU22nICeT04/8x/+f8gqD5KOlGxkgEvUKViQW15oc0oNu4St/yFM1QEN0
 qNMzpcfFXV9lYsOPl0y3pKdP+qbfcpeSmaFD9Z65gxN6rJy1WR8rtUGXy2luoiEc
 dh15IL9AGN/r8VTo8yRpD9PStiuJqpALIR8OHJSHPj+s0pQ6twk4aehcnYseAMbH
 zSDpa9AJrfqlnh8tUfKYLWi/PM7pMH0F01rAiQv47j/C0+QhbiOU/uTFTzUW5hQ1
 eK6XzJ0slxwnDsHLKf+xJmCj0Oyk0jDimNQr/2MNsuhmr29V5lfvBNflub8eOLyZ
 ie2Eulv+z6pYBSJx6kqm0X3vhXOy4wgU+X8LzvfcP9iAjgU1rfzxUWxLEj+KfJS2
 Nl+ERV9nafoPpoKpNR7zWRBUulp1qZJzo/U9JaUKiI5cWkIH1hhHmU2++xMeyJpb
 XHoDFNTGv6z/eef65eSveFD7F274TSi16K56Obk+4KWaSrIR0d6VwUA7FDmJbSI+
 Jqk1OFdaRGsQ5OcVxF1Qo4WChn0FvhcD0c+yL0N19WZ01QeYsb3hlA+MUPDtGQ04
 U79MPfu7iA==
 =i0jf
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Here are the main block driver updates for 5.5. Nothing major in here,
  mostly just fixes. This contains:

   - a set of bcache changes via Coly

   - MD changes from Song

   - loop unmap write-zeroes fix (Darrick)

   - spelling fixes (Geert)

   - zoned additions cleanups to null_blk/dm (Ajay)

   - allow null_blk online submit queue changes (Bart)

   - NVMe changes via Keith, nothing major here either"

* tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits)
  Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()"
  drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
  drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
  bcache: don't export symbols
  bcache: remove the extra cflags for request.o
  bcache: at least try to shrink 1 node in bch_mca_scan()
  bcache: add idle_max_writeback_rate sysfs interface
  bcache: add code comments in bch_btree_leaf_dirty()
  bcache: fix deadlock in bcache_allocator
  bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
  bcache: deleted code comments for dead code in bch_data_insert_keys()
  bcache: add more accurate error messages in read_super()
  bcache: fix static checker warning in bcache_device_free()
  bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
  bcache: fix fifo index swapping condition in journal_pin_cmp()
  md/raid10: prevent access of uninitialized resync_pages offset
  md: avoid invalid memory access for array sb->dev_roles
  md/raid1: avoid soft lockup under high load
  null_blk: add zone open, close, and finish support
  dm: add zone open, close and finish support
  ...
2019-11-25 11:15:41 -08:00
Jouni Hogander
e58c191241 slip: Fix use-after-free Read in slip_open
Slip_open doesn't clean-up device which registration failed from the
slip_devs device list. On next open after failure this list is iterated
and freed device is accessed. Fix this by calling sl_free_netdev in error
path.

Here is the trace from the Syzbot:

__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:634
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
sl_sync drivers/net/slip/slip.c:725 [inline]
slip_open+0xecd/0x11b7 drivers/net/slip/slip.c:801
tty_ldisc_open.isra.0+0xa3/0x110 drivers/tty/tty_ldisc.c:469
tty_set_ldisc+0x30e/0x6b0 drivers/tty/tty_ldisc.c:596
tiocsetd drivers/tty/tty_io.c:2334 [inline]
tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2594
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 3b5a39979d ("slip: Fix memory leak in slip_open error path")
Reported-by: syzbot+4d5170758f3762109542@syzkaller.appspotmail.com
Cc: David Miller <davem@davemloft.net>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25 11:02:52 -08:00
Linus Torvalds
ff6814b078 for-5.5/block-20191121
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WxrEQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpuH5D/9qQKfIIuQDUNO4Xx+dIHimTDCrfiEOeO9e
 CRaMuSj+yMxLDMwfX8RnDmR17H3ZVoiIY1CT24U9ZkA5iDjeAH4xmzkH30US7LR7
 /64YVZTxB0OrWppRK8RiIhaJJZDQ6+HPUQsn6PRaLVuFHi2unMoTQnj/ZQKz03QA
 Pl8Xx7qBtH1JwYCzQ21f/uryAcNg9eWabRLN2f1uiOXLmvRxOfh6Z/iaezlaZlmL
 qeJdcdLjjvOgOPwEOfNjfS6pd+XBz3gdEhn0l+11nHITxWZmVBwsWTKyUQlCmKnl
 yuCWDVyx5d6zCnlrLYG0l2Fn2lr9SwAkdkq3YAKV03hA/6s6P9q9bm31VvOf828x
 7gmr4YVz68y7H9bM0QAHCvDpjll0aIEUw6XFzSOCDtZ9B6/pppYQWzMU71J05eyF
 8DOKv2M2EVNLUjf6u0RDyolnWGU0kIjt5ryWE3OsGcezAVa2wYstgUJTKbrn1YgT
 j+4KTpaI+sg8GKDFauvxcSa6gwoRp6jweFNW+7vC090/shXmrGmVLOnQZKRuHho/
 O4W8y/1/deM8CCIAETpiNxA8RV5U/EZygrFGDFc7yzTtVDGHY356M/B4Bmm2qkVu
 K3WgeZp8Fc0lH0QF6Pp9ZlBkZEpGNCAPVsPkXIsxQXbctftkn3KY//uIubfpFEB1
 PpHSicvkww==
 =HYYq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
 "Due to more granular branches, this one is small and will be followed
  with other core branches that add specific features. I meant to just
  have a core and drivers branch, but external dependencies we ended up
  adding a few more that are also core.

  The changes are:

   - Fixes and improvements for the zoned device support (Ajay, Damien)

   - sed-opal table writing and datastore UID (Revanth)

   - blk-cgroup (and bfq) blk-cgroup stat fixes (Tejun)

   - Improvements to the block stats tracking (Pavel)

   - Fix for overruning sysfs buffer for large number of CPUs (Ming)

   - Optimization for small IO (Ming, Christoph)

   - Fix typo in RWH lifetime hint (Eugene)

   - Dead code removal and documentation (Bart)

   - Reduction in memory usage for queue and tag set (Bart)

   - Kerneldoc header documentation (André)

   - Device/partition revalidation fixes (Jan)

   - Stats tracking for flush requests (Konstantin)

   - Various other little fixes here and there (et al)"

* tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits)
  Revert "block: split bio if the only bvec's length is > SZ_4K"
  block: add iostat counters for flush requests
  block,bfq: Skip tracing hooks if possible
  block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'
  blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1
  block: Don't disable interrupts in trigger_softirq()
  sbitmap: Delete sbitmap_any_bit_clear()
  blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()
  block: split bio if the only bvec's length is > SZ_4K
  block: still try to split bio if the bvec crosses pages
  blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT
  blk-cgroup: reimplement basic IO stats using cgroup rstat
  blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()
  blk-throtl: stop using blkg->stat_bytes and ->stat_ios
  bfq-iosched: stop using blkg->stat_bytes and ->stat_ios
  bfq-iosched: relocate bfqg_*rwstat*() helpers
  block: add zone open, close and finish ioctl support
  block: add zone open, close and finish operations
  block: Simplify REQ_OP_ZONE_RESET_ALL handling
  block: Remove REQ_OP_ZONE_RESET plugging
  ...
2019-11-25 10:59:41 -08:00
Linus Torvalds
6e7b06a4c8 for-5.5/libata-20191121
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WxTEQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppkrD/wI7A0K+CE6Cd0QQtVmZaMG6r68rj4wUtyB
 J4j44DOzfLwhRoQ4VhqbAysUiVHc2sT/10Ogq0vFmi5y2ewNQWHQ7tCCv52zbh0X
 bJdt67GkM81MNpX1Hjn/ABcSAy8nzQ+efPmbS/6SnEZvSh9HIajwE6E7CxPaQ6NT
 mZtLsPTvnn1Ct01JZecnFYSRWvjnKDd89B2KjU1FoUcOjZxCJiTXB1XQWlFAI2rH
 qnzTXYN6CewnENLlkUCdfFCIsaRh4q2gN7plSVL35VPYTWOYNSmgJT5glRZ7ZyHi
 fYPsAGvvYBRdqF/goOklqTRtmFNhwbnjW5pspcwQMyAlHGCpGRemxmQ8OA7qtjSk
 XrZrrPO5TZNs/c7vR5697anAa3fD5YefaRvzBJR9IpWS1rDOTg9P9y3isWOclQQz
 llVSZE4yW7qEBp5x9uvOR2jCdqx9HNaXKB3u+Qc9etYGdsAKROgeW7YbLkafhnQL
 ak0BM/BhHJYuJ/GIAPgQyB/D982G3I4vvUSB5Ixf3w6v4MDowUDznJNHdvhWSoe2
 eU1gb0tfVUSdCcHMcAYKtFJikGgD19rjESKQW8UJM1aUOyJkawdzTx4grMdzOfzB
 3vue4QKT3sPHeSCLQGLZLCKBN8sHRtKSPGaJKwXdvyZYFc4YZ8eFhg7ElICFJFRZ
 SS361Hviog==
 =ucL7
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/libata-20191121' of git://git.kernel.dk/linux-block

Pull libata updates from Jens Axboe:
 "Just a few fixes all over the place, support for the Annapurna SATA
  controller, and a patchset that cleans up the error defines and
  ultimately fixes anissue with sata_mv"

* tag 'for-5.5/libata-20191121' of git://git.kernel.dk/linux-block:
  ata: pata_artop: make arrays static const, makes object smaller
  ata_piix: remove open-coded dmi_match(DMI_OEM_STRING)
  ata: sata_mv, avoid trigerrable BUG_ON
  ata: make qc_prep return ata_completion_errors
  ata: define AC_ERR_OK
  ata: Documentation, fix function names
  libata: Ensure ata_port probe has completed before detach
  ahci: tegra: use regulator_bulk_set_supply_names()
  ahci: Add support for Amazon's Annapurna Labs SATA controller
2019-11-25 10:57:53 -08:00
Oleksij Rempel
9bca3a0a92 net: dsa: sja1105: fix sja1105_parse_rgmii_delays()
This function was using configuration of port 0 in devicetree for all ports.
In case CPU port was not 0, the delay settings was ignored. This resulted not
working communication between CPU and the switch.

Fixes: f5b8631c29 ("net: dsa: sja1105: Error out if RGMII delays are requested in DT")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25 10:56:12 -08:00
Menglong Dong
1d7ea55668 macvlan: schedule bc_work even if error
While enqueueing a broadcast skb to port->bc_queue, schedule_work()
is called to add port->bc_work, which processes the skbs in
bc_queue, to "events" work queue. If port->bc_queue is full, the
skb will be discarded and schedule_work(&port->bc_work) won't be
called. However, if port->bc_queue is full and port->bc_work is not
running or pending, port->bc_queue will keep full and schedule_work()
won't be called any more, and all broadcast skbs to macvlan will be
discarded. This case can happen:

macvlan_process_broadcast() is the pending function of port->bc_work,
it moves all the skbs in port->bc_queue to the queue "list", and
processes the skbs in "list". During this, new skbs will keep being
added to port->bc_queue in macvlan_broadcast_enqueue(), and
port->bc_queue may already full when macvlan_process_broadcast()
return. This may happen, especially when there are a lot of real-time
threads and the process is preempted.

Fix this by calling schedule_work(&port->bc_work) even if
port->bc_work is full in macvlan_broadcast_enqueue().

Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25 10:54:22 -08:00
Po Liu
c431047c4e enetc: add support Credit Based Shaper(CBS) for hardware offload
The ENETC hardware support the Credit Based Shaper(CBS) which part
of the IEEE-802.1Qav. The CBS driver was loaded by the sch_cbs
interface when set in the QOS in the kernel.

Here is an example command to set 20Mbits bandwidth in 1Gbits port
for taffic class 7:

tc qdisc add dev eth0 root handle 1: mqprio \
	   num_tc 8 map 0 1 2 3 4 5 6 7 hw 1

tc qdisc replace dev eth0 parent 1:8 cbs \
	   locredit -1470 hicredit 30 \
	   sendslope -980000 idleslope 20000 offload 1

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25 10:53:15 -08:00
Heiner Kallweit
bec170e559 net: phy: add helpers phy_(un)lock_mdio_bus
Add helpers to make locking/unlocking the MDIO bus easier.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25 10:47:00 -08:00
David Bauer
32085f25d7 mdio_bus: don't use managed reset-controller
Geert Uytterhoeven reported that using devm_reset_controller_get leads
to a WARNING when probing a reset-controlled PHY. This is because the
device devm_reset_controller_get gets supplied is not actually the
one being probed.

Acquire an unmanaged reset-control as well as free the reset_control on
unregister to fix this.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David Bauer <mail@david-bauer.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25 10:43:40 -08:00
Linus Torvalds
fb4b3d3fd0 for-5.5/io_uring-20191121
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WxNwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgps4kD/9SIDXhYhhE8fNqeAF7Uouu8fxgwnkY3hSI
 43vJwCziiDxWWJH5mYW7/83VNOMZKHIbiYMnU6iEUsRQ/sG/wI0wEfAQZDHLzCKt
 cko2q7zAC1/4rtoslwJ3q04hE2Ap/nb93ELZBVr7fOAuODBNFUp/vifAojvsMPKz
 hNMNPq/vYg7c/iYMZKSBdtjE3tqceFNBjAVNMB9dHKQLeexEy4ve7AjBeawWsSi7
 GesnQ5w5u5LqkMYwLslpv/oVjHiiFWgGnDAvBNvykQvVy+DfB54KSqMV11W1aqdU
 l6L+ENfZasEvlk1yMAth2Foq4vlscm5MKEb6VdJhXWHHXtXkcBmz7RBqPmjSvXCY
 wS5GZRw8oYtTcid0aQf+t/wgRNTDJsGsnsT32qto41No3Z7vlIDHUDxHZGTA+gEL
 E8j9rDx6EXMTo3EFbC8XZcfsorhPJ1HKAyw1YFczHtYzJEQUR9jJe3f/Q9u6K2Vy
 s/EhkVeHa/lEd7kb6mI+6lQjGe1FXl7AHauDuaaEfIOZA/xJB3Bad5Wjq1va1cUO
 TX+37zjzFzJghhSIBGYq7G7iT4AMecPQgxHzCdCyYfW5S4Uur9tMmIElwVPI/Pjl
 kDZ9gdg9lm6JifZ9Ab8QcGhuQQTF3frwX9VfgrVgcqyvm38AiYzVgL9ZJnxRS/Cy
 ZfLNkACXqQ==
 =YZ9s
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "A lot of stuff has been going on this cycle, with improving the
  support for networked IO (and hence unbounded request completion
  times) being one of the major themes. There's been a set of fixes done
  this week, I'll send those out as well once we're certain we're fully
  happy with them.

  This contains:

   - Unification of the "normal" submit path and the SQPOLL path (Pavel)

   - Support for sparse (and bigger) file sets, and updating of those
     file sets without needing to unregister/register again.

   - Independently sized CQ ring, instead of just making it always 2x
     the SQ ring size. This makes it more flexible for networked
     applications.

   - Support for overflowed CQ ring, never dropping events but providing
     backpressure on submits.

   - Add support for absolute timeouts, not just relative ones.

   - Support for generic cancellations. This divorces io_uring from
     workqueues as well, which additionally gets us one step closer to
     generic async system call support.

   - With cancellations, we can support grabbing the process file table
     as well, just like we do mm context. This allows support for system
     calls that create file descriptors, like accept4() support that's
     built on top of that.

   - Support for io_uring tracing (Dmitrii)

   - Support for linked timeouts. These abort an operation if it isn't
     completed by the time noted in the linke timeout.

   - Speedup tracking of poll requests

   - Various cleanups making the coder easier to follow (Jackie, Pavel,
     Bob, YueHaibing, me)

   - Update MAINTAINERS with new io_uring list"

* tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block: (64 commits)
  io_uring: make POLL_ADD/POLL_REMOVE scale better
  io-wq: remove now redundant struct io_wq_nulls_list
  io_uring: Fix getting file for non-fd opcodes
  io_uring: introduce req_need_defer()
  io_uring: clean up io_uring_cancel_files()
  io-wq: ensure free/busy list browsing see all items
  io-wq: ensure we have a stable view of ->cur_work for cancellations
  io_wq: add get/put_work handlers to io_wq_create()
  io_uring: check for validity of ->rings in teardown
  io_uring: fix potential deadlock in io_poll_wake()
  io_uring: use correct "is IO worker" helper
  io_uring: fix -ENOENT issue with linked timer with short timeout
  io_uring: don't do flush cancel under inflight_lock
  io_uring: flag SQPOLL busy condition to userspace
  io_uring: make ASYNC_CANCEL work with poll and timeout
  io_uring: provide fallback request for OOM situations
  io_uring: convert accept4() -ERESTARTSYS into -EINTR
  io_uring: fix error clear of ->file_table in io_sqe_files_register()
  io_uring: separate the io_free_req and io_free_req_find_next interface
  io_uring: keep io_put_req only responsible for release and put req
  ...
2019-11-25 10:40:27 -08:00
Linus Torvalds
54f0e54011 tpmdd update for Linux v5.5
-----BEGIN PGP SIGNATURE-----
 
 iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXcsMpCAcamFya2tvLnNh
 a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0pTtAP9+6nxJGXpr8eEA
 PJgcKiRenRpTeY3xqAsHKRtw6WqpywEAzFUlZvDKEogE15SEVA03vzk/KuP8jM0q
 pq8mHOdAnwQ=
 =xCn9
 -----END PGP SIGNATURE-----

Merge tag 'tpmdd-next-20191112' of git://git.infradead.org/users/jjs/linux-tpmdd

Pull tpmd updates from Jarkko Sakkinen:

 - support for Cr50 fTPM

 - support for fTPM on AMD Zen+ CPUs

 - TPM 2.0 trusted keys code relocated from drivers/char/tpm to
   security/keys

* tag 'tpmdd-next-20191112' of git://git.infradead.org/users/jjs/linux-tpmdd:
  KEYS: trusted: Remove set but not used variable 'keyhndl'
  tpm: Switch to platform_get_irq_optional()
  tpm_crb: fix fTPM on AMD Zen+ CPUs
  KEYS: trusted: Move TPM2 trusted keys code
  KEYS: trusted: Create trusted keys subsystem
  KEYS: Use common tpm_buf for trusted and asymmetric keys
  tpm: Move tpm_buf code to include/linux/
  tpm: use GFP_KERNEL instead of GFP_HIGHMEM for tpm_buf
  tpm: add check after commands attribs tab allocation
  tpm: tpm_tis_spi: Drop THIS_MODULE usage from driver struct
  tpm: tpm_tis_spi: Cleanup includes
  tpm: tpm_tis_spi: Support cr50 devices
  tpm: tpm_tis_spi: Introduce a flow control callback
  tpm: Add a flag to indicate TPM power is managed by firmware
  dt-bindings: tpm: document properties for cr50
  tpm_tis: override durations for STM tpm with firmware 1.2.8.28
  tpm: provide a way to override the chip returned durations
  tpm: Remove duplicate code from caps_show() in tpm-sysfs.c
2019-11-25 10:29:42 -08:00
Linus Torvalds
0be0ee7181 vfs: properly and reliably lock f_pos in fdget_pos()
fdget_pos() is used by file operations that will read and update f_pos:
things like "read()", "write()" and "lseek()" (but not, for example,
"pread()/pwrite" that get their file positions elsewhere).

However, it had two separate escape clauses for this, because not
everybody wants or needs serialization of the file position.

The first and most obvious case is the "file descriptor doesn't have a
position at all", ie a stream-like file.  Except we didn't actually use
FMODE_STREAM, but instead used FMODE_ATOMIC_POS.  The reason for that
was that FMODE_STREAM didn't exist back in the days, but also that we
didn't want to mark all the special cases, so we only marked the ones
that _required_ position atomicity according to POSIX - regular files
and directories.

The case one was intentionally lazy, but now that we _do_ have
FMODE_STREAM we could and should just use it.  With the change to use
FMODE_STREAM, there are no remaining uses for FMODE_ATOMIC_POS, and all
the code to set it is deleted.

Any cases where we don't want the serialization because the driver (or
subsystem) doesn't use the file position should just be updated to do
"stream_open()".  We've done that for all the obvious and common
situations, we may need a few more.  Quoting Kirill Smelkov in the
original FMODE_STREAM thread (see link below for full email):

 "And I appreciate if people could help at least somehow with "getting
  rid of mixed case entirely" (i.e. always lock f_pos_lock on
  !FMODE_STREAM), because this transition starts to diverge from my
  particular use-case too far. To me it makes sense to do that
  transition as follows:

   - convert nonseekable_open -> stream_open via stream_open.cocci;
   - audit other nonseekable_open calls and convert left users that
     truly don't depend on position to stream_open;
   - extend stream_open.cocci to analyze alloc_file_pseudo as well (this
     will cover pipes and sockets), or maybe convert pipes and sockets
     to FMODE_STREAM manually;
   - extend stream_open.cocci to analyze file_operations that use
     no_llseek or noop_llseek, but do not use nonseekable_open or
     alloc_file_pseudo. This might find files that have stream semantic
     but are opened differently;
   - extend stream_open.cocci to analyze file_operations whose
     .read/.write do not use ppos at all (independently of how file was
     opened);
   - ...
   - after that remove FMODE_ATOMIC_POS and always take f_pos_lock if
     !FMODE_STREAM;
   - gather bug reports for deadlocked read/write and convert missed
     cases to FMODE_STREAM, probably extending stream_open.cocci along
     the road to catch similar cases

  i.e. always take f_pos_lock unless a file is explicitly marked as
  being stream, and try to find and cover all files that are streams"

We have not done the "extend stream_open.cocci to analyze
alloc_file_pseudo" as well, but the previous commit did manually handle
the case of pipes and sockets.

The other case where we can avoid locking f_pos is the "this file
descriptor only has a single user and it is us, and thus there is no
need to lock it".

The second test was correct, although a bit subtle and worth just
re-iterating here.  There are two kinds of other sources of references
to the same file descriptor: file descriptors that have been explicitly
shared across fork() or with dup(), and file tables having elevated
reference counts due to threading (or explicit file sharing with
clone()).

The first case would have incremented the file count explicitly, and in
the second case the previous __fdget() would have incremented it for us
and set the FDPUT_FPUT flag.

But in both cases the file count would be greater than one, so the
"file_count(file) > 1" test catches both situations.  Also note that if
file_count is 1, that also means that no other thread can have access to
the file table, so there also cannot be races with concurrent calls to
dup()/fork()/clone() that would increment the file count any other way.

Link: https://lore.kernel.org/linux-fsdevel/20190413184404.GA13490@deco.navytux.spb.ru
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Eic Dumazet <edumazet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Marco Elver <elver@google.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-25 10:01:53 -08:00
Linus Torvalds
d8e464ecc1 vfs: mark pipes and sockets as stream-like file descriptors
In commit 3975b097e5 ("convert stream-like files -> stream_open, even
if they use noop_llseek") Kirill used a coccinelle script to change
"nonseekable_open()" to "stream_open()", which changed the trivial cases
of stream-like file descriptors to the new model with FMODE_STREAM.

However, the two big cases - sockets and pipes - don't actually have
that trivial pattern at all, and were thus never converted to
FMODE_STREAM even though it makes lots of sense to do so.

That's particularly true when looking forward to the next change:
getting rid of FMODE_ATOMIC_POS entirely, and just using FMODE_STREAM to
decide whether f_pos updates are needed or not.  And if they are, we'll
always do them atomically.

This came up because KCSAN (correctly) noted that the non-locked f_pos
updates are data races: they are clearly benign for the case where we
don't care, but it would be good to just not have that issue exist at
all.

Note that the reason we used FMODE_ATOMIC_POS originally is that only
doing it for the minimal required case is "safer" in that it's possible
that the f_pos locking can cause unnecessary serialization across the
whole write() call.  And in the worst case, that kind of serialization
can cause deadlock issues: think writers that need readers to empty the
state using the same file descriptor.

[ Note that the locking is per-file descriptor - because it protects
  "f_pos", which is obviously per-file descriptor - so it only affects
  cases where you literally use the same file descriptor to both read
  and write.

  So a regular pipe that has separate reading and writing file
  descriptors doesn't really have this situation even though it's the
  obvious case of "reader empties what a bit writer concurrently fills"

  But we want to make pipes as being stream-line anyway, because we
  don't want the unnecessary overhead of locking, and because a named
  pipe can be (ab-)used by reading and writing to the same file
  descriptor. ]

There are likely a lot of other cases that might want FMODE_STREAM, and
looking for ".llseek = no_llseek" users and other cases that don't have
an lseek file operation at all and making them use "stream_open()" might
be a good idea.  But pipes and sockets are likely to be the two main
cases.

Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Eic Dumazet <edumazet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Marco Elver <elver@google.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-25 09:12:11 -08:00
Qian Cai
40363cf139 writeback: fix -Wformat compilation warnings
The commit f05499a06f ("writeback: use ino_t for inodes in
tracepoints") introduced a lot of GCC compilation warnings on s390,

In file included from ./include/trace/define_trace.h:102,
                 from ./include/trace/events/writeback.h:904,
                 from fs/fs-writeback.c:82:
./include/trace/events/writeback.h: In function
'trace_raw_output_writeback_page_template':
./include/trace/events/writeback.h:76:12: warning: format '%lu' expects
argument of type 'long unsigned int', but argument 4 has type 'ino_t'
{aka 'unsigned int'} [-Wformat=]
  TP_printk("bdi %s: ino=%lu index=%lu",
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/trace/trace_events.h:360:22: note: in definition of macro
'DECLARE_EVENT_CLASS'
  trace_seq_printf(s, print);     \
                      ^~~~~
./include/trace/events/writeback.h:76:2: note: in expansion of macro
'TP_printk'
  TP_printk("bdi %s: ino=%lu index=%lu",
  ^~~~~~~~~

Fix them by adding necessary casts where ino_t could be either "unsigned
int" or "unsigned long".

Fixes: f05499a06f ("writeback: use ino_t for inodes in tracepoints")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Tejun Heo <tj@kernel.org>
2019-11-25 07:50:41 -08:00
Jens Verwiebe
bf2aa5cadd ALSA: usb-audio: Fix Focusrite Scarlett 6i6 gen1 - input handling
The Scarlett 6i6 has no padding on rear inputs 3/4 but a gainstage.
This patch introduces this functionality as to be seen in the mac
or windows scarlett control.

The correct address could already be found in the dump info, but was
never used. Without this patch inputs 3/4 are quite unusable else.

Signed-off-by: Jens Verwiebe <info@jensverwiebe.de>
Link: https://lore.kernel.org/r/384d65cd-5e87-91eb-9fc3-e57226f534c6@jensverwiebe.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-11-25 16:49:35 +01:00
Jian-Hong Pan
436e25505f ALSA: hda/realtek - Enable internal speaker of ASUS UX431FLC
Laptops like ASUS UX431FLC and UX431FL can share the same audio quirks.
But UX431FLC needs one more step to enable the internal speaker: Pull
the GPIO from CODEC to initialize the AMP.

Fixes: 60083f9e94 ("ALSA: hda/realtek - Enable internal speaker & headset mic of ASUS UX431FL")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191125093405.5702-1-jian-hong@endlessm.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-11-25 15:46:23 +01:00
Takashi Iwai
09578eacaa ASoC: More updates for v5.5
Some more development work for v5.5.  Highlights include:
 
  - More cleanups from Morimoto-san.
  - Trigger word detection for RT5677.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl3bzrQTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0LkMB/4/MkpN4F2WudNCKgmYay2gw3hq8aOR
 5Xh/SE6N3BEgBpQiMEUdePRT6LqnQMsSpOaUoKfd4nFvCyLaV7itaYzQuUpo+UJ/
 a4UDHJPVfrDJeFghWqSngfEW/N66mpO3UuAvhSRTR3ku0T31v9FkcLsMbdxVUpLC
 ablSaFfPCOtTQOsG9blRUL/GAWUhGzPI/Hl4VldS0FvKDmMDY2fxt7QxXRaEYItc
 i1Uthklxd0BfRmDqv0WEpduUhbwzg6Uir9pXSisux7DpMk1Li0IyPST5UcRl0Zf4
 CKL/qfdQANpufj9kGmrlCOnXB9P5/XYVwfYV430WqMyw0QaOzzSrxT7C
 =CZ/+
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v5.5-2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: More updates for v5.5

Some more development work for v5.5.  Highlights include:

 - More cleanups from Morimoto-san.
 - Trigger word detection for RT5677.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-11-25 14:27:41 +01:00
Petr Mladek
0e672adc87 Merge branch 'for-5.5/system-state' into for-linus 2019-11-25 13:53:49 +01:00
Petr Mladek
d891433b8d Merge branch 'for-5.5/selftests' into for-linus 2019-11-25 13:53:15 +01:00
Ingo Molnar
de881a341c Merge branch 'sched/rt' into sched/core, to pick up commit
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 13:48:11 +01:00
Paolo Bonzini
9671024729 Second KVM PPC update for 5.5
- Two fixes from Greg Kurz to fix memory leak bugs in the XIVE code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEv0VLfXa2m9eKuaRpnZrqdyxjcZ8FAl3bJKwACgkQnZrqdyxj
 cZ92xQgAhgnARWJwh+uazayNrwB12TJA7G25RO8CUEwWaAY/io5QeO7nQCmNZ3cf
 TflQpI1dL5qFpzU7uNunHqdqyhlaD0wwkHfrN71molr5sA1uRlIyxwwkE6coZQEC
 n/LiGayoxqt2Ra06H4L4SGSjb7fcCl8eYjC3xjTx9Zdf/iXVcwYprBch5kcrToLV
 s0NvRvDgwcaqsxQyybTO0wRvME/qz9JFtNUgl6H4PNSt3l/yv+rM+BgjyNR3tyKu
 B1G4937GqBIAV4jYmK0a/LDnNfxs9EmOjuJLKCHmVxlfbsg8wasNk3kj+mdrh2O3
 ZjCdh782GyGwp/ysddOHmIhXFyQMhQ==
 =9kV2
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

Second KVM PPC update for 5.5

- Two fixes from Greg Kurz to fix memory leak bugs in the XIVE code.
2019-11-25 11:29:05 +01:00
Andy Lutomirski
4a13b0e3e1 x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
UNWIND_ESPFIX_STACK needs to read the GDT, and the GDT mapping that
can be accessed via %fs is not mapped in the user pagetables.  Use
SGDT to find the cpu_entry_area mapping and read the espfix offset
from that instead.

Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:36:47 +01:00
Will Deacon
500543c53a lkdtm: Remove references to CONFIG_REFCOUNT_FULL
CONFIG_REFCOUNT_FULL no longer exists, so remove all references to it.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-11-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:46 +01:00
Will Deacon
2f30b36943 locking/refcount: Remove unused 'refcount_error_report()' function
'refcount_error_report()' has no callers. Remove it.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-10-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:42 +01:00
Will Deacon
fb041bb7c0 locking/refcount: Consolidate implementations of refcount_t
The generic implementation of refcount_t should be good enough for
everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely,
leaving the generic implementation enabled unconditionally.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:32 +01:00
Will Deacon
65b0085524 locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
The definitions of REFCOUNT_MAX and REFCOUNT_SATURATED are the same,
regardless of CONFIG_REFCOUNT_FULL, so consolidate them into a single
pair of definitions.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-8-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:27 +01:00
Will Deacon
1eb085d942 locking/refcount: Move saturation warnings out of line
Having the refcount saturation and warnings inline bloats the text,
despite the fact that these paths should never be executed in normal
operation.

Move the refcount saturation and warnings out of line to reduce the
image size when refcount_t checking is enabled. Relative to an x86_64
defconfig, the sizes reported by bloat-o-meter are:

 # defconfig+REFCOUNT_FULL, inline saturation (i.e. before this patch)
 Total: Before=14762076, After=14915442, chg +1.04%

 # defconfig+REFCOUNT_FULL, out-of-line saturation (i.e. after this patch)
 Total: Before=14762076, After=14835497, chg +0.50%

A side-effect of this change is that we now only get one warning per
refcount saturation type, rather than one per problematic call-site.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-7-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:21 +01:00
Will Deacon
dcb786493f locking/refcount: Improve performance of generic REFCOUNT_FULL code
Rewrite the generic REFCOUNT_FULL implementation so that the saturation
point is moved to INT_MIN / 2. This allows us to defer the sanity checks
until after the atomic operation, which removes many uses of cmpxchg()
in favour of atomic_fetch_{add,sub}().

Some crude perf results obtained from lkdtm show substantially less
overhead, despite the checking:

 $ perf stat -r 3 -B -- echo {ATOMIC,REFCOUNT}_TIMING >/sys/kernel/debug/provoke-crash/DIRECT

 # arm64
 ATOMIC_TIMING:                                      46.50451 +- 0.00134 seconds time elapsed  ( +-  0.00% )
 REFCOUNT_TIMING (REFCOUNT_FULL, mainline):          77.57522 +- 0.00982 seconds time elapsed  ( +-  0.01% )
 REFCOUNT_TIMING (REFCOUNT_FULL, this series):       48.7181  +- 0.0256  seconds time elapsed  ( +-  0.05% )

 # x86
 ATOMIC_TIMING:                                      31.6225 +- 0.0776 seconds time elapsed  ( +-  0.25% )
 REFCOUNT_TIMING (!REFCOUNT_FULL, mainline/x86 asm): 31.6689 +- 0.0901 seconds time elapsed  ( +-  0.28% )
 REFCOUNT_TIMING (REFCOUNT_FULL, mainline):          53.203  +- 0.138  seconds time elapsed  ( +-  0.26% )
 REFCOUNT_TIMING (REFCOUNT_FULL, this series):       31.7408 +- 0.0486 seconds time elapsed  ( +-  0.15% )

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Jan Glauber <jglauber@marvell.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-6-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:10 +01:00
Will Deacon
77e9971c79 locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the <linux/refcount.h> header
In an effort to improve performance of the REFCOUNT_FULL implementation,
move the bulk of its functions into linux/refcount.h. This allows them
to be inlined in the same way as if they had been provided via
CONFIG_ARCH_HAS_REFCOUNT.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-5-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:06 +01:00
Will Deacon
7221762c48 locking/refcount: Remove unused refcount_*_checked() variants
The full-fat refcount implementation is exposed via a set of functions
suffixed with "_checked()", the idea being that code can choose to use
the more expensive, yet more secure implementation on a case-by-case
basis.

In reality, this hasn't happened, so with a grand total of zero users,
let's remove the checked variants for now by simply dropping the suffix
and predicating the out-of-line functions on CONFIG_REFCOUNT_FULL=y.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-4-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:15:03 +01:00
Will Deacon
97a1420adf locking/refcount: Ensure integer operands are treated as signed
In preparation for changing the saturation point of REFCOUNT_FULL to
INT_MIN/2, change the type of integer operands passed into the API
from 'unsigned int' to 'int' so that we can avoid casting during
comparisons when we don't want to fall foul of C integral conversion
rules for signed and unsigned types.

Since the kernel is compiled with '-fno-strict-overflow', we don't need
to worry about the UB introduced by signed overflow here. Furthermore,
we're already making heavy use of the atomic_t API, which operates
exclusively on signed types.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-3-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:14:57 +01:00
Will Deacon
23e6b169c9 locking/refcount: Define constants for saturation and max refcount values
The REFCOUNT_FULL implementation uses a different saturation point than
the x86 implementation, which means that the shared refcount code in
lib/refcount.c (e.g. refcount_dec_not_one()) needs to be aware of the
difference.

Rather than duplicate the definitions from the lkdtm driver, instead
move them into <linux/refcount.h> and update all references accordingly.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-2-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:14:13 +01:00
Ingo Molnar
ceb9e77324 Merge branch 'x86/core' into perf/core, to resolve conflicts and to pick up completed topic tree
Conflicts:
	tools/perf/check-headers.sh

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:09:27 +01:00
Ingo Molnar
c494cd6469 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:08:29 +01:00
Ingo Molnar
f01ec4fca8 Merge branch 'x86/build' into x86/asm, to pick up completed topic branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 09:05:09 +01:00
Ingo Molnar
05b042a194 x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
When two recent commits that increased the size of the 'struct cpu_entry_area'
were merged in -tip, the 32-bit defconfig build started failing on the following
build time assert:

  ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_189’ declared with attribute error: BUILD_BUG_ON failed: CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE
  arch/x86/mm/cpu_entry_area.c:189:2: note: in expansion of macro ‘BUILD_BUG_ON’
  In function ‘setup_cpu_entry_area_ptes’,

Which corresponds to the following build time assert:

	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

The purpose of this assert is to sanity check the fixed-value definition of
CPU_ENTRY_AREA_PAGES arch/x86/include/asm/pgtable_32_types.h:

	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 41)

The '41' is supposed to match sizeof(struct cpu_entry_area)/PAGE_SIZE, which value
we didn't want to define in such a low level header, because it would cause
dependency hell.

Every time the size of cpu_entry_area is changed, we have to adjust CPU_ENTRY_AREA_PAGES
accordingly - and this assert is checking that constraint.

But the assert is both imprecise and buggy, primarily because it doesn't
include the single readonly IDT page that is mapped at CPU_ENTRY_AREA_BASE
(which begins at a PMD boundary).

This bug was hidden by the fact that by accident CPU_ENTRY_AREA_PAGES is defined
too large upstream (v5.4-rc8):

	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 40)

While 'struct cpu_entry_area' is 155648 bytes, or 38 pages. So we had two extra
pages, which hid the bug.

The following commit (not yet upstream) increased the size to 40 pages:

  x86/iopl: ("Restrict iopl() permission scope")

... but increased CPU_ENTRY_AREA_PAGES only 41 - i.e. shortening the gap
to just 1 extra page.

Then another not-yet-upstream commit changed the size again:

  880a98c339: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Which increased the cpu_entry_area size from 38 to 39 pages, but
didn't change CPU_ENTRY_AREA_PAGES (kept it at 40). This worked
fine, because we still had a page left from the accidental 'reserve'.

But when these two commits were merged into the same tree, the
combined size of cpu_entry_area grew from 38 to 40 pages, while
CPU_ENTRY_AREA_PAGES finally caught up to 40 as well.

Which is fine in terms of functionality, but the assert broke:

	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);

because CPU_ENTRY_AREA_MAP_SIZE is the total size of the area,
which is 1 page larger due to the IDT page.

To fix all this, change the assert to two precise asserts:

	BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
	BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE);

This takes the IDT page into account, and also connects the size-based
define of CPU_ENTRY_AREA_TOTAL_SIZE with the address-subtraction based
define of CPU_ENTRY_AREA_MAP_SIZE.

Also clean up some of the names which made it rather confusing:

 - 'CPU_ENTRY_AREA_TOT_SIZE' wasn't actually the 'total' size of
   the cpu-entry-area, but the per-cpu array size, so rename this
   to CPU_ENTRY_AREA_ARRAY_SIZE.

 - Introduce CPU_ENTRY_AREA_TOTAL_SIZE that _is_ the total mapping
   size, with the IDT included.

 - Add comments where '+1' denotes the IDT mapping - it wasn't
   obvious and took me about 3 hours to decode...

Finally, because this particular commit is actually applied after
this patch:

  880a98c339: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")

Fix the CPU_ENTRY_AREA_PAGES value from 40 pages to the correct 39 pages.

All future commits that change cpu_entry_area will have to adjust
this value precisely.

As a side note, we should probably attempt to remove CPU_ENTRY_AREA_PAGES
and derive its value directly from the structure, without causing
header hell - but that is an adventure for another day! :-)

Fixes: 880a98c339: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-25 08:53:33 +01:00
David S. Miller
4eb47198e9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-11-24

The following pull-request contains BPF updates for your *net-next* tree.

We've added 27 non-merge commits during the last 4 day(s) which contain
a total of 50 files changed, 2031 insertions(+), 548 deletions(-).

The main changes are:

1) Optimize bpf_tail_call() from retpoline-ed indirect jump to direct jump,
   from Daniel.

2) Support global variables in libbpf, from Andrii.

3) Cleanup selftests with BPF_TRACE_x() macro, from Martin.

4) Fix devmap hash, from Toke.

5) Fix register bounds after 32-bit conditional jumps, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-24 22:05:47 -08:00
Jakub Kicinski
5f04ed74a8 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2019-11-24

Here's one last bluetooth-next pull request for the 5.5 kernel:

 - Fix BDADDR_PROPERTY & INVALID_BDADDR quirk handling
 - Added support for BCM4334B0 and BCM4335A0 controllers
 - A few other smaller fixes related to locking and memory leaks
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24 18:57:52 -08:00
Andreas K. Besslein
dc83ef22cd ax88179_178a: add ethtool_op_get_ts_info()
This enables the use of SW timestamping.

ax88179_178a uses the usbnet transmit function usbnet_start_xmit() which
implements software timestamping. ax88179_178a overrides ethtool_ops but
missed to set .get_ts_info. This caused SOF_TIMESTAMPING_TX_SOFTWARE
capability to be not available.

Signed-off-by: Andreas K. Besslein <besslein.andreas@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24 18:40:54 -08:00
Jakub Kicinski
bd85880197 Merge branch 'mlxsw-Two-small-updates'
Ido Schimmel says:

====================
mlxsw: Two small updates

Patch #1 from Petr handles a corner case in GRE tunnel offload.

Patch #2 from Amit fixes a recent issue where the driver was programming
the device to use an adjacency index (for a nexthop) that was not
properly initialized.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24 18:36:29 -08:00
Amit Cohen
ed43cff065 mlxsw: spectrum_router: Fix use of uninitialized adjacency index
When mlxsw_sp_adj_discard_write() is called for the first time, the
value stored in 'mlxsw_sp->router->adj_discard_index' is invalid, as
indicated by 'mlxsw_sp->router->adj_discard_index_valid' being set to
'false'.

In this case, we should not use the value initially stored in
'mlxsw_sp->router->adj_discard_index' (0) and instead use the value
allocated later in the function.

Fixes: 983db6198f ("mlxsw: spectrum_router: Allocate discard adjacency entry when needed")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24 18:34:46 -08:00
Petr Machata
c5731cc5eb mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels
When a GRE tunnel is bound to an underlay netdevice and that netdevice is
moved to a different VRF, that could cause two tunnels to have the same
underlay local address in the same VRF. Linux in this situation dispatches
the traffic according to the tunnel key (or lack thereof), but that cannot
be offloaded to Spectrum devices.

Detect this situation and unoffload the two impacted tunnels when it
happens.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24 18:34:46 -08:00
Daniel Borkmann
b553a6ec57 bpf: Simplify __bpf_arch_text_poke poke type handling
Given that we have BPF_MOD_NOP_TO_{CALL,JUMP}, BPF_MOD_{CALL,JUMP}_TO_NOP
and BPF_MOD_{CALL,JUMP}_TO_{CALL,JUMP} poke types and that we also pass in
old_addr as well as new_addr, it's a bit redundant and unnecessarily
complicates __bpf_arch_text_poke() itself since we can derive the same from
the *_addr that were passed in. Hence simplify and use BPF_MOD_{CALL,JUMP}
as types which also allows to clean up call-sites.

In addition to that, __bpf_arch_text_poke() currently verifies that text
matches expected old_insn before we invoke text_poke_bp(). Also add a check
on new_insn and skip rewrite if it already matches. Reason why this is rather
useful is that it avoids making any special casing in prog_array_map_poke_run()
when old and new prog were NULL and has the benefit that also for this case
we perform a check on text whether it really matches our expectations.

Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/fcb00a2b0b288d6c73de4ef58116a821c8fe8f2f.1574555798.git.daniel@iogearbox.net
2019-11-24 17:12:11 -08:00
Martin KaFai Lau
f9a7cf6eb1 bpf: Introduce BPF_TRACE_x helper for the tracing tests
For BPF_PROG_TYPE_TRACING, the bpf_prog's ctx is an array of u64.
This patch borrows the idea from BPF_CALL_x in filter.h to
convert a u64 to the arg type of the traced function.

The new BPF_TRACE_x has an arg to specify the return type of a bpf_prog.
It will be used in the future TCP-ops bpf_prog that may return "void".

The new macros are defined in the new header file "bpf_trace_helpers.h".
It is under selftests/bpf/ for now.  It could be moved to libbpf later
after seeing more upcoming non-tracing use cases.

The tests are changed to use these new macros also.  Hence,
the k[s]u8/16/32/64 are no longer needed and they are removed
from the bpf_helpers.h.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191123202504.1502696-1-kafai@fb.com
2019-11-24 17:12:11 -08:00
Daniel Borkmann
b8cd76ca4a bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT
Add a definition of bpf_jit_blinding_enabled() when CONFIG_BPF_JIT is not set
in order to fix a recent build regression:

  [...]
  CC      kernel/bpf/verifier.o
  CC      kernel/bpf/inode.o
kernel/bpf/verifier.c: In function ‘fixup_bpf_calls’:
kernel/bpf/verifier.c:9132:25: error: implicit declaration of function ‘bpf_jit_blinding_enabled’; did you mean ‘bpf_jit_kallsyms_enabled’? [-Werror=implicit-function-declaration]
 9132 |  bool expect_blinding = bpf_jit_blinding_enabled(prog);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~
      |                         bpf_jit_kallsyms_enabled
  CC      kernel/bpf/helpers.o
  CC      kernel/bpf/hashtab.o
  [...]

Fixes: d2e4c1e6c2 ("bpf: Constant map key tracking for prog array pokes")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/40baf8f3507cac4851a310578edfb98ce73b5605.1574541375.git.daniel@iogearbox.net
2019-11-24 17:11:28 -08:00
Alexei Starovoitov
6dbae03b2e Merge branch 'optimize-bpf_tail_call'
Daniel Borkmann says:

====================
This gets rid of indirect jumps for BPF tail calls whenever possible.
The series adds emission for *direct* jumps for tail call maps in order
to avoid the retpoline overhead from a493a87f38 ("bpf, x64: implement
retpoline for tail call") for situations that allow for it, meaning,
for known constant keys at verification time which are used as index
into the tail call map. See patch 7/8 for more general details.

Thanks!

v1  -> v2:
  - added more test cases
  - u8 ip_stable -> bool (Andrii)
  - removed bpf_map_poke_{un,}lock and simplified the code (Andrii)
  - added break into prog_array_map_poke_untrack since there's just
    one prog (Andrii)
  - fixed typo: for for in commit msg (Andrii)
  - reworked __bpf_arch_text_poke (Andrii)
  - added subtests, and comment on tests themselves, NULL-NULL
    transistion (Andrii)
  - in constant map key tracking I've moved the map_poke_track callback
    to once we've finished creating the poke tab as otherwise concurrent
    access from tail call map would blow up (since we realloc the table)
rfc -> v1:
  - Applied Alexei's and Andrii's feeback from
    https://lore.kernel.org/bpf/cover.1573779287.git.daniel@iogearbox.net/T/#t
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-24 17:04:12 -08:00
Daniel Borkmann
79d49ba048 bpf, testing: Add various tail call test cases
Add several BPF kselftest cases for tail calls which test the various
patch directions, and that multiple locations are patched in same and
different programs.

  # ./test_progs -n 45
   #45/1 tailcall_1:OK
   #45/2 tailcall_2:OK
   #45/3 tailcall_3:OK
   #45/4 tailcall_4:OK
   #45/5 tailcall_5:OK
   #45 tailcalls:OK
  Summary: 1/5 PASSED, 0 SKIPPED, 0 FAILED

I've also verified the JITed dump after each of the rewrite cases that
it matches expectations.

Also regular test_verifier suite passes fine which contains further tail
call tests:

  # ./test_verifier
  [...]
  Summary: 1563 PASSED, 0 SKIPPED, 0 FAILED

Checked under JIT, interpreter and JIT + hardening.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/3d6cbecbeb171117dccfe153306e479798fb608d.1574452833.git.daniel@iogearbox.net
2019-11-24 17:04:12 -08:00