The swap offset reported by /proc/<pid>/pagemap may be not correct for
PMD migration entries. If addr passed into pagemap_pmd_range() isn't
aligned with PMD start address, the swap offset reported doesn't
reflect this. And in the loop to report information of each sub-page,
the swap offset isn't increased accordingly as that for PFN.
This may happen after opening /proc/<pid>/pagemap and seeking to a page
whose address doesn't align with a PMD start address. I have verified
this with a simple test program.
BTW: migration swap entries have PFN information, do we need to restrict
whether to show them?
[akpm@linux-foundation.org: fix typo, per Huang, Ying]
Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Jerome Glisse" <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Wang has reported that LTP move_pages04 test fails with the current
tree:
LTP move_pages04:
TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT
The test allocates an array of two pages, one is present while the other
is not (resp. backed by zero page) and it expects EFAULT for the second
page as the man page suggests. We are reporting EPERM which doesn't make
any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter
THP migration").
do_pages_move tries to handle as many pages in one batch as possible so we
queue all pages with the same node target together and that corresponds to
[start, i] range which is then used to update status array.
add_page_for_migration will correctly notice the zero (resp. !present)
page and returns with EFAULT which gets written to the status. But if
this is the last page in the array we do not update start and so the last
store_status after the loop will overwrite the range of the last batch
with NUMA_NO_NODE (which corresponds to EPERM).
Fix this by simply bailing out from the last flush if the pagelist is
empty as there is clearly nothing more to do.
Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org
Fixes: cf5f16b23ec9 ("mm: unclutter THP migration")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the classes of kernel stack content leaks[1] is exposing the
contents of prior heap or stack contents when a new process stack is
allocated. Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.
Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.
Performing back-to-back kernel builds before:
Run times: 157.86 157.09 158.90 160.94 160.80
Mean: 159.12
Std Dev: 1.54
and after:
Run times: 159.31 157.34 156.71 158.15 160.81
Mean: 158.46
Std Dev: 1.46
Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.
[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.
before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean: 221015379122.60
Std Dev: 4662486552.47
after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean: 217745009865.40
Std Dev: 5935559279.99
It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RHBZ: 1453123
Since at least the 3.10 kernel and likely a lot earlier we have
not been able to create unix domain sockets in a cifs share
when mounted using the SFU mount option (except when mounted
with the cifs unix extensions to Samba e.g.)
Trying to create a socket, for example using the af_unix command from
xfstests will cause :
BUG: unable to handle kernel NULL pointer dereference at 00000000
00000040
Since no one uses or depends on being able to create unix domains sockets
on a cifs share the easiest fix to stop this vulnerability is to simply
not allow creation of any other special files than char or block devices
when sfu is used.
Added update to Ronnie's patch to handle a tcon link leak, and
to address a buf leak noticed by Gustavo and Colin.
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
CC: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Cc: stable@vger.kernel.org
Pull thermal fixes from Eduardo Valentin:
"A couple of fixes for the thermal subsystem"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
dt-bindings: thermal: Remove "cooling-{min|max}-level" properties
dt-bindings: thermal: remove no longer needed samsung thermal properties
- sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode
- renesas_sdhi_internal_dmac: Avoid data corruption by limiting DMA RX
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJa2YzyAAoJEP4mhCVzWIwphfoQAMAVReir4qCfLrdNsMCsbBg/
GIXJajgOYbqX1iwMEjhHAwSxIpk9i2YM/OatcCe1Kru3h0+iuEqs4XhzwbD1w/WP
BoSHKUdjocvIzUdD2N1zUf6V/1Qf74xAT9yEqWI0C8A1fhr52BbvaUgxJ9FxbjcD
OTEv2CBcyoR9lP0t32K6PzHe/uDsTGwMHa2cXit/ugN7OSM62m9Jy8lm8lcijZsA
LV9YMExoNr5T+kVijjRjzZ68z0cf3MtaMWCggO9J3yIKmkGTUYyaj6KgJWfOA+Ey
hiObIW61Rdf6npk1ne6wXlzVW1SAxMFhjQYn7zIzPs6iiqYgrNfziL1chQ130Rpl
3LE9Lfb3PIT7lHvAJQQodCYQQddh3MNf5wgnemMY5Cnbnaj50KrX+2PqzOGZsdcr
hGe4niq1pNTsgyunkkU2dG9poQjeqOS544S9/PH6VXqXhfrRvgKwJY4QQSEEykDq
3cGArh2YxfIhr/SkNDZsBegmMwW8rm00sKudm810d3q3wS7TN341qSx+no71Ryz7
qpfCZ+VFfus2irnzyV5FViPH+cZ45LUAC6NN9fcksmvEGlMwvX2yYKttSKtxi7ia
CRyf5R1rn4DJ0qgIfv+j3pc5KpBhu9EU9QFd2Zi9STs0VUKe48Jmnozck/6pvQp+
2q1d7X9S+97NJJ04CzbU
=jO/Y
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC host fixes:
- sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode
- renesas_sdhi_internal_dmac: Avoid data corruption by limiting
DMA RX"
* tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs
mmc: sdhci-pci: Only do AMD tuning for HS200
Pull MD fixes from Shaohua Li:
"Three small fixes for MD:
- md-cluster fix for faulty device from Guoqing
- writehint fix for writebehind IO for raid1 from Mariusz
- a live lock fix for interrupted recovery from Yufen"
* tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
raid1: copy write hint from master bio to behind bio
md/raid1: exit sync request if MD_RECOVERY_INTR is set
md-cluster: don't update recovery_offset for faulty device
When sending through SMB Direct, also dump the packet in SMB send path.
Also fixed a typo in debug message.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
This patch enhances the following things:
- tree block header
* add generation and owner output for node and leaf
- node pointer generation output
- allow btrfs_print_tree() to not follow nodes
* just like btrfs-progs
Please note that, although function btrfs_print_tree() is not called by
anyone right now, it's still a pretty useful function to debug kernel.
So that function is still kept for later use.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.
task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
btrfs_run_delayed_refs+0x68/0x250 [btrfs]
btrfs_should_end_transaction+0x42/0x60 [btrfs]
btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
evict+0xc6/0x190
do_unlinkat+0x19c/0x300
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fbf589c57a7
To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows to decouple the
querying for the spaceinfo from querying the possibly deleted bg.
Fixes: d7eae3403f ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In do_mount() when the MS_* flags are being converted to MNT_* flags,
MS_RDONLY got accidentally convered to SB_RDONLY.
Undo this change.
Fixes: e462ec50cb ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.
Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.
The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.
The oops looks something like:
BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
...
Workqueue: kafsd afs_process_async_call [kafs]
RIP: 0010:afs_find_server+0x271/0x36f [kafs]
...
Call Trace:
afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
afs_deliver_to_call+0x1ee/0x5e8 [kafs]
afs_process_async_call+0x5b/0xd0 [kafs]
process_one_work+0x2c2/0x504
worker_thread+0x1d4/0x2ac
kthread+0x11f/0x127
ret_from_fork+0x24/0x30
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Unbalanced refcounting in TIPC, from Jon Maloy.
2) Only allow TCP_MD5SIG to be set on sockets in close or listen state.
Once the connection is established it makes no sense to change this.
From Eric Dumazet.
3) Missing attribute validation in neigh_dump_table(), also from Eric
Dumazet.
4) Fix address comparisons in SCTP, from Xin Long.
5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller.
6) Fix tunnel refcounting in l2tp, from Guillaume Nault.
7) Fix double list insert in team driver, from Paolo Abeni.
8) af_vsock.ko module was accidently made unremovable, from Stefan
Hajnoczi.
9) Fix reference to freed llc_sap object in llc stack, from Cong Wang.
10) Don't assume netdevice struct is DMA'able memory in virtio_net
driver, from Michael S. Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
net/smc: fix shutdown in state SMC_LISTEN
bnxt_en: Fix memory fault in bnxt_ethtool_init()
virtio_net: sparse annotation fix
virtio_net: fix adding vids on big-endian
virtio_net: split out ctrl buffer
net: hns: Avoid action name truncation
docs: ip-sysctl.txt: fix name of some ipv6 variables
vmxnet3: fix incorrect dereference when rxvlan is disabled
llc: hold llc_sap before release_sock()
MAINTAINERS: Direct networking documentation changes to netdev
atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"
net: qmi_wwan: add Wistron Neweb D19Q1
net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"
net: stmmac: Disable ACS Feature for GMAC >= 4
net: mvpp2: Fix DMA address mask size
net: change the comment of dev_mc_init
net: qualcomm: rmnet: Fix warning seen with fill_info
tun: fix vlan packet truncation
tipc: fix infinite loop when dumping link monitor summary
tipc: fix use-after-free in tipc_nametbl_stop
...
Pull vfs fixes from Al Viro:
"Assorted fixes.
Some of that is only a matter with fault injection (broken handling of
small allocation failure in various mount-related places), but the
last one is a root-triggerable stack overflow, and combined with
userns it gets really nasty ;-/"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Don't leak MNT_INTERNAL away from internal mounts
mm,vmscan: Allow preallocating memory for register_shrinker().
rpc_pipefs: fix double-dput()
orangefs_kill_sb(): deal with allocation failures
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations
in the lower filesystem when filename encryption is enabled at the
eCryptfs layer.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJa2LwFAAoJENaSAD2qAscKt00P+wSOy/iUn7wd3L/mtjN+B34l
8MaBNoruoZ87pPPHm4ZjB85zk5fSzwQl/jKgS8y6dRA1bxcwsUYRMooTIsiOBFeg
c1BEWfJVL8uq5AtmZ+czwUD2+DNn/U6S+XejXKr/bQyzOOgUtAeX8IPERPzGqJkB
dXbz21NLBpICxslIHTTHCpCWodm8B9SJSrBeKd1WIad1d+ITjB6fnyHyIlEyFxlk
yxkXLVZskR/yihIQ8JugJ4Li0nWz4KcdAvS2fbSIG5zgkAtLS9ypPAAQrfCf9pCA
BYc4Q2Ai425tgVqAwK2uHXOzy7/WsPL4yrIXt/04vTra1XcK9cv2EdojYi1cK04W
/a6xIWzFtp7HYEhuHZiVdvSTRXEOF9IYUXTw/gavE0KvMZ36k2FU9bLo4WKmx/91
atr+9p4vE6uA932YY54Wit++IjpLGN7FzXYxVVoACVz7HJ+sESqytPiLVX5iufux
/xrrbmlEpLEeJa6/F1Yboo9R7aclrMsxEz4Y+Txkk/zmjzP8wYYiYodOCJ23pauY
5MJruAxLtlcmAzW7FbES/Id+6Rfj42BuzXi3TRceiUv8NWmDTk3y205o94CfJW5U
mxEiFgb7Nxr1UsL66rZGmpnwcfJY4FFEyuXRGEmATBquEGJPTqhku9HKUkmORa7I
mYJzK27VX7dE2lAzuMpr
=bwS0
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Minor cleanups and a bug fix to completely ignore unencrypted
filenames in the lower filesystem when filename encryption is enabled
at the eCryptfs layer"
* tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: don't pass up plaintext names when using filename encryption
ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlrXYi4ACgkQnJ2qBz9k
QNmp5Af9HhwGDuTgY+OcVEeZzXQ30BbFQNP4cby5LZ3Wmffio8T97c2DnlDDjr2E
l8Biq5HTm3Gg5x7WEuopyrS6L4on76RDyr1Ohf34mOoIcxdcWQ7XpiyA8jeP9uLn
9IJSPYPbA2RsQk4yr8GBRlVQh/udJfSG0vQOQ5cPylgICLsfxuMIC294vFfcgDt1
wLOy24INlJYbyECMIQKrleyoZQF37Cv7v12KPU0w3eLtuNX0rdI4gwpOS4eEb27G
2KrAsRYCb5KyFl3mRXBeCPNytiJ3ffFgvFC1Z57GVWVmNnadj/Jctw3zFirsMC2d
j7IUA4XI/T+1Gql5rCOXcSgeJ0LRZA==
=KQzD
-----END PGP SIGNATURE-----
Merge tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
- isofs memory leak fix
- two fsnotify fixes of event mask handling
- udf fix of UTF-16 handling
- couple other smaller cleanups
* tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix leak of UTF-16 surrogates into encoded strings
fs: ext2: Adding new return type vm_fault_t
isofs: fix potential memory leak in mount option parsing
MAINTAINERS: add an entry for FSNOTIFY infrastructure
fsnotify: fix typo in a comment about mark->g_list
fsnotify: fix ignore mask logic in send_to_group()
isofs compress: Remove VLA usage
fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
fanotify: fix logic of events on child
Pull HID updates from Jiri Kosina:
- suspend/resume handling fix for Raydium I2C-connected touchscreen
from Aaron Ma
- protocol fixup for certain BT-connected Wacoms from Aaron Armstrong
Skomra
- battery level reporting fix on BT-connected mice from Dmitry Torokhov
- hidraw race condition fix from Rodrigo Rivas Costa
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: i2c-hid: fix inverted return value from i2c_hid_command()
HID: i2c-hid: Fix resume issue on Raydium touchscreen device
HID: wacom: bluetooth: send exit report for recent Bluetooth devices
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
HID: input: fix battery level reporting on BT mice
Pull livepatching fix from Jiri Kosina:
"Shadow variable API list_head initialization fix from Petr Mladek"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Allow to call a custom callback when freeing shadow variables
livepatch: Initialize shadow variables safely by a custom callback
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEhRJncuj2BJSl0Jf3sN6d1ii/Ey8FAlrZvmsACgkQsN6d1ii/
Ey9ktQf/dOpBJOXMvQVOcZuDNtjU/GQRZTSjqPlurDVTXWqy5GKAgggqdV9PkBgO
WGd3YmzD5ctxCSCGMFub8X59qiDRzq/8lWoYMlAdi32FuhafXxNDOTraknbIq5KM
7VzjhtI9Bq9lYNxOxaAR+mKpb1ac1scOhmWsE+XZ/OKOZeSDAd1s8VOxHDPeCLPM
upQ8a1hS1+rb/FeTUjhwfuiK83Yn2+Rc7bG5zapf03unW19rSjC3z6Kjp6JSqlt6
1ALoDXbM9r2ZHSBwceagKweKx5Ca+tKaASLjxtmhmutkIRt9VxsN20KcmJ3D7eKY
EwSECAEg9ZrIONKDODpTQFhw/dEflA==
=kLT+
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- some fixes of kmalloc() flags
- one fix of the xenbus driver
- an update of the pv sound driver interface needed for a driver which
will go through the sound tree
* tag 'for-linus-4.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: xenbus_dev_frontend: Really return response string
xen/sndif: Sync up with the canonical definition in Xen
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_reg_add
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in xen_pcibk_config_quirks_init
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_device_alloc
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_init_device
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_probe
Fix an off-by-one bug in our alternative asm patching which leads to incorrectly
patched code. This bug lay dormant for nearly 10 years but we finally hit it
due to a recent change.
Fix lockups when running KVM guests on Power8 due to a missing check when a
thread that's running KVM comes out of idle.
Fix an out-of-spec behaviour in the XIVE code (P9 interrupt controller).
Fix EEH handling of bridge MMIO windows.
Prevent crashes in our RFI fallback flush handler if firmware didn't tell us the
size of the L1 cache (only seen on simulators).
Thanks to:
Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJa2eo9ExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYCb
AxAAqXGKeXg1ZJw+OHR8pteDYLGylVTRhWZykO3rYkJ7DXtcb9Qr17VOyitVZ93y
uNbu8Nm4KMqKeOYijqyXjK9kjVx47K3V9p95CBDy5yA4hGrV+I/YIZUqANwRZNat
1Bn+eGlyLBGNRlpZ/CnFTm/W/LH0fDJV8tmecXmegGRwcHU7M0TT8t9XgRT1iX04
2uUhBFCltwSmLiFJTOth8VbppTCzf2mZd1GsBM1lM/pwCd5j1LZO9lHoY0wTEddj
v+lsCzp2IGO0G8xPFEg+jDa/dOZ1y86L8feir7/TRkwe73EesSux7PN7tWpWA1dE
iDJ4fdHiBL/PIB6XjFcCUsED2MaLO/01DpqRfM79HSXlxtQQBN1HJYmIALowwFLm
UTGOEkXTv/V43TQrAAcFSMZJ6A+r8qsfkA+ROuohG5mTr8xhJ4yLh7YbptSZOWZk
zwC58UAFEhKZHrZO3Vt8DFR0NSbkURfDLEgCszheBESqRh3HmiLYxo7UrBvfkL8G
rXUSbH7i+4jsr7ehdkoEAoGvq0f9jSmBxQ0h4UmZW1TyTL6Q/EdRbwA5PBtgin6m
x/viq3Im9l0IRrMlNUT/WhjMwzPbp1GaTiiMjCaxEfCQhoP1Dfb94lsL4CXaXfHa
rNzXmNRRM2fDAVINYLOK39UjZdnrONwxznI6KmBWkvHJLYA=
=Ep0t
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix an off-by-one bug in our alternative asm patching which leads to
incorrectly patched code. This bug lay dormant for nearly 10 years
but we finally hit it due to a recent change.
- Fix lockups when running KVM guests on Power8 due to a missing check
when a thread that's running KVM comes out of idle.
- Fix an out-of-spec behaviour in the XIVE code (P9 interrupt
controller).
- Fix EEH handling of bridge MMIO windows.
- Prevent crashes in our RFI fallback flush handler if firmware didn't
tell us the size of the L1 cache (only seen on simulators).
Thanks to: Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling.
* tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kvm: Fix lockups when running KVM guests on Power8
powerpc/eeh: Fix enabling bridge MMIO windows
powerpc/xive: Fix trying to "push" an already active pool VP
powerpc/64s: Default l1d_size to 64K in RFI fallback flush
powerpc/lib: Fix off-by-one in alternate feature patching
Pull s390 fixes and kexec-file-load from Martin Schwidefsky:
"After the common code kexec patches went in via Andrew we can now push
the architecture parts to implement the kexec-file-load system call.
Plus a few more bug fixes and cleanups, this includes an update to the
default configurations"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/signal: cleanup uapi struct sigaction
s390: rename default_defconfig to debug_defconfig
s390: remove gcov defconfig
s390: update defconfig
s390: add support for IBM z14 Model ZR1
s390: remove couple of duplicate includes
s390/boot: remove unused COMPILE_VERSION and ccflags-y
s390/nospec: include cpu.h
s390/decompressor: Ignore file vmlinux.bin.full
s390/kexec_file: add generated files to .gitignore
s390/Kconfig: Move kexec config options to "Processor type and features"
s390/kexec_file: Add ELF loader
s390/kexec_file: Add crash support to image loader
s390/kexec_file: Add image loader
s390/kexec_file: Add kexec_file_load system call
s390/kexec_file: Add purgatory
s390/kexec_file: Prepare setup.h for kexec_file_load
s390/smsgiucv: disable SMSG on module unload
s390/sclp: avoid potential usage of uninitialized value
SBOX on some Broadwell CPUs is broken because it's enabled unconditionally
despite the fact that there are no SBOXes available.
Check the Power Control Unit CAPID4 register to determine the number of
available SBOXes on the particular CPU before trying to enable them. If
there are none, nullify the SBOX descriptor so it isn't tried to be
initialized.
Signed-off-by: Oskar Senft <osk@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark van Dijk <mark@voidzero.net>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com
This reverts commit 3b94a89166 ("perf/x86/intel/uncore: Remove
SBOX support for Broadwell server")
Revert because there exists a proper workaround for Broadwell-EP servers
without SBOX now. Note that BDX-DE does not have a SBOX.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: osk@google.com
Cc: mark@voidzero.net
Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com
On a system with 4-level page-tables there is no p4d, so the pud in the pgd
should be mapped. The old code before commit fb43d6cb91 already did that.
The change from above commit causes an invalid page-table which causes
undefined behavior. In one report it caused triple faults.
Fix it by changing the p4d back to pud.
Fixes: fb43d6cb91 ('x86/mm: Do not auto-massage page protections')
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
Cc: hpa@zytor.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1524162360-26179-1-git-send-email-joro@8bytes.org
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.
Cc: stable@kernel.org
Reported-by: Alexander Aring <aring@mojatatu.com>
Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Adding additional maintainers to libnvdimm related code and DAX.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
MAP_SYNC is a nop for device-dax. Allow MAP_SYNC to succeed on device-dax
to eliminate special casing between device-dax and fs-dax as to when the
flag can be specified. Device-dax users already implicitly assume that they do
not need to call fsync(), and this enables them to explicitly check for this
capability.
Cc: <stable@vger.kernel.org>
Fixes: b6fb293f24 ("mm: Define MAP_SYNC and VM_SYNC flags")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With commit df3f126482 ("libnvdimm, of_pmem: use dev_to_node() instead
of of_node_to_nid()") it is now possible to allow of_pmem to be built as
a module as originally implemented.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the direct dependency on of_node_to_nid() by using dev_to_node()
instead. Any DT platform device will have its NUMA node id set when the
device is created.
With this, commit 291717b6fb ("libnvdimm, of_pmem: workaround OF_NUMA=n
build error") can be reverted.
Fixes: 7171976089 ("libnvdimm: Add device-tree based driver")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket
crashes, because
commit 127f497058 ("net/smc: release clcsock from tcp_listen_worker")
releases the internal clcsock in smc_close_active() and sets smc->clcsock
to NULL.
For SHUT_RD the smc_close_active() call is removed.
For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the
clcsock is already released.
Fixes: 127f497058 ("net/smc: release clcsock from tcp_listen_worker")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type
could be greater than the fixed buffer length of 4096 bytes allocated by
the driver. This was causing HWRM_NVM_READ to copy more data to the buffer
than the allocated size, causing general protection fault.
Fix the issue by allocating the exact buffer length returned by
HWRM_NVM_FIND_DIR_ENTRY, instead of 4096. Move the kzalloc() call
into the bnxt_get_pkgver() function.
Fixes: 3ebf6f0a09 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin says:
====================
virtio: ctrl buffer fixes
Here are a couple of fixes related to the virtio control buffer.
Lightly tested on x86 only.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
offloads is a buffer in virtio format, should use
the __virtio64 tag.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Programming vids (adding or removing them) still passes
guest-endian values in the DMA buffer. That's wrong
if guest is big-endian and when virtio 1 is enabled.
Note: this is on top of a previous patch:
virtio_net: split out ctrl buffer
Fixes: 9465a7a6f ("virtio_net: enable v1.0 support")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending control commands, virtio net sets up several buffers for
DMA. The buffers are all part of the net device which means it's
actually allocated by kvmalloc so it's in theory (on extreme memory
pressure) possible to get a vmalloc'ed buffer which on some platforms
means we can't DMA there.
Fix up by moving the DMA buffers into a separate structure.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When longer interface names are used, the action names exposed in
/proc/interrupts and /proc/irq/* maybe truncated. For example, when
using the predictable name algorithm in systemd on a HiSilicon D05,
I see:
ubuntu@d05-3:~$ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
enahisic2i0-tx0
enahisic2i0-tx1
[...]
enahisic2i0-tx8
enahisic2i0-tx9
enahisic2i0-tx1
enahisic2i0-tx1
enahisic2i0-tx1
enahisic2i0-tx1
enahisic2i0-tx1
enahisic2i0-tx1
Increase the max ring name length to allow for an interface name
of IFNAMSIZE. After this change, I now see:
$ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //'
enahisic2i0-tx0
enahisic2i0-tx1
enahisic2i0-tx2
[...]
enahisic2i0-tx8
enahisic2i0-tx9
enahisic2i0-tx10
enahisic2i0-tx11
enahisic2i0-tx12
enahisic2i0-tx13
enahisic2i0-tx14
enahisic2i0-tx15
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The name of the following proc/sysctl entries were incorrectly
documented:
/proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number
/proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number
/proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length
/proc/sys/net/ipv6/conf/<interface>/max_hbt_length
Their name was set to the name of the symbol in the .data field of the
control table instead of their .proc name.
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3_get_hdr_len() is used to calculate the header length which in
turn is used to calculate the gso_size for skb. When rxvlan offload is
disabled, vlan tag is present in the header and the function references
ip header from sizeof(ethhdr) and leads to incorrect pointer reference.
This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
if vlan tag is present and correctly references the ip hdr.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Acked-by: Louis Luo <llouis@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported we still access llc->sap in llc_backlog_rcv()
after it is freed in llc_sap_remove_socket():
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
llc_conn_service net/llc/llc_conn.c:400 [inline]
llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x12f/0x3a0 net/core/sock.c:2335
release_sock+0xa4/0x2b0 net/core/sock.c:2850
llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204
llc->sap is refcount'ed and llc_sap_remove_socket() is paired
with llc_sap_add_socket(). This can be amended by holding its refcount
before llc_sap_remove_socket() and releasing it after release_sock().
Reported-by: <syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Networking docs changes go through the networking tree, so patch the
MAINTAINERS file to direct authors to the right place.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ACS Feature is currently enabled for GMAC >= 4 but the llc_snap status
is never checked in descriptor rx_status callback. This will cause
stmmac to always strip packets even that ACS feature is already
stripping them.
Lets be safe and disable the ACS feature for GMAC >= 4 and always strip
the packets for this GMAC version.
Fixes: 477286b53f ("stmmac: add GMAC4 core support")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPv2 TX/RX descriptors uses 40bits DMA addresses, but 41 bits masks were
used (GENMASK_ULL(40, 0)).
This commit fixes that by using the correct mask.
Fixes: e7c5359f2e ("net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comment of dev_mc_init() is wrong. which use dev_mc_flush
instead of dev_mc_init.
Signed-off-by: Lianwen Sun <sunlw.fnst@cn.fujitsu.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Some rawmidi compat ioctls lack of the input substream checks
(although they do check only for rfile->output). This many eventually
lead to an Oops as NULL substream is passed to the rawmidi core
functions.
Fix it by adding the proper checks before each function call.
The bug was spotted by syzkaller.
Reported-by: syzbot+f7a0348affc3b67bc617@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>