Commit Graph

738946 Commits

Author SHA1 Message Date
Wanpeng Li
518e7b9481 KVM: X86: Allow userspace to define the microcode version
Linux (among the others) has checks to make sure that certain features
aren't enabled on a certain family/model/stepping if the microcode version
isn't greater than or equal to a known good version.

By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.

Suggested-by: Filippo Sironi <sironi@amazon.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01 22:32:44 +01:00
Wanpeng Li
66421c1ec3 KVM: X86: Introduce kvm_get_msr_feature()
Introduce kvm_get_msr_feature() to handle the msrs which are supported
by different vendors and sharing the same emulation logic.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01 22:32:44 +01:00
Linus Torvalds
8da5db7dda platform-drivers-x86 for v4.16-5
Fix a regression on laptops like Dell XPS 9360 where keyboard stopped
 working.
 
 Correct sysfs wakeup attribute after removal of some drivers to reflect
 that they are not able to wake system up anymore.
 
 The following is an automated git shortlog grouped by driver:
 
 intel-hid:
  -  Reset wakeup capable flag on removal
 
 intel-vbtn:
  -  Reset wakeup capable flag on removal
  -  Only activate tablet mode switch on 2-in-1's
 
 wmi:
  -  Fix misuse of vsprintf extension %pULL
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEhiZOUlnC9oKN3n3AmT3/83c5Sy0FAlqYQ3QACgkQmT3/83c5
 Sy3gUw//TnBvy5Ts4Wu4HZgtXp/qAaMKpIcZzuB8P4x+4jKvfqIQ3z9S1XqhLbT1
 aXflqVRoJmp6zLyQG4QuoVcKfEA/wRhBtvYGhzE3/DDrkr/3vu35guRGYUtkRfJZ
 bWz+JvPFcExh7y5rNxNnddC6l/UCj+Ptb2MNeASFzFXI4ggyCjJpiCjnteEInnJa
 jy42nLaNKR0oljWvnb9NJJJtbd91JoapY4e3jbs3UXEYnRLcAuZCy8qC7ls1BQ1y
 h23lPQZMp54KHuYCyQ2Huudg7pK8oH3qOZGN0vFYq/D8W13bq8Dx6QhXRK0gAprt
 pxH4gBTGUutck08YZua+8shMRU1oxrHQAwq/z7NC+IMd/joQOlmdUcHXQtCUF33Z
 6H7OkbtSrOJkRETPgelWxQ1bTtaqxd8CwHSevjBXqAIFHGAFH8hnjmQN/3e9Ynpz
 okKqkMut0WkCXF1sO/4tQIu13oJMrEfInZujn0t99kDH7ecMBlYqhxgHIzwnKDuS
 xCWGy/c2ejC/zndLktm/EyacCeFgKEhbGPHcM+mjtGeDvHpQ0huyBTS0CLuIEBQs
 9RDe+A28bV4Qi6B8GB6AzyShzwRx/8ogO8aOeDAtrjhMEWSTh5VcrgHe7NRUt0jC
 WGoLTe0NajpGVzf4XjzgdzeyCSgdN+LyH4nPlvi8AvNbrYtMAhc=
 =ezh3
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.16-5' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform drivers fixes from Andy Shevchenko:

 - fix a regression on laptops like Dell XPS 9360 where keyboard stopped
   working.

 - correct sysfs wakeup attribute after removal of some drivers to
   reflect that they are not able to wake system up anymore.

* tag 'platform-drivers-x86-v4.16-5' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: wmi: Fix misuse of vsprintf extension %pULL
  platform/x86: intel-hid: Reset wakeup capable flag on removal
  platform/x86: intel-vbtn: Reset wakeup capable flag on removal
  platform/x86: intel-vbtn: Only activate tablet mode switch on 2-in-1's
2018-03-01 10:50:01 -08:00
Sabrina Dubroca
f1c02cfb7b ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses
According to RFC 4429 (section 3.1), adding new IPv6 addresses as
optimistic addresses is acceptable, as long as the implementation
follows some rules:

   * Optimistic DAD SHOULD only be used when the implementation is aware
        that the address is based on a most likely unique interface
        identifier (such as in [RFC2464]), generated randomly [RFC3041],
        or by a well-distributed hash function [RFC3972] or assigned by
        Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315].
        Optimistic DAD SHOULD NOT be used for manually entered
        addresses.

Thus, it seems reasonable to allow userspace to set the optimistic flag
when adding new addresses.

We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is
not performing DAD we would never clear the optimistic flag. We must
also ignore userspace's request to add OPTIMISTIC flag to addresses that
have already completed DAD (addresses that don't have the TENTATIVE
flag, or that have the DADFAILED flag).

Then we also need to clear the OPTIMISTIC flag on permanent addresses
when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace
can still be used after DAD has failed, because in
ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE.

Setting IFA_F_OPTIMISTIC from userspace is conditional on
CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:43:06 -05:00
Gal Pressman
3a053b1a30 net: Fix spelling mistake "greater then" -> "greater than"
Fix trivial spelling mistake "greater then" -> "greater than".

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:39:39 -05:00
Arnd Bergmann
773daa3caf net: ipv4: avoid unused variable warning for sysctl
The newly introudced ip_min_valid_pmtu variable is only used when
CONFIG_SYSCTL is set:

net/ipv4/route.c:135:12: error: 'ip_min_valid_pmtu' defined but not used [-Werror=unused-variable]

This moves it to the other variables like it, to avoid the harmless
warning.

Fixes: c7272c2f12 ("net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:38:22 -05:00
Richard Cochran
e31a6f9067 net: phylink: Remove redundant netdev.phydev assignment
As a part of working on MII time stamping infrastructure, I was trying
to figure out how netdev->phydev gets assigned, and I stumbled across
this.  Ever since the new phylink code came in, the field is assigned
twice.

The function, phylink_connect_phy(), calls

	phy_attach_direct()
	phylink_bringup_phy()

and phy_attach_direct() sets

	dev->phydev = phydev;

but phylink_bringup_phy() then sets the same field again:

	pl->netdev->phydev = phy;

Similarly, the function, phylink_of_phy_connect(), calls

	of_phy_attach()
		phy_attach_direct()
	phylink_bringup_phy()

The removal code is also duplicated:

phylink_disconnect_phy()
	pl->netdev->phydev = NULL;
	phy_disconnect()
		phy_detach()
			phydev->attached_dev->phydev = NULL;

This patch removes the redundant assignments, restricting manipulation
of the netdev.phydev field to phy_attach_direct() and phy_detach().

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:26:14 -05:00
David S. Miller
ba6078081c Merge branch 'smc-link-layer-control-enhancements'
Ursula Braun says:

====================
net/smc: Link Layer Control enhancements

here is a series of smc patches enabling SMC communication with peers
supporting more than one link per link group.

The first three patches are preparing code cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:32 -05:00
Karsten Graul
9651b9346f net/smc: prevent new connections on link group
When the processing of a DELETE LINK message has started,
new connections should not be added to the link group that
is about to terminate.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
52bedf37ba net/smc: process add/delete link messages
Add initial support for the LLC messages ADD LINK and DELETE LINK.
Introduce a link state field. Extend the initial LLC handshake with
ADD LINK processing.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
75d320d611 net/smc: do not allow eyecatchers in rmbe
SMC does not support eyecatchers in RMB elements,
decline peers requesting this support.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
4ed75de58e net/smc: process confirm/delete rkey messages
Process and respond to CONFIRM RKEY and DELETE RKEY messages.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
313164da55 net/smc: respond to test link messages
Add TEST LINK message responses, which also serves as preparation for
support of sockopt TCP_KEEPALIVE.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
be6d467b99 net/smc: remove unused fields from smc structures
The daddr field holds the destination IPv4 address. The field was set but
never used and can be removed. The addr field was a left-over from an
earlier version of non-blocking connects and can be removed.
The result of the call to kernel_getpeername is not used, the call can be
removed. Non-blocking connects are working, so remove restriction comment.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
696cd30169 net/smc: move netinfo function to file smc_clc.c
The function smc_netinfo_by_tcpsk() belongs to CLC handling.
Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Stefan Raspl
0f6271264a net/smc: cleanup smc_llc.h and smc_clc.h headers
Remove structures used internal only from headers.
And remove an extra function parameter.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
David S. Miller
3c5aa0bc9c Merge branch 'ipv4-ipv6-mcast-align'
Yuval Mintz says:

====================
ipmr, ip6mr: Align multicast routing for IPv4 & IPv6

Historically ip6mr was based [cut-n-paste] on ipmr and the two have not
diverged too much. Apparently as ipv4 multicast routing is more common
than its ipv6 brethren modifications since then are mostly one-way,
affecting ipmr while leaving ip6mr unchanged.

This series is meant to re-factor both ipmr and ip6mr into having common
structures [and some functionality], adding 2 new common files -
mroute_base.h and ipmr_base.c.

The series begins by bringing ip6mr up to speed to some of the changes
applied in the past to ipmr [#2, #3].
It is then possible to re-factor a lot of the common structures -
vif devices [#1], mr_table [#4] mfc_cache [#6], and use the common
structures in both ipmr and ip6mr.

The rest of the patches re-factor some choice flows used by both ipmr
and ip6mr and eliminates duplicity.

This series would later allow for easy extension of ipmr offloading
to support ip6mr offloading as well, as almost all structures
related to the offloading would be shared between the two protocols.

Changes from previous versions
------------------------------
v2:
  - #6 Corrected reporting logic when hitting an unresolved cache
  - #7 Addressed kernel doc style [Thanks Nikolay]

RFC -> v1:
  - Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES
  - Addressed a couple of kbuild test robot issues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:24 -05:00
Yuval Mintz
7b0db85737 ipmr, ip6mr: Unite dumproute flows
The various MFC entries are being held in the same kind of mr_tables
for both ipmr and ip6mr, and their traversal logic is identical.
Also, with the exception of the addresses [and other small tidbits]
the major bulk of the nla setting is identical.

Unite as much of the dumping as possible between the two.
Notice this requires creating an mr_table iterator for each, as the
for-each preprocessor macro can't be used by the common logic.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
889cd83cbe ip6mr: Remove MFC_NOTIFY and refactor flags
MFC_NOTIFY exists in ip6mr, probably as some legacy code
[was already removed for ipmr in commit
06bd6c0370 ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum").
Remove it from ip6mr as well, and move the enum into a common file;
Notice MFC_OFFLOAD is currently only used by ipmr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
3feda6b46f ipmr, ip6mr: Unite vif seq functions
Same as previously done with the mfc seq, the logic for the vif seq is
refactored to be shared between ipmr and ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
c8d6196803 ipmr, ip6mr: Unite mfc seq logic
With the exception of the final dump, ipmr and ip6mr have the exact same
seq logic for traversing a given mr_table. Refactor that code and make
it common.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
845c9a7ae7 ipmr, ip6mr: Unite logic for searching in MFC cache
ipmr and ip6mr utilize the exact same methods for searching the
hashed resolved connections, difference being only in the construction
of the hash comparison key.

In order to unite the flow, introduce an mr_table operation set that
would contain the protocol specific information required for common
flows, in this case - the hash parameters and a comparison key
representing a (*,*) route.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
494fff5637 ipmr, ip6mr: Make mfc_cache a common structure
mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic  - mr_mfc
and convert both ipmr and ip6mr into using it.

For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
0bbbf0e7d0 ipmr, ip6mr: Unite creation of new mr_table
Now that both ipmr and ip6mr are using the same mr_table structure,
we can have a common function to allocate & initialize a new instance.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
b70432f731 mroute*: Make mr_table a common struct
Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
87c418bf13 ip6mr: Align hash implementation to ipmr
Since commit 8fb472c09b ("ipmr: improve hash scalability") ipmr has
been using rhashtable as a basis for its mfc routes, but ip6mr is
currently still using the old private MFC hash implementation.

Align ip6mr to the current ipmr implementation.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
8571ab479a ip6mr: Make mroute_sk rcu-based
In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
6853f21f76 ipmr,ipmr6: Define a uniform vif_device
The two implementations have almost identical structures - vif_device and
mif_device. As a step toward uniforming the mr_tables, eliminate the
mif_device and relocate the vif_device definition into a new common
header file.

Also, introduce a common initializing function for setting most of the
vif_device fields in a new common source file. This requires modifying
the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common
config option - CONFIG_IP_MROUTE_COMMON.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Linus Torvalds
7e30309968 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD bugfixes from Shaohua Li:

 - fix raid5-ppl flush request handling hang from Artur

 - fix a potential deadlock in raid5/10 reshape from BingJing

 - fix a deadlock for dm-raid from Heinz

 - fix two md-cluster of raid10 from Lidong and Guoqing

 - fix a NULL deference problem in device removal from Neil

 - fix a NULL deference problem in raid1/raid10 in specific condition
   from Yufen

 - other cleanup and fixes

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/raid1: fix NULL pointer dereference
  md: fix a potential deadlock of raid5/raid10 reshape
  md-cluster: choose correct label when clustered layout is not supported
  md: raid5: avoid string overflow warning
  raid5-ppl: fix handling flush requests
  md raid10: fix NULL deference in handle_write_completed()
  md: only allow remove_and_add_spares when no sync_thread running.
  md: document lifetime of internal rdev pointer.
  md: fix md_write_start() deadlock w/o metadata devices
  MD: Free bioset when md_run fails
  raid10: change the size of resync window for clustered raid
  md-multipath: Use seq_putc() in multipath_status()
  md/raid1: Fix trailing semicolon
  md/raid5: simplify uninitialization of shrinker
2018-03-01 10:08:47 -08:00
Linus Torvalds
7bec4a9646 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
 "Make sure that we wake up userspace loggers. This fixes a race
  introduced by the console waiter logic during this merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: Wake klogd when passing console_lock owner
2018-03-01 10:06:39 -08:00
Joe Perches
1cedc6385d platform/x86: wmi: Fix misuse of vsprintf extension %pULL
%pULL doesn't officially exist but %pUL does.

Miscellanea:

o Add missing newlines to a couple logging messages

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-03-01 10:01:39 -08:00
Tom Lendacky
d1d93fa90f KVM: SVM: Add MSR-based feature support for serializing LFENCE
In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked.  This patch will add support to allow a guest to
properly make this determination.

Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features.  If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see.  Support is also added to write (hypervisor only)
and read the MSR value for the guest.  A write by the guest will result in
a #GP.  A read by the guest will return the value as set by the host.  In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01 19:00:28 +01:00
Tom Lendacky
801e459a6f KVM: x86: Add a framework for supporting MSR-based features
Provide a new KVM capability that allows bits within MSRs to be recognized
as features.  Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01 19:00:28 +01:00
Linus Torvalds
16453c9cf8 sound fixes for 4.16-rc4
The only core change is the fix for possible memory corruption by
 ALSA ctl API since 4.14 kernel due to a thinko.  The rest are all
 device-specific: in addition to the usual suspects (HD-audio and
 USB-audio fixups), a few LPE HDMI audio fixes came in at this time.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlqXtNsOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE86Lw//d+e0ikj+Mjb2hovsst6drAXP3WvgtTSzSA9i
 fKUjPYUErvCUK+jM7p0p06FOxkMR4/nBEUFSzjoE52LAMxYBMFtW5aZjewc4JA2X
 nhhO0YMqm6tvaVmYFuGl6MZ4SqN+428Nmn5NaEDu8RymQVLkQLS0dXz7h8Z9wHg0
 z3EaYAFtH28gV3eip5KM3yhfvLtcuMdXm9Z+qcl0XAtF04MU/Wj8/dDnYFOjYVXn
 DhlQdrH4nFeRKPWYhZMkCnfSbIY1JLkAC1W18bYTzBaxuwTSCL3I78qaeSfjiIZy
 8ld/bPq4FcY2HndLZzw78No4qn1r1TJR0A0VDZ6RsprbmXSB5PVgFFnq5PHUg/G5
 a4e8yn7cfsNjiZisu15LrAeS1+NbjVMQbo3y7APfSNmWJJLMOXoo5YxJJawoDLlP
 NvQzP/MtUupWWxjaLTBIFU58/gLJBsNk+YEq36Ggynecb6PwwmzUbSSk6XHHgqFy
 82gXfd7VLXOCs6Y6h/l1BD2uQdxE01ziW6jbSlJ+20nJaqodpFmVQvzzTakMMipi
 WpuC+SVJsjLdTBVyyOdQHv/SHYsv1nGUoC21q/5IdhjLF23uBKzM8hoDoUJE6mCa
 CadWCwHVdGjkRtsyJAGy6kjru2bHjnboxkYxX9tF+qbdi0HmLSJRSRjq1e/TTIsf
 atVBiuw=
 =UY/c
 -----END PGP SIGNATURE-----

Merge tag 'sound-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "The only core change is the fix for possible memory corruption by ALSA
  ctl API since 4.14 kernel due to a thinko.

  The rest are all device-specific: in addition to the usual suspects
  (HD-audio and USB-audio fixups), a few LPE HDMI audio fixes came in at
  this time"

* tag 'sound-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: x86: Fix potential crash at error path
  ALSA: x86: Fix missing spinlock and mutex initializations
  ALSA: control: Fix memory corruption risk in snd_ctl_elem_read
  ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
  ALSA: usb-audio: Add a quirck for B&W PX headphones
  ALSA: hda: Add a power_save blacklist
  ALSA: x86: hdmi: Add single_port option for compatible behavior
2018-03-01 08:31:23 -08:00
Linus Torvalds
44896cd1de Pin control fixes for v4.16:
- Fix a pin group on the Meson.
 - Assign maintainers for Freescale/NXP pin controllers.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJal7MkAAoJEEEQszewGV1zfW8QAJbeMR/FZCmufgSaeHi1HuTQ
 Q6naMUT+23zaBjFwHarXjvlH2YxGqGDRu22fQxn44Ruxh4utCG9dkp800Si3Rzit
 FmIiKBBRw/MnjDmmWMq+SkpaAzseCXg97us6beFk9ZoO+/t/JrZenuVgk/XQtrv9
 7Fy98OvU0nXr366Jn2ZI8q4g57XpRbGX+2x/DIs68B7wOLBQpVl/OId4n+foyVmR
 VxRzGS9f6fnXK4BxSUGPzdcCdbRDLZIYxuLXJoAzQDAAEbby7/qKyPHeIsZYWBBW
 Jsn1lun/tcPVFFhYR6Fo5w/45UW7+rjfm4MFR3zPtZfEusflMd9byBp7+PPoV1X4
 CQ4TB3yW5+0I/xjZVPgU6UcbnZWTUGeFy8LZ4NIzTb9I8hxGrkoVlHZtft5syFtD
 kZViwRG5a1txWH+9/5yJnLg0KKS3+rh2Knn/GJTV4NdJcj/18Ye4NEueXIcJzYlX
 947aSnBJpMfOX0I6tBg0VW8tb2js2ROHLg5sOMbGAaCyr3KfZkOnCze7lWSV0VmS
 v8K9zBtUjMFPddXG6O4EvwYOax0W5bnFfJ6MmcEnGxCKdEtUShaJNRiyPzGYei0O
 s/znLxQmhV4IAKt6AQ/bLlKeNpvoeC7THBBjhjKhn5j+QNhBqtUVvVrUh/tv2DEQ
 t0XrzQMKSYcbfiPKQMI9
 =jVcD
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Two smallish pin control fixes: one actual code fix for the Meson and
  a MAINTAINERS update.

  Summary:

   - fix a pin group on the Meson

   - assign maintainers for Freescale/NXP pin controllers"

* tag 'pinctrl-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  MAINTAINERS: add Freescale pin controllers
  pinctrl: meson-axg: adjust uart_ao_b pin group naming
2018-03-01 08:19:10 -08:00
Linus Torvalds
f902a778df GPIO fixes for v4.16:
- Fix up device tree properties readout caused by my own
   refactorings.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJal7CpAAoJEEEQszewGV1zij8QALoTOgsX0x1RHRzJBPIVcoGL
 QBQJKBFm1dKwAHNrhgzY9rF1LNnsBN0LQ00PUiKJSPbsc8aa9WZw9XbhtXx1ICRK
 uEOytwAwkBNd//O1en3ohteg5ihszmG/mr0DwGF62AM1wstmy+4ewciLFxBIC/ue
 a9Wf1k1abDriqFjmwQS/TWN5YmaHOCKtrt6eSNE46YkKAEGFsPEfe64P08kg6Y1m
 UBxLJg8HBUMVl2lA6UQwrnrCrY4PQR6Jl3j5qaY3ENMtnr3EYFvQenwzHunDWu51
 dR43gg+isjVsANomB1+yrwB6yS7Xhrz/uQwkdeMR9Gq06uq39B82/L96jHr9Xpvk
 OjKLOP2j/hoAQ+P6eoVGZcm522F6itpavaCdV/78wm6BEetvZWavXsREyb8D8i/i
 8Xmc65Gn2+tOTBbLp5QDHabxhJnSR5xc3d2q5rDx2RjZBS4H+OZvvoYm1jgo2gOg
 y9ccAaaLMitQ3dBpuPvidn159nBC3hcZIiGXaCXJKdclxQptIcMCaO4cZuxIF1bU
 4gFo/cn4cr/TAL929UZFtgqfkAxdLaiJXQs9mzLC0EqWy89aaB7XFxI22JlYlCFH
 pLJEpEqY3b6rTpIrNaH5Agbwk+BkamMSYx7jMuK+cw/8AbKSc78KVIJfUGZomXUb
 B9qXuyzuaS3Bg8kFYByX
 =XcVM
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "Fix up device tree properties readout caused by my own refactorings"

* tag 'gpio-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: Handle deferred probing in of_find_gpio() properly
  gpiolib: Keep returning EPROBE_DEFER when we should
2018-03-01 08:17:01 -08:00
Jiufei Xue
158e61865a block: fix a typo
Fix a typo in pkt_start_recovery.

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01 08:41:27 -07:00
Jiufei Xue
9c0fb1e313 block: display the correct diskname for bio
bio_devname use __bdevname to display the device name, and can
only show the major and minor of the part0,
Fix this by using disk_name to display the correct name.

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01 08:41:25 -07:00
Jiufei Xue
7c5a0dcf55 block: fix the count of PGPGOUT for WRITE_SAME
The vm counters is counted in sectors, so we should do the conversation
in submit_bio.

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Cc: stable@vger.kernel.org
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01 08:41:23 -07:00
Chengguang Xu
1c78924957 ceph: fix potential memory leak in init_caches()
There is lack of cache destroy operation for ceph_file_cachep
when failing from fscache register.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-03-01 16:39:47 +01:00
Damien Le Moal
f3bc78d2d4 mq-deadline: Make sure to always unlock zones
In case of a failed write request (all retries failed) and when using
libata, the SCSI error handler calls scsi_finish_command(). In the
case of blk-mq this means that scsi_mq_done() does not get called,
that blk_mq_complete_request() does not get called and also that the
mq-deadline .completed_request() method is not called. This results in
the target zone of the failed write request being left in a locked
state, preventing that any new write requests are issued to the same
zone.

Fix this by replacing the .completed_request() method with the
.finish_request() method as this method is always called whether or
not a request completes successfully. Since the .finish_request()
method is only called by the blk-mq core if a .prepare_request()
method exists, add a dummy .prepare_request() method.

Fixes: 5700f69178 ("mq-deadline: Introduce zone locking support")
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[ bvanassche: edited patch description ]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01 08:39:24 -07:00
Masahiro Yamada
cd81fc82b9 kconfig: add xstrdup() helper
We already have xmalloc(), xcalloc(), and xrealloc(().  Add xstrdup()
as well to save tedious error handling.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-03-02 00:26:47 +09:00
Luc Van Oostenryck
6c49f359ca kbuild: disable sparse warnings about unknown attributes
Currently, sparse issues warnings on code using an attribute
it doesn't know about.

One of the problem with this is that these warnings have no
value for the developer, it's just noise for him. At best these
warnings tell something about some deficiencies of sparse itself
but not about a potential problem with code analyzed.

A second problem with this is that sparse release are, alas,
less frequent than new attributes are added to GCC.

So, avoid the noise by asking sparse to not warn about
attributes it doesn't know about.

Reference: https://marc.info/?l=linux-sparse&m=151871600016790
Reference: https://marc.info/?l=linux-sparse&m=151871725417322
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-03-02 00:26:46 +09:00
Ulf Magnusson
61277981dd Makefile: Fix lying comment re. silentoldconfig
The comment above the silentoldconfig invocation is outdated.
'make oldconfig' updates just .config and doesn't touch the
include/config/ tree.

This came up in https://lkml.org/lkml/2018/2/12/415.

While fixing the comment, make it more informative by explaining the
purpose of the unfortunately named silentoldconfig.

I can't make sense of the comment re. auto.conf.cmd and a cleaned tree.
include/config/auto.conf and include/config/auto.conf.cmd are both
created simultaneously by silentoldconfig (in
scripts/kconfig/confdata.c, by conf_write_autoconf()), and nothing seems
to remove auto.conf.cmd that wouldn't remove auto.conf. Remove that part
of the comment rather than blindly copying it. It might be a leftover
from an older way of doing things.

The include/config/auto.conf.cmd prerequisite might be there to ensure
that silentoldconfig gets rerun if conf_write_autoconf() fails between
writing out auto.conf.cmd and auto.conf (a comment in the function
indicates that auto.conf is deliberately written out last to mark
completion of the operation). It seems the Makefile dependency between
include/config/auto.conf and .config would already take care of that
though, since include/config/auto.conf would still be out of date re.
.config if the operation fails.

Cop out and leave the prerequisite in for now.

Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-03-02 00:26:46 +09:00
Filipe Manana
1f250e929a Btrfs: fix log replay failure after unlink and link combination
If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/testdir
  $ touch /mnt/testdir/foo
  $ ln /mnt/testdir/foo /mnt/testdir/bar

  $ sync

  $ unlink /mnt/testdir/bar
  $ touch /mnt/testdir/bar
  $ xfs_io -c "fsync" /mnt/testdir/bar

  <power failure>

  $ mount /dev/sdb /mnt
  mount: mount(2) failed: /mnt: No such file or directory

When replaying the log, for that example, we also see the following in
dmesg/syslog:

  [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
  [71813.674204] ------------[ cut here ]------------
  [71813.675694] BTRFS: Transaction aborted (error -2)
  [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
  [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G        W        4.15.0-rc9-btrfs-next-56+ #1
  [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
  [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
  [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
  [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
  [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
  [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
  [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
  [71813.679669] FS:  00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
  [71813.679669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
  [71813.679669] Call Trace:
  [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
  [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
  [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
  [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
  [71813.679669]  ? rcu_read_unlock+0x3a/0x57
  [71813.679669]  ? __lock_is_held+0x39/0x71
  [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
  [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
  [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
  [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
  [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
  [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
  [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
  [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
  [71813.679669]  ? mount_fs+0x64/0x10b
  [71813.679669]  mount_fs+0x64/0x10b
  [71813.679669]  vfs_kern_mount+0x68/0xce
  [71813.679669]  do_mount+0x6e5/0x973
  [71813.679669]  ? memdup_user+0x3e/0x5c
  [71813.679669]  SyS_mount+0x72/0x98
  [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
  [71813.679669] RIP: 0033:0x7f7cedf150ba
  [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
  [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
  [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
  [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
  [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
  [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
  [71814.128078] BTRFS error (device dm-0): open_ctree failed

This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") witht he one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.

Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:18:40 +01:00
Filipe Manana
9a6509c4da Btrfs: fix log replay failure after linking special file and fsync
If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), create a hard link for it having its old name
then sync the log, we will end up with a log that can not be replayed and
at when attempting to replay it, an EEXIST error is returned and mounting
the filesystem fails. Example scenario:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt
  $ mkdir /mnt/testdir
  $ mkfifo /mnt/testdir/foo
  # Make sure everything done so far is durably persisted.
  $ sync

  # Create some unrelated file and fsync it, this is just to create a log
  # tree. The file must be in the same directory as our special file.
  $ touch /mnt/testdir/f1
  $ xfs_io -c "fsync" /mnt/testdir/f1

  # Rename our special file and then create a hard link with its old name.
  $ mv /mnt/testdir/foo /mnt/testdir/bar
  $ ln /mnt/testdir/bar /mnt/testdir/foo

  # Create some other unrelated file and fsync it, this is just to persist
  # the log tree which was modified by the previous rename and link
  # operations. Alternatively we could have modified file f1 and fsync it.
  $ touch /mnt/f2
  $ xfs_io -c "fsync" /mnt/f2

  <power failure>

  $ mount /dev/sdc /mnt
  mount: mount /dev/sdc on /mnt failed: File exists

This happens because when both the log tree and the subvolume's tree have
an entry in the directory "testdir" with the same name, that is, there
is one key (258 INODE_REF 257) in the subvolume tree and another one in
the log tree (where 258 is the inode number of our special file and 257
is the inode for directory "testdir"). Only the data of those two keys
differs, in the subvolume tree the index field for inode reference has
a value of 3 while the log tree it has a value of 5. Because the same key
exists in both trees, but have different index, the log replay fails with
an -EEXIST error when attempting to replay the inode reference from the
log tree.

Fix this by setting the last_unlink_trans field of the inode (our special
file) to the current transaction id when a hard link is created, as this
forces logging the parent directory inode, solving the conflict at log
replay time.

A new generic test case for fstests was also submitted.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:18:34 +01:00
Filipe Manana
d4dfc0f4d3 Btrfs: send, fix issuing write op when processing hole in no data mode
When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.

Trivial reproducer:

  $ mkfs.btrfs -f -O no-holes /dev/sdc
  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdc /mnt/sdc
  $ mount /dev/sdd /mnt/sdd

  $ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
  $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1

  $ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
  $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2

  $ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
  $ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
       | btrfs receive -vv /mnt/sdd

Before this change the output of the second receive command is:

  receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
  utimes
  write foobar, offset 8192, len 8192
  utimes foobar
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...

After this change it is:

  receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
  utimes
  update_extent foobar: offset=8192, len=8192
  utimes foobar
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:18:07 +01:00
Anand Jain
3c181c12c4 btrfs: use proper endianness accessors for super_copy
The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value.  This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

CC: stable@vger.kernel.org
Fixes: df93589a17 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:17:27 +01:00
Hans van Kranenburg
92e222df7b btrfs: alloc_chunk: fix DUP stripe size handling
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.

The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.

Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.

The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.

Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.

Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.

Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.

The real problem was already introduced with the rest of the code in
73c5de0051.

The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:16:47 +01:00
Nikolay Borisov
765f3cebff btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
Essentially duplicate the error handling from the above block which
handles the !PageUptodate(page) case and additionally clear
EXTENT_BOUNDARY.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:16:12 +01:00