A remount to a read-write filesystem is not safe when there's tree-log
to be replayed. Files that could be opened until now might be affected
by the changes in the tree-log.
A regular mount is needed to replay the log so the filesystem presents
the consistent view with the pending changes included.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's no logged information about tree-log replay although this is
something that points to previous unclean unmount. Other filesystems
report that as well.
Suggested-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a few cases where we allow an extent map that is in an extent map
tree to be merged with other extents in the tree. Such cases include the
unpinning of an extent after the respective ordered extent completed or
after logging an extent during a fast fsync. This can lead to subtle and
dangerous problems because when doing the merge some other task might be
using the same extent map and as consequence see an inconsistent state of
the extent map - for example sees the new length but has seen the old start
offset.
With luck this triggers a BUG_ON(), and not some silent bug, such as the
following one in __do_readpage():
$ cat -n fs/btrfs/extent_io.c
3061 static int __do_readpage(struct extent_io_tree *tree,
3062 struct page *page,
(...)
3127 em = __get_extent_map(inode, page, pg_offset, cur,
3128 end - cur + 1, get_extent, em_cached);
3129 if (IS_ERR_OR_NULL(em)) {
3130 SetPageError(page);
3131 unlock_extent(tree, cur, end);
3132 break;
3133 }
3134 extent_offset = cur - em->start;
3135 BUG_ON(extent_map_end(em) <= cur);
(...)
Consider the following example scenario, where we end up hitting the
BUG_ON() in __do_readpage().
We have an inode with a size of 8KiB and 2 extent maps:
extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
a previous transaction
extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
persisted but writeback started for it already. The extent map
is pinned since there's writeback and an ordered extent in
progress, so it can not be merged with extent map A yet
The following sequence of steps leads to the BUG_ON():
1) The ordered extent for extent B completes, the respective page gets its
writeback bit cleared and the extent map is unpinned, at that point it
is not yet merged with extent map A because it's in the list of modified
extents;
2) Due to memory pressure, or some other reason, the MM subsystem releases
the page corresponding to extent B - btrfs_releasepage() is called and
returns 1, meaning the page can be released as it's not dirty, not under
writeback anymore and the extent range is not locked in the inode's
iotree. However the extent map is not released, either because we are
not in a context that allows memory allocations to block or because the
inode's size is smaller than 16MiB - in this case our inode has a size
of 8KiB;
3) Task B needs to read extent B and ends up __do_readpage() through the
btrfs_readpage() callback. At __do_readpage() it gets a reference to
extent map B;
4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B
while holding the write lock on the inode's extent map tree - this
results in try_merge_map() being called and since it's possible to merge
extent map B with extent map A now (the extent map B was removed from
the list of modified extents), the merging begins - it sets extent map
B's start offset to 0 (was 4KiB), but before it increments the map's
length to 8KiB (4kb + 4KiB), task A is at:
BUG_ON(extent_map_end(em) <= cur);
The call to extent_map_end() sees the extent map has a start of 0
and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
the BUG_ON() is triggered.
So it's dangerous to modify an extent map that is in the tree, because some
other task might have got a reference to it before and still using it, and
needs to see a consistent map while using it. Generally this is very rare
since most paths that lookup and use extent maps also have the file range
locked in the inode's iotree. The fsync path is pretty much the only
exception where we don't do it to avoid serialization with concurrent
reads.
Fix this by not allowing an extent map do be merged if if it's being used
by tasks other then the one attempting to merge the extent map (when the
reference count of the extent map is greater than 2).
Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_ref_tree_mod(), 'ref' and 'ra' are allocated through kzalloc() and
kmalloc(), respectively. In the following code, if an error occurs, the
execution will be redirected to 'out' or 'out_unlock' and the function will
be exited. However, on some of the paths, 'ref' and 'ra' are not
deallocated, leading to memory leaks. For example, if 'action' is
BTRFS_ADD_DELAYED_EXTENT, add_block_entry() will be invoked. If the return
value indicates an error, the execution will be redirected to 'out'. But,
'ref' is not deallocated on this path, causing a memory leak.
To fix the above issues, deallocate both 'ref' and 'ra' before exiting from
the function when an error is encountered.
CC: stable@vger.kernel.org # 4.15+
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To pick up the changes from:
7de3f1423f ("KVM: s390: Add new reset vcpu API")
So far we're ignoring those arch specific ioctls, we need to revisit
this at some time to have arch specific tables, etc:
$ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh
egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \
$
This addresses these tools/perf build warnings:
Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
290a6bb06d ("arm64: KVM: Add UAPI notes for swapped registers")
No tools changes are caused by this.
This addresses these tools/perf build warnings:
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from:
85c17291e2 ("x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured")
f444a5ff95 ("x86/cpufeatures: Add support for fast short REP; MOVSB")
These don't cause any changes in tooling, just silences this perf build
warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To silence the following tools/perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Picking up the changes in:
45fc24e89b ("x86/mpx: remove MPX from arch/x86")
that didn't entail any functionality change in the tooling side.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).
Fixes: c3f1ed90e6 ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk
(cherry picked from commit b4a9a149f9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.
v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine
Fixes: ad18ba7b5e ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
(cherry picked from commit 989df3a7bd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.
Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.
Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).
Fixes: c3f1ed90e6 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
(cherry picked from commit 4ba5c086a1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.
However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.
In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8 ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 748317386a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).
Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.
v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!
References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621fd7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.
v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f93)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Picking the changes from:
46b770f720 ("ALSA: uapi: Fix sparse warning")
a103a39899 ("ALSA: control: Fix incompatible protocol error")
bd3eb4e87e ("ALSA: ctl: bump protocol version up to v2.1.0")
ff16351e3f ("ALSA: ctl: remove dimen member from elem_info structure")
5422835666 ("ALSA: ctl: remove unused macro for timestamping of elem_value")
7fd7d6c504 ("ALSA: uapi: Fix typos and header inclusion in asound.h")
1cfaef9617 ("ALSA: bump uapi version numbers")
80fe7430c7 ("ALSA: add new 32-bit layout for snd_pcm_mmap_status/control")
07094ae6f9 ("ALSA: Avoid using timespec for struct snd_timer_tread")
d9e5582c4b ("ALSA: Avoid using timespec for struct snd_rawmidi_status")
3ddee7f88a ("ALSA: Avoid using timespec for struct snd_pcm_status")
a4e7dd35b9 ("ALSA: Avoid using timespec for struct snd_ctl_elem_value")
a07804cc74 ("ALSA: Avoid using timespec for struct snd_timer_status")
Which entails no changes in the tooling side.
To silence this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
d41938d2cb ("mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use")
No changes in tooling, just a rebuild as files needed got touched.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an arm64 version of get_cpuid(), which is used for various annotation
and headers - for example, I now get the CPUID in "perf report --header",
as shown in this snippet:
# hostname : ubuntu
# os release : 5.5.0-rc1-dirty
# perf version : 5.5.rc1.gbf8a13dc9851
# arch : aarch64
# nrcpus online : 96
# nrcpus avail : 96
# cpuid : 0x00000000480fd010
Since much of the code to read the MIDR is already in get_cpuid_str(),
factor out this code.
Tester notes:
I tested this patch on my new ARM64 Kunpeng 920 server.
[root@node1 zsk]# ./perf --version
perf version 5.6.rc1.g2cdb955b7252
Both perf list and perf stat can work.
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1576245255-210926-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the change in:
cc662126b4 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
That don't result in any changes in tooling, just silences this perf
build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
e933adde6f ("fscrypt: include <linux/ioctl.h> in UAPI header")
93edd392ca ("fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY")
That don't trigger any changes in tooling.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
KVM allows the deferral of exception payloads when a vCPU is in guest
mode to allow the L1 hypervisor to intercept certain events (#PF, #DB)
before register state has been modified. However, this behavior is
incompatible with the KVM_{GET,SET}_VCPU_EVENTS ABI, as userspace
expects register state to have been immediately modified. Userspace may
opt-in for the payload deferral behavior with the
KVM_CAP_EXCEPTION_PAYLOAD per-VM capability. As such,
kvm_multiple_exception() will immediately manipulate guest registers if
the capability hasn't been requested.
Since the deferral is only necessary if a userspace ioctl were to be
serviced at the same as a payload bearing exception is recognized, this
behavior can be relaxed. Instead, opportunistically defer the payload
from kvm_multiple_exception() and deliver the payload before completing
a KVM_GET_VCPU_EVENTS ioctl.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will
be populated if a VM-exit caused by an INIT signal takes priority over a
debug-trap. Emulate this behavior when synthesizing an INIT signal
VM-exit into L1.
Fixes: 4b9852f4f3 ("KVM: x86: Fix INIT signal handling in various CPU states")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM defines the #DB payload as compatible with the 'pending debug
exceptions' field under VMX, not DR6. Mask off bit 12 when applying the
payload to DR6, as it is reserved on DR6 but not the 'pending debug
exceptions' field.
Fixes: f10c729ff9 ("kvm: vmx: Defer setting of DR6 until #DB delivery")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Only the first and the last nodes were being added to
ref->preallocated_barriers.
Renaming variables to make it more easy to read.
Fixes: 8413502238 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
(cherry picked from commit d4c3c0b822)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As we use a mutex to serialise the first acquire (as it may be a lengthy
operation), but only an atomic decrement for the release, we have to
be careful in case a second thread races and completes both
acquire/release as the first finishes its acquire.
Thread A Thread B
i915_active_acquire i915_active_acquire
atomic_read() == 0 atomic_read() == 0
mutex_lock() mutex_lock()
atomic_read() == 0
ref->active();
atomic_inc()
mutex_unlock()
atomic_read() == 1
i915_active_release
atomic_dec_and_test() -> 0
ref->retire()
atomic_inc() -> 1
mutex_unlock()
So thread A has acquired the ref->active_count but since the ref was
still active at the time, it did not initialise it. By switching the
check inside the mutex to an atomic increment only if already active, we
close the race.
Fixes: c9ad602fea ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
(cherry picked from commit ac0e331a62)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 742379c0c4 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk
(cherry picked from commit 7e36505d0c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Accessing a per-cpu variable only makes sense when preemption is
disabled (and the kernel does check this when the right debug options
are switched on).
For kvm_get_running_vcpu(), it is fine to return the value after
re-enabling preemption, as the preempt notifiers will make sure that
this is kept consistent across task migration (the comment above the
function hints at it, but lacks the crucial preemption management).
While we're at it, move the comment from the ARM code, which explains
why the whole thing works.
Fixes: 7495e22bb1 ("KVM: Move running VCPU from ARM to common code").
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/318984f6-bc36-33a3-abc6-bf2295974b06@huawei.com
Message-id: <20200207163410.31276-1-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Apply vast quantities of poison and not tell anyone to see if we fall
for the trap of using a stale RING_HEAD.
References: 42827350f7 ("drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211205615.1190127-2-chris@chris-wilson.co.uk
On retiring the request, we should not re-use these elements in the ring
(at least not until we fill the ringbuffer and knowingly reuse the space).
Leave behind some poison to (hopefully) trap ourselves if we make a
mistake.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211205615.1190127-1-chris@chris-wilson.co.uk
Do not initialize the microcode version at RESET or INIT, only on vCPU
creation. Microcode updates are not lost during INIT, and exact
behavior across a warm RESET is not specified by the architecture.
Since we do not support a microcode update directly from the hypervisor,
but only as a result of userspace setting the microcode version MSR,
it's simpler for userspace if we do nothing in KVM and let userspace
emulate behavior for RESET as it sees fit.
Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in
the list of emulated MSRs.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sameeh Jubran says:
====================
Bug fixes for ENA Ethernet driver
Difference from V1:
* Started using netdev_rss_key_fill()
* Dropped superflous changes that are not related to bug fixes as
requested by Jakub
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
comp_ctx can be NULL in a very rare case when an admin command is executed
during the execution of ena_remove().
The bug scenario is as follows:
* ena_destroy_device() sets the comp_ctx to be NULL
* An admin command is executed before executing unregister_netdev(),
this can still happen because our device can still receive callbacks
from the netdev infrastructure such as ethtool commands.
* When attempting to access the comp_ctx, the bug occurs since it's set
to NULL
Fix:
Added a check that comp_ctx is not NULL
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up till kernel 4.11 there was no enum defined for crc32 hash in ethtool,
thus the xor enum was used for supporting crc32.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the name suggests ETH_RSS_HASH_NO_CHANGE is received upon changing
the key or indirection table using ethtool while keeping the same hash
function.
Also add a function for retrieving the current hash function from
the ena-com layer.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Saeed Bshara <saeedb@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ena_com_ind_tbl_convert_from_device() has an overflow
bug as explained below. Either way, this function is not needed at
all since we don't retrieve the indirection table from the device
at any point which means that this conversion is not needed.
The bug:
The for loop iterates over all io_sq_queues, when passing the actual
number of used queues the io_sq_queues[i].idx equals 0 since they are
uninitialized which results in the following code to be executed till
the end of the loop:
dev_idx_to_host_tbl[0] = i;
This results dev_idx_to_host_tbl[0] in being equal to
ENA_TOTAL_NUM_QUEUES - 1.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The indirection table has the indices of the Rx queues. When we store it
during set indirection operation, we convert the indices to our internal
representation of the indices.
Our internal representation of the indices is: even indices for Tx and
uneven indices for Rx, where every Tx/Rx pair are in a consecutive order
starting from 0. For example if the driver has 3 queues (3 for Tx and 3
for Rx) then the indices are as follows:
0 1 2 3 4 5
Tx Rx Tx Rx Tx Rx
The BUG:
The issue is that when we satisfy a get request for the indirection
table, we don't convert the indices back to the original representation.
The FIX:
Simply apply the inverse function for the indices of the indirection
table after we set it.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device receives, stores and retrieves the hash function value as bits
and not as their enum value.
The bug:
* In ena_com_set_hash_function() we set
cmd.u.flow_hash_func.selected_func to the bit value of rss->hash_func.
(1 << rss->hash_func)
* In ena_com_get_hash_function() we retrieve the hash function and store
it's bit value in rss->hash_func. (Now the bit value of rss->hash_func
is stored in rss->hash_func instead of it's enum value)
The fix:
This commit fixes the issue by converting the retrieved hash function
values from the device to the matching enum value of the set bit using
ffs(). ffs() finds the first set bit's index in a word. Since the function
returns 1 for the LSB's index, we need to subtract 1 from the returned
value (note that BIT(0) is 1).
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On old hardware, getting / setting the hash function is not supported while
gettting / setting the indirection table is.
This commit enables us to still show the indirection table on older
hardwares by setting the hash function and key to NULL.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we allocate the key whether the device supports setting the
key or not. This commit adds a check to the allocation function and
handles the error accordingly.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug description:
When running "ethtool -x <if_name>" the key shows up as all zeros.
When we use "ethtool -X <if_name> hfunc toeplitz hkey <some:random:key>" to
set the key and then try to retrieve it using "ethtool -x <if_name>" then
we return the correct key because we return the one we saved.
Bug cause:
We don't fetch the key from the device but instead return the key
that we have saved internally which is by default set to zero upon
allocation.
Fix:
This commit fixes the issue by initializing the key to a random value
using netdev_rss_key_fill().
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation of the driver calls skb_tx_timestamp()to add a
software tx timestamp to the skb, however the software-transmit capability
is not reported in ethtool -T.
This commit updates the ethtool structure to report the software-transmit
capability in ethtool -T using the standard ethtool_op_get_ts_info().
This function reports all software timestamping capabilities (tx and rx),
as well as setting phc_index = -1. phc_index is the index of the PTP
hardware clock device that will be used for hardware timestamps. Since we
don't have such a device in ENA, using the default -1 value is the correct
setting.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
>From the documentation of round_jiffies():
"Rounds a time delta in the future (in jiffies) up or down to
(approximately) full seconds. This is useful for timers for which
the exact time they fire does not matter too much, as long as
they fire approximately every X seconds.
By rounding these timers to whole seconds, all such timers will fire
at the same time, rather than at various times spread out. The goal
of this is to have the CPU wake up less, which saves power."
There are 2 parts to this patch:
================================
Part 1:
-------
In our case we need timer_service to be called approximately every
X=1 seconds, and the exact time does not matter, so using round_jiffies()
is the right way to go.
Therefore we add round_jiffies() to the mod_timer() in ena_timer_service().
Part 2:
-------
round_jiffies() is used in check_for_missing_keep_alive() when
getting the jiffies of the expiration of the keep_alive timeout. Here it
is actually a mistake to use round_jiffies() because we want the exact
time when keep_alive should expire and not an approximate rounded time,
which can cause early, false positive, timeouts.
Therefore we remove round_jiffies() in the calculation of
keep_alive_expired() in check_for_missing_keep_alive().
Fixes: 82ef30f13b ("net: ena: add hardware hints capability to the driver")
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ethtool -X is called without an hkey, ena_com_fill_hash_function()
is called with key=NULL, which is passed to memcpy causing a crash.
This commit fixes this issue by checking key is not NULL.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit d9d6a9aed3 ("i40e: Fix virtchnl_queue_select bitmap
validation") introduced a necessary change for verifying how queue
bitmaps from the iavf driver get validated. Unfortunately, the
conditional was reversed. Fix this.
Fixes: d9d6a9aed3 ("i40e: Fix virtchnl_queue_select bitmap validation")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>