Commit Graph

932711 Commits

Author SHA1 Message Date
Mauro Carvalho Chehab
9955d906f2 media: atomisp: remove kvmalloc/kvcalloc abstractions
The sh_css layer adds an abstraction for kvmalloc/kvcalloc.

Get rid of them. Most of the work here was done by this
small coccinelle script:

<cocci>
@@
expression size;
@@

- sh_css_malloc(size)
+ kvmalloc(size, GFP_KERNEL)

@@
expression n;
expression size;
@@

- sh_css_calloc(n, size)
+ kvcalloc(n, size, GFP_KERNEL)
</cocci>

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:15:10 +02:00
Mauro Carvalho Chehab
591e6a0aad media: atomisp: add more comments about frame allocation
The frame allocation logic happens differently for userptr
or normal mmap. On a quick look, this sounded to be unbalanced,
but the logic should actually work for both cases.

Add an extra comment to reflect it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:14:40 +02:00
Mauro Carvalho Chehab
27b778c5ef media: atomisp: add debug functions for received events
For debugging purposes, it helps to know what event
was actually received.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:13:53 +02:00
Mauro Carvalho Chehab
d61ba1a2e2 media: atomisp: improve warning for IRQ enable function
If something gets wrong when enabling or disabling an IRQ,
we should know better about what happened.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:13:30 +02:00
Mauro Carvalho Chehab
03884c9356 media: atomisp: add debug for hmm alloc
The hmm code is still complex and has bugs. Add a debug print
when memory gets allocated, in order to help identifying what's
happening out there.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:13:03 +02:00
Mauro Carvalho Chehab
14a638ab96 media: atomisp: use pin_user_pages() for memory allocation
Instead of using a hacked version of an old copy of
get_user_pages(), use pin_user_pages().

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:12:33 +02:00
Mauro Carvalho Chehab
19ae08554f media: atomisp: fix driver caps
This device driver is not MC-centric. So, remove the wrong
caps from it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:12:11 +02:00
Mauro Carvalho Chehab
9ac8e4b90b media: atomisp: use Yocto Aero default hmm pool sizes
Yocto Aero driver has a different default for hmm pools.

Use the definitions there.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:10:43 +02:00
Mauro Carvalho Chehab
e19718f6de media: atomisp: add debug message to help debugging hmm code
The hmm code is partially based on a fork from 3.10 code,
and has bugs.

Add debug there to help tracking what happens there.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:10:03 +02:00
Mauro Carvalho Chehab
576680cd01 media: atomisp: hmm_bo: untag user pointers
The kernel ABI was extended to allow pass tagged user pointers.

Untag the pointers in this function.

Fixes: d93445225c ("uaccess: add noop untagged_addr definition")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:09:44 +02:00
Mauro Carvalho Chehab
08fef4fa94 media: atomisp: get rid of memory_access.c
Now that we have everything in place, we can get rid of the
memory_access abstraction layer.

Now, everything related to heterogeneous memory management
(hmm) is under hmm.c & related pools.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:08:30 +02:00
Mauro Carvalho Chehab
100e89894b media: atomisp: change the type returned by mmgr alloc
The mmgr alloc code returns a different type than hmm, due to
some abstraction layer.

Change the driver to use just one type to represent the
hmm memory.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:07:59 +02:00
Mauro Carvalho Chehab
dc50fa18af media: atomisp: get rid of unused memory_realloc code
The code for it is commented out, probably because it is
broken or uneeded for the driver to work. So, let's get
rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:05:20 +02:00
Mauro Carvalho Chehab
5472b4db3f media: atomisp: get rid of mmgr_load and mmgr_store
Those functions are just wrappers for hmm_load/hmm_store.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:04:56 +02:00
Mauro Carvalho Chehab
b92d99aec5 media: atomisp: go one step further to drop ia_css_memory_access.c
Move the attrs handling into hmm, simplifying even further
what the ia_css_memory_access.c file does.

Yet, the returned type for ia_css_memory_access.c is an
integer, instead of a pointer.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:04:33 +02:00
Mauro Carvalho Chehab
86df6ff242 media: atomisp: reduce abstraction at ia_css_memory_access
Yet another memory abstraction layer. Getting rid of this
may be a little trickier, but let's reduce it to a minimal.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:04:06 +02:00
Mauro Carvalho Chehab
4fba2916f6 media: atomisp: get rid of the hrt/hive_isp_css_mm_hrt abstraction layer
Simplify the code by removing this extra memory management
abstraction layer.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:03:43 +02:00
Mauro Carvalho Chehab
2323994338 media: atomisp: simplify hive_isp_css_mm_hrt wrapper
The code there is a wrapper for hmm/ wrapper. Simplify it,
and get rid of ION-specific code.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:03:15 +02:00
Colin Ian King
02ab76491b media: atomisp: fix a handful of spelling mistakes
There are several spelling mistakes in various messages and literal
strings. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:02:50 +02:00
Mauro Carvalho Chehab
d573933c80 media: atomisp: fix size of delay_frames array
Right now, the variables that define the max number of
delay frames is defined as:

	#define VIDEO_FRAME_DELAY		2
	#define MAX_NUM_VIDEO_DELAY_FRAMES	(VIDEO_FRAME_DELAY + 1)
	#define NUM_PREVIEW_DVS_FRAMES          (2)
	#define MAX_NUM_DELAY_FRAMES   MAX(MAX_NUM_VIDEO_DELAY_FRAMES, NUM_PREVIEW_DVS_FRAMES)

In other words, we have:
	MAX_NUM_VIDEO_DELAY_FRAMES = 3
	MAX_NUM_DELAY_FRAMES = 2

The MAX_NUM_DELAY_FRAMES macro is used only only when allocating
memory. On all other parts, including looping over such array,
MAX_NUM_VIDEO_DELAY_FRAMES is used instead, like:

	void sh_css_binary_args_reset(struct sh_css_binary_args *args)
	{
		unsigned int i;
	...

		for (i = 0; i < MAX_NUM_VIDEO_DELAY_FRAMES; i++)
			args->delay_frames[i] = NULL;

Which will cause buffer overflows, with may override the next array
(tnr_frames[]).

In practice, this may not be causing real issues, as the code
checks for num_delay_frames on some parts (but not everywhere).

So, get rid of the smallest value.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:02:30 +02:00
Mauro Carvalho Chehab
f611797795 media: atomisp: drop a cast for a const argument
Some arguments for tnf and ref settings are meant to be const, but
they're defined without such annotation. Due to that, there's an
ugly cast at sh_css_sp.c.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:02:03 +02:00
Mauro Carvalho Chehab
c01d554677 media: atomisp: partially get rid of one abstraction layer
The very same macros are defined as CSS_foo and IA_CSS_foo.

Remove this abstraction, as it just make things confusing,
for no good reason.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:00:44 +02:00
Mauro Carvalho Chehab
bdfdd9e7df media: atomisp: make it use dbg_level to control debug level
This driver has 3 different types of debug messages:

	- dev_dbg()
	- dbg_level
	- ia_css_debug_trace_level

Which is crazy. Ideally, it shold just use dev_dbg()
everywhere, but for now let's unify the last two machanisms.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 18:00:17 +02:00
Mauro Carvalho Chehab
6b3a422b73 media: atomisp: get rid of some old broken debug code
It sounds that someone once changed the debug level at compile
time for some testing, but forgot to remove the legacy code after
finishing debuging it.

Get rid of the dead code.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:59:56 +02:00
Mauro Carvalho Chehab
2a693c3e18 media: atomisp: update TODO list
Let's reflect the current status at the TODO list, as other
developers can help addressing issues over there.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:59:40 +02:00
Mauro Carvalho Chehab
9c30f50d14 media: atomisp: don't flood dmesg with -EAGAIN return codes
Using DQBUF on non-blocking mode will return -EAGAIN
if nothing arrives. Printing it has no value, even for debug
purposes. So, only display real return codes.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:59:19 +02:00
Mauro Carvalho Chehab
171b7bd66a media: atomisp: improve debug messages for set format
There are several error conditions that don't print anything,
making harder to identify bugs at the code there.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:59:01 +02:00
Mauro Carvalho Chehab
370f6e5aaa media: atomisp: avoid a copy of v4l2_mbus_framefmt at stack
There's no reason to copy isp_sink_fmt, as the driver
uses it for read-only purposes.

Linux stack is a precious resource. Let's avoid wasting it.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:58:45 +02:00
Mauro Carvalho Chehab
aaebb65476 media: atomisp: reduce debug printk rate when IRQs are received
Currently, when an EOF IRQ is received, it generates two messages:

	[   59.191893] atomisp-isp2 0000:00:03.0: irq:0x200000
	[   59.191913] atomisp-isp2 0000:00:03.0: atomisp_isr EOF exp_id 142, asd 0

Flooding the dmesg with lots of messages per second. The same
pattern happens for all other IRQs.

Change the logic for printing just one message per IRQ and
rate-limit those, as, for debugging purposes, it is usually
interesting to know that IRQs are being received, but not
displaying every single one.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:58:26 +02:00
Mauro Carvalho Chehab
8ef6b8a67b media: atomisp: get rid of hmm_vm.c
This is not used anywhere. So, let's trash it.

Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:58:04 +02:00
Mauro Carvalho Chehab
7b53e162f2 media: atomisp: fix pipeline initialization code
The code under load_primary_binaries() is complex and
were hard to understand, because it used to have lots
of ifdefs and broken identation.

The patch which cleaned it and removed the version-specific
ifdefs added a regression.

Solve it.

Fixes: 3c0538fbad ("media: atomisp: get rid of most checks for ISP2401 version")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-06-11 17:57:48 +02:00
Xiaoguang Wang
65a6543da3 io_uring: fix io_kiocb.flags modification race in IOPOLL mode
While testing io_uring in arm, we found sometimes io_sq_thread() keeps
polling io requests even though there are not inflight io requests in
block layer. After some investigations, found a possible race about
io_kiocb.flags, see below race codes:
  1) in the end of io_write() or io_read()
    req->flags &= ~REQ_F_NEED_CLEANUP;
    kfree(iovec);
    return ret;

  2) in io_complete_rw_iopoll()
    if (res != -EAGAIN)
        req->flags |= REQ_F_IOPOLL_COMPLETED;

In IOPOLL mode, io requests still maybe completed by interrupt, then
above codes are not safe, concurrent modifications to req->flags, which
is not protected by lock or is not atomic modifications. I also had
disassemble io_complete_rw_iopoll() in arm:
   req->flags |= REQ_F_IOPOLL_COMPLETED;
   0xffff000008387b18 <+76>:    ldr     w0, [x19,#104]
   0xffff000008387b1c <+80>:    orr     w0, w0, #0x1000
   0xffff000008387b20 <+84>:    str     w0, [x19,#104]

Seems that the "req->flags |= REQ_F_IOPOLL_COMPLETED;" is  load and
modification, two instructions, which obviously is not atomic.

To fix this issue, add a new iopoll_completed in io_kiocb to indicate
whether io request is completed.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:45:21 -06:00
Sean Christopherson
2ebac8bb3c KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON,
when determining whether a nested VM-Exit should be reflected into L1 or
handled by KVM in L0.

For better or worse, the switch statement in nested_vmx_exit_reflected()
currently defaults to "true", i.e. reflects any nested VM-Exit without
dedicated logic.  Because the case statements only contain the basic
exit reason, any VM-Exit with modifier bits set will be reflected to L1,
even if KVM intended to handle it in L0.

Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY,
i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to
L1, as "failed VM-Entry" is the only modifier that KVM can currently
encounter.  The SMM modifiers will never be generated as KVM doesn't
support/employ a SMI Transfer Monitor.  Ditto for "exit from enclave",
as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to
enter an enclave in a KVM guest (L1 or L2).

Fixes: 644d711aa0 ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit")
Cc: Jim Mattson <jmattson@google.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11 11:28:11 -04:00
Colin Ian King
9a6a5738ab umem: remove redundant initialization of variable ret
The variable ret is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Addresses-Coverity: ("Unused value")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:16:17 -06:00
Colin Ian King
8d20319e29 pktcdvd: remove redundant initialization of variable ret
The variable ret is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Addresses-Coverity: ("Unused value")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:15:48 -06:00
Chaitanya Kulkarni
819f7b88b4 nvmet: fail outstanding host posted AEN req
In function nvmet_async_event_process() we only process AENs iff
there is an open slot on the ctrl->async_event_cmds[] && aen
event list posted by the target is not empty. This keeps host
posted AEN outstanding if target generated AEN list is empty.
We do cleanup the target generated entries from the aen list in
nvmet_ctrl_free()-> nvmet_async_events_free() but we don't
process AEN posted by the host. This leads to following problem :-

When processing admin sq at the time of nvmet_sq_destroy() holds
an extra percpu reference(atomic value = 1), so in the following code
path after switching to atomic rcu, release function (nvmet_sq_free())
is not getting called which blocks the sq->free_done in
nvmet_sq_destroy() :-

nvmet_sq_destroy()
 percpu_ref_kill_and_confirm()
 - __percpu_ref_switch_mode()
 --  __percpu_ref_switch_to_atomic()
 ---   call_rcu() -> percpu_ref_switch_to_atomic_rcu()
 ----     /* calls switch callback */
 - percpu_ref_put()
 -- percpu_ref_put_many(ref, 1)
 --- else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
 ----   ref->release(ref); <---- Not called.

This results in indefinite hang:-

  void nvmet_sq_destroy(struct nvmet_sq *sq)
...
          if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
                  nvmet_async_events_process(ctrl, status);
                  percpu_ref_put(&sq->ref);
          }
          percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
          wait_for_completion(&sq->confirm_done);
          wait_for_completion(&sq->free_done); <-- Hang here

Which breaks the further disconnect sequence. This problem seems to be
introduced after commit 64f5e9cdd7 ("nvmet: fix memory leak when
removing namespaces and controllers concurrently").

This patch processes ctrl->async_event_cmds[] in the admin sq destroy()
context irrespetive of aen_list. Also we get rid of the controller's
aen_list processing in the nvmet_sq_destroy() context and just ignore
ctrl->aen_list.

This results in nvmet_async_events_process() being called from workqueue
context so we adjust the code accordingly.

Fixes: 64f5e9cdd7 ("nvmet: fix memory leak when removing namespaces and controllers concurrently ")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:10:06 -06:00
Christoph Hellwig
b97120b15e nvme-pci: use simple suspend when a HMB is enabled
While the NVMe specification allows the device to access the host memory
buffer in host DRAM from all power states, hosts will fail access to
DRAM during S3 and similar power states.

Fixes: d916b1be94 ("nvme-pci: use host managed power state for suspend")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:10:06 -06:00
Daniel Wagner
c9c12e51b8 nvme-fc: don't call nvme_cleanup_cmd() for AENs
Asynchronous event notifications do not have an associated request.
When fcp_io() fails we unconditionally call nvme_cleanup_cmd() which
leads to a crash.

Fixes: 16686f3a6c ("nvme: move common call to nvme_cleanup_cmd to core layer")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:10:05 -06:00
Max Gurtovoy
a40aae6bbf nvmet-tcp: constify nvmet_tcp_ops
nvmet_tcp_ops is never modified and can be made const to allow the
compiler to put it in read-only memory, as done in other transports.

Before:
   text    data     bss     dec     hex filename
  16164     160      12   16336    3fd0 drivers/nvme/target/tcp.o

After:
   text    data     bss     dec     hex filename
  16277      64      12   16353    3fe1 drivers/nvme/target/tcp.o

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:10:05 -06:00
Rikard Falkeborn
6acbd9619b nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops
nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops are never modified and can be
made const to allow the compiler to put them in read-only memory.

Before:
   text    data     bss     dec     hex filename
  53102    6885     576   60563    ec93 drivers/nvme/host/tcp.o

After:
   text    data     bss     dec     hex filename
  53422    6565     576   60563    ec93 drivers/nvme/host/tcp.o

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:10:05 -06:00
Niklas Cassel
108a58585b nvme: do not call del_gendisk() on a disk that was never added
device_add_disk() is negated by del_gendisk().
alloc_disk_node() is negated by put_disk().

In nvme_alloc_ns(), device_add_disk() is one of the last things being
called in the success case, and only void functions are being called
after this. Therefore this call should not be negated in the error path.

The superfluous call to del_gendisk() leads to the following prints:
[    7.839975] kobject: '(null)' (000000001ff73734): is not initialized, yet kobject_put() is being called.
[    7.840865] WARNING: CPU: 2 PID: 361 at lib/kobject.c:736 kobject_put+0x70/0x120

Fixes: 33cfdc2aa6 ("nvme: enforce extended LBA format for fabrics metadata")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11 09:10:05 -06:00
Tobias Klauser
d4060ac969 tools, bpftool: Fix memory leak in codegen error cases
Free the memory allocated for the template on error paths in function
codegen.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200610130804.21423-1-tklauser@distanz.ch
2020-06-11 16:08:48 +02:00
YiFei Zhu
bd6fecb9a9 selftests/bpf: Add cgroup_skb/egress test for load_bytes_relative
When cgroup_skb/egress triggers the MAC header is not set. Added a
test that asserts reading MAC header is a -EFAULT but NET header
succeeds. The test result from within the eBPF program is stored in
an 1-element array map that the userspace then reads and asserts on.

Another assertion is added that reading from a large offset, past
the end of packet, returns -EFAULT.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/9028ccbea4385a620e69c0a104f469ffd655c01e.1591812755.git.zhuyifei@google.com
2020-06-11 16:05:56 +02:00
YiFei Zhu
0f5d82f187 net/filter: Permit reading NET in load_bytes_relative when MAC not set
Added a check in the switch case on start_header that checks for
the existence of the header, and in the case that MAC is not set
and the caller requests for MAC, -EFAULT. If the caller requests
for NET then MAC's existence is completely ignored.

There is no function to check NET header's existence and as far
as cgroup_skb/egress is concerned it should always be set.

Removed for ptr >= the start of header, considering offset is
bounded unsigned and should always be true. len <= end - mac is
redundant to ptr + len <= end.

Fixes: 3eee1f75f2 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
2020-06-11 16:05:56 +02:00
Tony Luck
7ccddc4613 x86/mce/dev-mcelog: Fix -Wstringop-truncation warning about strncpy()
The kbuild test robot reported this warning:

  arch/x86/kernel/cpu/mce/dev-mcelog.c: In function 'dev_mcelog_init_device':
  arch/x86/kernel/cpu/mce/dev-mcelog.c:346:2: warning: 'strncpy' output \
    truncated before terminating nul copying 12 bytes from a string of the \
    same length [-Wstringop-truncation]

This is accurate, but I don't care that the trailing NUL character isn't
copied. The string being copied is just a magic number signature so that
crash dump tools can be sure they are decoding the right blob of memory.

Use memcpy() instead of strncpy().

Fixes: d8ecca4043 ("x86/mce/dev-mcelog: Dynamically allocate space for machine check records")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200527182808.27737-1-tony.luck@intel.com
2020-06-11 15:19:17 +02:00
Tony Luck
17fae1294a x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned
An interesting thing happened when a guest Linux instance took a machine
check. The VMM unmapped the bad page from guest physical space and
passed the machine check to the guest.

Linux took all the normal actions to offline the page from the process
that was using it. But then guest Linux crashed because it said there
was a second machine check inside the kernel with this stack trace:

do_memory_failure
    set_mce_nospec
         set_memory_uc
              _set_memory_uc
                   change_page_attr_set_clr
                        cpa_flush
                             clflush_cache_range_opt

This was odd, because a CLFLUSH instruction shouldn't raise a machine
check (it isn't consuming the data). Further investigation showed that
the VMM had passed in another machine check because is appeared that the
guest was accessing the bad page.

Fix is to check the scope of the poison by checking the MCi_MISC register.
If the entire page is affected, then unmap the page. If only part of the
page is affected, then mark the page as uncacheable.

This assumes that VMMs will do the logical thing and pass in the "whole
page scope" via the MCi_MISC register (since they unmapped the entire
page).

  [ bp: Adjust to x86/entry changes. ]

Fixes: 284ce4011b ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reported-by: Jue Wang <juew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jue Wang <juew@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200520163546.GA7977@agluck-desk2.amr.corp.intel.com
2020-06-11 15:19:17 +02:00
Thomas Gleixner
f77d26a9fc Merge branch 'x86/entry' into ras/core
to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow
up patches can be applied without creating a horrible merge conflict
afterwards.
2020-06-11 15:17:57 +02:00
Thomas Gleixner
f0178fc01f x86/entry: Unbreak __irqentry_text_start/end magic
The entry rework moved interrupt entry code from the irqentry to the
noinstr section which made the irqentry section empty.

This breaks boundary checks which rely on the __irqentry_text_start/end
markers to find out whether a function in a stack trace is
interrupt/exception entry code. This affects the function graph tracer and
filter_irq_stacks().

As the IDT entry points are all sequentialy emitted this is rather simple
to unbreak by injecting __irqentry_text_start/end as global labels.

To make this work correctly:

  - Remove the IRQENTRY_TEXT section from the x86 linker script
  - Define __irqentry so it breaks the build if it's used
  - Adjust the entry mirroring in PTI
  - Remove the redundant kprobes and unwinder bound checks

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11 15:15:29 +02:00
Peter Zijlstra
2823e83a3d x86/entry: __always_inline CR2 for noinstr
vmlinux.o: warning: objtool: exc_page_fault()+0x9: call to read_cr2() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x24: call to prefetchw() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x21: call to kvm_handle_async_pf.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_nmi()+0x1cc: call to write_cr2() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.243227806@infradead.org
2020-06-11 15:15:28 +02:00
Peter Zijlstra
6eebad1ad3 lockdep: __always_inline more for noinstr
vmlinux.o: warning: objtool: debug_locks_off()+0xd: call to __debug_locks_off() leaves .noinstr.text section
vmlinux.o: warning: objtool: match_held_lock()+0x6a: call to look_up_lock_class.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0x90: call to lockdep_recursion_finish() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.185201076@infradead.org
2020-06-11 15:15:28 +02:00