Commit Graph

3871 Commits

Author SHA1 Message Date
Oded Gabbay
6dc66f7c26 habanalabs: correctly cast variable to __le32
When using the macro le32_to_cpu(x), we need to correctly convert x to be
__le32 in case it is defined as u32 variable.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2019-09-05 14:55:28 +03:00
Oded Gabbay
307eae93d5 habanalabs: show correct id in error print
If the initialization of a device failed, the driver prints an error
message with the id of the device. The device index on the file system is
that id divided by 2.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:28 +03:00
Oded Gabbay
4c172bbfaa habanalabs: stop using the acronym KMD
We want to stop using the acronym KMD. Therefore, replace all locations
(except for register names we can't modify) where KMD is written to other
terms such as "Linux kernel driver" or "Host kernel driver", etc.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay
0996bd1c74 habanalabs: display card name as sensors header
To allow the user to use a custom file for the HWMON lm-sensors library
per card type, the driver needs to register the HWMON sensors with the
specific card type name.

The card name is supplied by the F/W running on the device. If the F/W is
old and doesn't supply a card name, a default card name is displayed as
the sensors group name.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay
e9730763a2 habanalabs: add uapi to retrieve aggregate H/W events
Add a new opcode to INFO IOCTL to retrieve aggregate H/W events. i.e. the
events counters are NOT cleared upon device reset, but count from the
loading of the driver.

Add the code to support it in the device event handling function.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay
75b3cb2bb0 habanalabs: add uapi to retrieve device utilization
Users and sysadmins usually want to know what is the device utilization as
a level 0 indication if they are efficiently using the device.

Add a new opcode to the INFO IOCTL that will return the device utilization
over the last period of 100-1000ms. The return value is 0-100,
representing as percentage the total utilization rate.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Tomer Tayar
413cf576fd habanalabs: Make the Coresight timestamp perpetual
The Coresight timestamp is enabled for a specific debug session using
the HL_DEBUG_OP_TIMESTAMP opcode of the debug IOCTL.
In order to have a perpetual timestamp that would be comparable between
various debug sessions, this patch moves the timestamp enablement to be
part of the HW initialization.
The HL_DEBUG_OP_TIMESTAMP opcode turns to be deprecated and shouldn't be
used. Old user-space that will call it won't see any change in the
behavior of the debug session.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Oded Gabbay
867b58ac94 habanalabs: print to kernel log when reset is finished
Now that we don't print the queue testing messages, we need to print when
the reset is finished so whoever looks at the kernel log will know the
reset process was finished successfully and the driver is not stuck.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:27 +03:00
Oded Gabbay
fe9a52c97f habanalabs: replace __le32_to_cpu with le32_to_cpu
In some files the driver uses __le32_to_cpu while in other it uses
le32_to_cpu. Replace all __le32_to_cpu instances with le32_to_cpu for
consistency.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Oded Gabbay
abca3a8224 habanalabs: replace __cpu_to_le32/64 with cpu_to_le32/64
In some files the code use __cpu_to_le32/64 while in other it use
cpu_to_le32/64. Replace all __cpu_to_le32/64 instances with
cpu_to_le32/64 for consistency.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Tomer Tayar
129b6a9324 habanalabs: Handle HW_IP_INFO if device disabled or in reset
The HW IP information is relevant even if the device is disabled or in
reset, so always handle the corresponding INFO IOCTL opcode.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Tomer Tayar
ea451f88ef habanalabs: Expose devices after initialization is done
The char devices are currently exposed to user before the device and
driver initialization are done.
This patch moves the cdev and device adding to the system to the end of
the initialization sequence, while keeping the creation of the
structures at the beginning to allow the usage of dev_*().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Omer Shpigelman
9b50f539ff habanalabs: improve security in Debug IOCTL
This patch improves the security in the Debug IOCTL.
It adds checks that:
- The register index value is in the allowed range for all opcodes.
- The event types number is in the allowed range in SPMU enable.
- The events number is in the allowed range in SPMU disable.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Omer Shpigelman
8d1759329d habanalabs: use default structure for user input in Debug IOCTL
This patch fixes a possible kernel crash when a user provides a too small
input structure to the Debug IOCTL.
The fix sets a default input structure and copies to it the user data.
In case the user provided as input a too small structure, the code will
use the default values taken from the default structure.
Note that in contrary to the input structure, the user can provide an
output structure with changing size or no size at all. Therefore the user
output structure validation is already done in the Debug logic later on.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:27 +03:00
Tomer Tayar
10d7de2cdb habanalabs: Add descriptive name to PSOC app status register
Add a meaningful name to the general PSOC application status register
which better describes its usage in keeping the HW state.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:26 +03:00
Tomer Tayar
4095a17657 habanalabs: Add descriptive names to PSOC scratch-pad registers
The PSOC scratch-pad registers are used for communication with the
device CPU. This patch adds new definitions for these registers which
are more descriptive than their general names.

The new set of definitions also gathers and documents the current usage
of the scratch-pad registers by the driver and the device CPU.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:26 +03:00
Oded Gabbay
4d6a7751f6 habanalabs: create two char devices per ASIC
This patch changes the driver to create two char devices for each ASIC
it discovers. This is done to allow system/monitoring applications to
query the device for stats, information, idle state and more, while also
allowing the deep-learning application to send work to the ASIC.

One char device is the original device, hlX. IOCTL calls through this
device file can perform any task on the device (compute, memory, queries).
The open function for this device will fail if it was called before but
the file-descriptor it created was not completely released yet (the
release callback function is not called from the kernel until all
instances of that FD are closed). The driver needs to keep this behavior
to support backward compatibility with existing userspace, which count
that the open will fail if the device is "occupied".

The second char device is called "hl_controlDx", where x is the same index
of the main device with a minor number of the original char device + 1.
Applications that open this device can only call the INFO IOCTL. There is
no limitation on the number of applications opening this device.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
b968eb1a84 habanalabs: change device_setup_cdev() to be more generic
This patch re-factors the device_setup_cdev() function to make it more
generic. It doesn't manipulate members of the driver's internal device
structure but instead works only on the arguments that are sent to it.

This is in preparation for using this function to create an additional
char device per ASIC.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
eb7caf84b0 habanalabs: maintain a list of file private data objects
This patch adds a new list to the driver's device structure. The list will
keep the file private data structures that the driver creates when a user
process opens the device.

This change is needed because it is useless to try to count how many FD
are open. Instead, track our own private data structure per open file and
once it is released, remove it from the list. As long as the list is not
empty, it means we have a user that can do something with our device.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
86d5307a6d habanalabs: rename user_ctx as compute_ctx
This patch renames the "user_ctx" field in the device structure to
"compute_ctx". This better reflects the meaning of this context.

In addition, we also check in the ctx_fini() that the debug mode should be
disabled only if the context being destroyed is the compute context. This
has no effect right now as we only have a single process and a single
context, but this makes the code more ready for multiple process support.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
02e921e42b habanalabs: show the process context dram usage
When the user query the dram usage of a context, show it the dram usage of
its context, not the user context that is currently running on the device.

This has no effect right now as we only have a single process and a single
context, but this makes the code more ready for multiple process support.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
4aecb05e52 habanalabs: kill user process after CS rollback
This patch calls the kill user process function after we rollback the
in-flight CSs. This is because the user process can't be closed while
there are open CSs. Therefore, there is no point of sending it a SIGKILL
before we do the rollback CS part.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Oded Gabbay
b888751a02 habanalabs: add handle field to context structure
This patch adds a field to the context's structure that will hold a unique
handle for the context.

This will be needed when the user will create the context.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 14:55:26 +03:00
Chuhong Yuan
30f273222c habanalabs: Use dev_get_drvdata
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:26 +03:00
Oded Gabbay
209257feab habanalabs: power management through sysfs is only for GOYA
The ability of setting power management properties by the system
administrator (through sysfs properties) is only relevant for the GOYA
ASIC. Therefore, move the relevant sysfs properties to the GOYA sysfs
specific file, to make the properties appear in sysfs only for GOYA cards.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:26 +03:00
Oded Gabbay
ed0fc50535 habanalabs: cap simulator timeout
In the driver timeout functions, we give the simulator a factor of 10
in the timeout. This was necessary when the requested timeout is small
but if it was a few seconds, this can result in a very large timeout which
is unnecessary.

This patch caps the maximum timeout of the simulator to 10 seconds, which
is our largest timeout in the code. That is more then enough for anything
the simulator is doing.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:26 +03:00
Oded Gabbay
52a1ae115a habanalabs: add debug print when rejecting CS
When rejecting CS because of too many in-flight CS, print a debug message
about it as it useful to know when the user is debugging (it indicates a
back-pressure from the driver as the device is not fast enough to consume
the CS)

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:26 +03:00
Oded Gabbay
68b8819daf habanalabs: remove write_open_cnt property
This property has attempted to show the number of open file descriptors on
the device. This was a stupid and futile attempt so remove this property
completely.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:25 +03:00
Srinivas Kandagatla
cf61860e6b misc: fastrpc: free dma buf scatter list
dma buf scatter list is never freed, free it!

Orignally detected by kmemleak:
  backtrace:
    [<ffffff80088b7658>] kmemleak_alloc+0x50/0x84
    [<ffffff8008373284>] sg_kmalloc+0x38/0x60
    [<ffffff8008373144>] __sg_alloc_table+0x60/0x110
    [<ffffff800837321c>] sg_alloc_table+0x28/0x58
    [<ffffff800837336c>] __sg_alloc_table_from_pages+0xc0/0x1ac
    [<ffffff800837346c>] sg_alloc_table_from_pages+0x14/0x1c
    [<ffffff8008097a3c>] __iommu_get_sgtable+0x5c/0x8c
    [<ffffff800850a1d0>] fastrpc_dma_buf_attach+0x84/0xf8
    [<ffffff80085114bc>] dma_buf_attach+0x70/0xc8
    [<ffffff8008509efc>] fastrpc_map_create+0xf8/0x1e8
    [<ffffff80085086f4>] fastrpc_device_ioctl+0x508/0x900
    [<ffffff80082428c8>] compat_SyS_ioctl+0x128/0x200
    [<ffffff80080832c4>] el0_svc_naked+0x34/0x38
    [<ffffffffffffffff>] 0xffffffffffffffff

Reported-by: Mayank Chopra <mak.chopra@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20190829092926.12037-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:35:11 +02:00
Srinivas Kandagatla
5672ff4dc3 misc: fastrpc: fix double refcounting on dmabuf
dma buf refcount has to be done by the driver which is going to use the fd.
This driver already does refcount on the dmabuf fd if its actively using it
but also does an additional refcounting via extra ioctl.
This additional refcount can lead to memory leak in cases where the
applications fail to call the ioctl to decrement the refcount.

So remove this extra refcount in the ioctl

More info of dma buf usage at drivers/dma-buf/dma-buf.c

Reported-by: Mayank Chopra <mak.chopra@codeaurora.org>
Reported-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Tested-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20190829092926.12037-5-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:35:11 +02:00
Jorge Ramirez-Ortiz
15fe27f316 misc: fastrpc: remove unused definition
Remove unused INIT_MEMLEN_MAX define.

Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Abhinav Asati <asatiabhi@codeaurora.org>
Signed-off-by: Vamsi Singamsetty <vamssi@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20190829092926.12037-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:35:11 +02:00
Bjorn Andersson
2e369878bd misc: fastrpc: Don't reference rpmsg_device after remove
As fastrpc_rpmsg_remove() returns the rpdev of the channel context is no
longer a valid object, so ensure to update the channel context to no
longer reference the old object and guard in the invoke code path
against dereferencing it.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Mayank Chopra <mak.chopra@codeaurora.org>
Signed-off-by: Abhinav Asati <asatiabhi@codeaurora.org>
Signed-off-by: Vamsi Singamsetty <vamssi@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20190829092926.12037-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:35:10 +02:00
Bjorn Andersson
278d56f970 misc: fastrpc: Reference count channel context
The channel context is referenced from the fastrpc user and might as
user space holds the file descriptor open outlive the fastrpc device,
which is removed when the remote processor is shutting down.

Reference count the channel context in order to retain this object until
all references has been relinquished.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Mayank Chopra <mak.chopra@codeaurora.org>
Signed-off-by: Abhinav Asati <asatiabhi@codeaurora.org>
Signed-off-by: Vamsi Singamsetty <vamssi@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20190829092926.12037-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 13:35:10 +02:00
Jean Delvare
c165d8947b eeprom: Deprecate the legacy eeprom driver
Time has come to get rid of the old eeprom driver. The at24 driver
should be used instead. So mark the eeprom driver as deprecated and
give users some time to migrate. Then we can remove the legacy
eeprom driver completely.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190902104838.058725c2@endymion
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 09:57:35 +02:00
Greg Kroah-Hartman
99097a214b Merge 5.3-rc7 into char-misc-next
We need the fixes in here as well for testing and merges

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-02 19:30:09 +02:00
Nadav Amit
468e0ffac8 vmw_balloon: Fix offline page marking with compaction
The compaction code already marks pages as offline when it enqueues
pages in the ballooned page list, and removes the mapping when the pages
are removed from the list. VMware balloon also updates the flags,
instead of letting the balloon-compaction logic handle it, which causes
the assertion VM_BUG_ON_PAGE(!PageOffline(page)) to fire, when
__ClearPageOffline is called the second time. This causes the following
crash.

[  487.104520] kernel BUG at include/linux/page-flags.h:749!
[  487.106364] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[  487.107681] CPU: 7 PID: 1106 Comm: kworker/7:3 Not tainted 5.3.0-rc5balloon #227
[  487.109196] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[  487.111452] Workqueue: events_freezable vmballoon_work [vmw_balloon]
[  487.112779] RIP: 0010:vmballoon_release_page_list+0xaa/0x100 [vmw_balloon]
[  487.114200] Code: fe 48 c1 e7 06 4c 01 c7 8b 47 30 41 89 c1 41 81 e1 00 01 00 f0 41 81 f9 00 00 00 f0 74 d3 48 c7 c6 08 a1 a1 c0 e8 06 0d e7 ea <0f> 0b 44 89 f6 4c 89 c7 e8 49 9c e9 ea 49 8d 75 08 49 8b 45 08 4d
[  487.118033] RSP: 0018:ffffb82f012bbc98 EFLAGS: 00010246
[  487.119135] RAX: 0000000000000037 RBX: 0000000000000001 RCX: 0000000000000006
[  487.120601] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a85b6bd7620
[  487.122071] RBP: ffffb82f012bbcc0 R08: 0000000000000001 R09: 0000000000000000
[  487.123536] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb82f012bbd00
[  487.125002] R13: ffffe97f4598d9c0 R14: 0000000000000000 R15: ffffb82f012bbd34
[  487.126463] FS:  0000000000000000(0000) GS:ffff9a85b6bc0000(0000) knlGS:0000000000000000
[  487.128110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  487.129316] CR2: 00007ffe6e413ea0 CR3: 0000000230b18001 CR4: 00000000003606e0
[  487.130812] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  487.132283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  487.133749] Call Trace:
[  487.134333]  vmballoon_deflate+0x22c/0x390 [vmw_balloon]
[  487.135468]  vmballoon_work+0x6e7/0x913 [vmw_balloon]
[  487.136711]  ? process_one_work+0x21a/0x5e0
[  487.138581]  process_one_work+0x298/0x5e0
[  487.139926]  ? vmballoon_migratepage+0x310/0x310 [vmw_balloon]
[  487.141610]  ? process_one_work+0x298/0x5e0
[  487.143053]  worker_thread+0x41/0x400
[  487.144389]  kthread+0x12b/0x150
[  487.145582]  ? process_one_work+0x5e0/0x5e0
[  487.146937]  ? kthread_create_on_node+0x60/0x60
[  487.148637]  ret_from_fork+0x3a/0x50

Fix it by updating the PageOffline indication only when a 2MB page is
enqueued and dequeued. The 4KB pages will be handled correctly by the
balloon compaction logic.

Fixes: 83a8afa72e ("vmw_balloon: Compaction support")
Cc: David Hildenbrand <david@redhat.com>
Reported-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20190820160121.452-1-namit@vmware.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-28 22:57:07 +02:00
Nadav Amit
ba03a9bbd1 VMCI: Release resource if the work is already queued
Francois reported that VMware balloon gets stuck after a balloon reset,
when the VMCI doorbell is removed. A similar error can occur when the
balloon driver is removed with the following splat:

[ 1088.622000] INFO: task modprobe:3565 blocked for more than 120 seconds.
[ 1088.622035]       Tainted: G        W         5.2.0 #4
[ 1088.622087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.622205] modprobe        D    0  3565   1450 0x00000000
[ 1088.622210] Call Trace:
[ 1088.622246]  __schedule+0x2a8/0x690
[ 1088.622248]  schedule+0x2d/0x90
[ 1088.622250]  schedule_timeout+0x1d3/0x2f0
[ 1088.622252]  wait_for_completion+0xba/0x140
[ 1088.622320]  ? wake_up_q+0x80/0x80
[ 1088.622370]  vmci_resource_remove+0xb9/0xc0 [vmw_vmci]
[ 1088.622373]  vmci_doorbell_destroy+0x9e/0xd0 [vmw_vmci]
[ 1088.622379]  vmballoon_vmci_cleanup+0x6e/0xf0 [vmw_balloon]
[ 1088.622381]  vmballoon_exit+0x18/0xcc8 [vmw_balloon]
[ 1088.622394]  __x64_sys_delete_module+0x146/0x280
[ 1088.622408]  do_syscall_64+0x5a/0x130
[ 1088.622410]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1088.622415] RIP: 0033:0x7f54f62791b7
[ 1088.622421] Code: Bad RIP value.
[ 1088.622421] RSP: 002b:00007fff2a949008 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1088.622426] RAX: ffffffffffffffda RBX: 000055dff8b55d00 RCX: 00007f54f62791b7
[ 1088.622426] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055dff8b55d68
[ 1088.622427] RBP: 000055dff8b55d00 R08: 00007fff2a947fb1 R09: 0000000000000000
[ 1088.622427] R10: 00007f54f62f5cc0 R11: 0000000000000206 R12: 000055dff8b55d68
[ 1088.622428] R13: 0000000000000001 R14: 000055dff8b55d68 R15: 00007fff2a94a3f0

The cause for the bug is that when the "delayed" doorbell is invoked, it
takes a reference on the doorbell entry and schedules work that is
supposed to run the appropriate code and drop the doorbell entry
reference. The code ignores the fact that if the work is already queued,
it will not be scheduled to run one more time. As a result one of the
references would not be dropped. When the code waits for the reference
to get to zero, during balloon reset or module removal, it gets stuck.

Fix it. Drop the reference if schedule_work() indicates that the work is
already queued.

Note that this bug got more apparent (or apparent at all) due to
commit ce664331b2 ("vmw_balloon: VMCI_DOORBELL_SET does not check status").

Fixes: 83e2ec765b ("VMCI: doorbell implementation.")
Reported-by: Francois Rigault <rigault.francois@gmail.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Adit Ranadive <aditr@vmware.com>
Cc: Alexios Zavras <alexios.zavras@intel.com>
Cc: Vishnu DASA <vdasa@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Link: https://lore.kernel.org/r/20190820202638.49003-1-namit@vmware.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-28 22:57:07 +02:00
Greg Kroah-Hartman
d4e34999a7 Updates to LKDTM for -next
- split WARNING into two tests: with message and without
 - add prototype-granularity forward CFI test
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl1jCSAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJp8hD/4tvV3hXGl7tbkM46crbfeAUVNI
 /9VsNVKgUldT03Wz4Yqcr1a+8uLp2T2ifNFfz0l6RIDLbO0IJZNFJ12h9NQRyhaR
 p4aM7gNBUrmkhw7Jamiu0b0xuFNYKTtrOexBpLLqXRQe2vUHV0/w8mXbnOI0ciuY
 livnEG2xwQPB+ez84ro99uyCW37C3wVqchSG1XR6v4/tPoPIBKjPXT0K7fDLCJY0
 Jh/4Ix4OvRO+D0+sqW0FS4gHzyFUiC/9qhU6OX/BNK7rb8YXOfwX4BBokS4V1Pim
 7/ZVQN1ivATE/dvHzUE+B/+Gyt54RoKyYaDNTidnXEm1b3IPA31JzyefCEWlKRkh
 9FWgMNNWcgJWtMB9Gn8LVtvRfIl1AIKjZl6tdMBgfnzMqQNguHsJioK3AlFf/Zb8
 /fF//l7lcwGCfByKZRYxf/wpLsjI4FDZSDkK8qsSohNq2KHPoxrboTeWVwbtLLFI
 pEoyb03H31VrfRDQFui8j+ki5cVW69ciJhLnxL7HpQ4Tt7jYx976SYlf/n/MBg+1
 c5xx7p6tSmwNrNS4KnMBmFvTgW7lBRrPxKGlLnhJ5XkkIv7+XLyTrDAioDalqcNS
 kODGzwxPklpSuu0DEXtmXCky6actCuvQH7RiMAh0emJ0vqXGsWEBF9aesLdrN9U1
 9Q3yTq23//qWVq0Hcg==
 =yJnT
 -----END PGP SIGNATURE-----

Merge tag 'lkdtm-next' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into char-misc-next

Kees writes:

Updates to LKDTM for -next

- split WARNING into two tests: with message and without
- add prototype-granularity forward CFI test

* tag 'lkdtm-next' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lkdtm: Split WARNING into separate tests
  lkdtm: Add Control Flow Integrity test
2019-08-28 22:37:55 +02:00
Raul E Rangel
b9bc7b8b1e lkdtm/bugs: fix build error in lkdtm_EXHAUST_STACK
lkdtm/bugs.c:94:2: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=]
  pr_info("Calling function with %d frame size to depth %d ...\n",
  ^
THREAD_SIZE is defined as a unsigned long, cast CONFIG_FRAME_WARN to
unsigned long as well.

Fixes: 24cccab42c ("lkdtm/bugs: Adjust recursion test to avoid elision")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Raul E Rangel <rrangel@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20190827173619.170065-1-rrangel@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-28 22:33:44 +02:00
Tomas Winkler
587f174077 mei: me: add Tiger Lake point LP device ID
Add Tiger Lake Point device ID for TGP LP.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190819103210.32748-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-28 22:30:15 +02:00
Colin Ian King
3b420aeb75 misc: xilinx_sdfec: fix spelling mistake: "Schdule" -> "Schedule"
There is a spelling mistake in a dev_dbg message, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Link: https://lore.kernel.org/r/20190819094137.390-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-22 14:28:10 -07:00
Dan Carpenter
6123f1fe53 misc: xilinx_sdfec: Prevent integer overflow in xsdfec_table_write()
The checking here needs to handle integer overflows because "offset" and
"len" come from the user.

Fixes: 20ec628e80 ("misc: xilinx_sdfec: Add ability to configure LDPC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Link: https://lore.kernel.org/r/20190821071122.GD26957@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-22 14:28:10 -07:00
Dan Carpenter
56a635c0ec misc: xilinx_sdfec: Prevent a divide by zero in xsdfec_reg0_write()
The "psize" value comes from the user so we need to verify that it's
non-zero before we check if "n % psize" or it will crash.

Fixes: 20ec628e80 ("misc: xilinx_sdfec: Add ability to configure LDPC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/20190821070953.GC26957@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-22 14:28:10 -07:00
Dan Carpenter
129c3b082c misc: xilinx_sdfec: Return -EFAULT if copy_from_user() fails
The copy_from_user() function returns the number of bytes remaining to
be copied but we want to return -EFAULT to the user.

Fixes: 20ec628e80 ("misc: xilinx_sdfec: Add ability to configure LDPC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Link: https://lore.kernel.org/r/20190822083105.GI3964@kadam
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-22 14:28:10 -07:00
Dan Carpenter
dac4f1964a misc: xilinx_sdfec: Fix a couple small information leaks
These structs have holes in them so we end up disclosing a few bytes of
uninitialized stack data.

drivers/misc/xilinx_sdfec.c:305 xsdfec_get_status() warn: check that 'status' doesn't leak information (struct has a hole after 'activity')
drivers/misc/xilinx_sdfec.c:449 xsdfec_get_turbo() warn: check that 'turbo_params' doesn't leak information (struct has a hole after 'scale')

We need to zero out the holes with memset().

Fixes: 6bd6a690c2 ("misc: xilinx_sdfec: Add stats & status ioctls")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/20190821070606.GA26957@mwanda
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-22 14:28:09 -07:00
Kees Cook
1ee170ea3f lkdtm: Split WARNING into separate tests
There are three paths through the kernel code exception logging:

- BUG (no configurable printk message)
- WARN_ON (no configurable printk message)
- WARN (configurable printk message)

LKDTM was not testing WARN_ON(). This is needed to evaluate the placement
of the "cut here" line, which needs special handling in each of the
three exceptions (and between architectures that implement instruction
exceptions to implement the code exceptions).

Signed-off-by: Kees Cook <keescook@chromium.org>
2019-08-19 11:13:21 -07:00
Greg Kroah-Hartman
e70c971d7d Merge 5.3-rc5 into char-misc-next
We need the char/misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-19 07:11:53 +02:00
Fuqian Huang
672a749b4d sgi-xpc: Use GFP_ATOMIC for kmalloc in atomic context.
xpc_send_activate_IRQ_uv is called from
 <-xpc_send_activate_IRQ_part_uv
 <-xpc_indicate_partition_disengaged_uv
  (xpc_arch_ops.indicate_partition_disengaged)
 <-xpc_die_deactivate
 <-xpc_system_die

xpc_system_die is registered by atomic_notifier_chain_register,
which indicates xpc_system_die may be called in atomic context.
So the kmalloc in xpc_send_activate_IRQ_uv may be in atomic context.

Use GFP_ATOMIC instead of GFP_KERNEL in kmalloc.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Link: https://lore.kernel.org/r/20190805125625.24963-1-huangfq.daxian@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15 18:20:32 +02:00
Randy Dunlap
01fd150f4a misc: xilinx-sdfec: fix dependency and build error
lib/devres.c, which implements devm_ioremap_resource(), is only built
when CONFIG_HAS_IOMEM is set/enabled, so XILINX_SDFEC should depend
on HAS_IOMEM.  Fixes this build error (as seen on UML builds):

ERROR: "devm_ioremap_resource" [drivers/misc/xilinx_sdfec.ko] undefined!

Fixes: 76d83e1c32 ("misc: xilinx-sdfec: add core driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Derek Kiernan <derek.kiernan@xilinx.com>
Cc: Dragan Cvetic <dragan.cvetic@xilinx.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/f9004be5-9925-327b-3ec2-6506e46fe565@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15 18:10:25 +02:00
Nishka Dasgupta
cd010d9b98 sgi-xp: xpc_uv: Make structure xpc_arch_ops_uv constant
The static xpc_arch_operations structure xpc_arch_ops_uv is only copied
into the structure xpc_arch_ops, after which it is never modified; nor
is it used in any other way. Hence it can be declared as a constant to
prevent unintended modifications of its fields.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Acked-by: Robin Holt <robinmholt@gmail.com>
Link: https://lore.kernel.org/r/20190808080422.16503-1-nishkadg.linux@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-15 18:03:38 +02:00