qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even
more interestingly the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
which can hit a page fault and we deadlock on trying to get mmap_sem
when handling that fault.
So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
leave mmap_sem locking for mm.
This deadlock has actually been observed in the wild when the node
is under memory pressure.
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") caused statements to leak pass the header guard.
Fix this.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
PCI core will initialize device MSI/MSI-X capability in
pci_msi_init_pci_dev(). So device drivers should use
pci_dev->msi_cap/msix_cap to determine whether a device supports
MSI/MSI-X instead of using pci_find_capability(pci_dev,
PCI_CAP_ID_MSI/MSIX). Access to PCIe device config space again will
consume more time.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
struct pci_driver qib_driver is only used in qib_init.c. Remove it
from qib.h and make it static in qib_init.c.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. The code accepts chunks of messages, and splits the chunk into
packets when converting packets into sdma queue entries. Adjacent
packets will use user buffer pages smartly to avoid pinning the
same page multiple times.
2. Instead of discarding all the work when SDMA queue is full, the
work is saved in a pending queue. Whenever there are enough SDMA
queue free entries, pending queue is directly put onto SDMA queue.
3. An interrupt handler is used to progress this pending queue.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Fixed up sparse warnings. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 0b3ddf380c ("Log all SDMA errors unconditionally") missed
part of the patch.
This also corrects a format warning when dma_addr_t is 32 bits
on a 64 bit system.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- AF_IB (native IB addressing) for CMA from Sean Hefty
- New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- Resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- Other small changes and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P
bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D
cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK
ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38
VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe
hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv
LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw
2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY
51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW
Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV
MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q
GBYjHWMUpZXIFHQaH7am
=nZh8
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- AF_IB (native IB addressing) for CMA from Sean Hefty
- new mlx5 driver for Mellanox Connect-IB adapters (including post
merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- other small changes and fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
mlx5: Return -EFAULT instead of -EPERM
IB/qib: Log all SDMA errors unconditionally
IB/qib: Fix module-level leak
mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
mlx5_core: Fixes for sparse warnings
IB/mlx5: Make profile[] static in main.c
mlx5: Fix parameter type of health_handler_t
mlx5: Add driver for Mellanox Connect-IB adapters
IB/core: Add reserved values to enums for low-level driver use
IB/srp: Bump driver version and release date
IB/srp: Make HCA completion vector configurable
IB/srp: Maintain a single connection per I_T nexus
IB/srp: Fail I/O fast if target offline
IB/srp: Skip host settle delay
IB/srp: Avoid skipping srp_reset_host() after a transport error
IB/srp: Fix remove_one crash due to resource exhaustion
IB/qib: New transmitter tunning settings for Dell 1.1 backplane
IB/core: Fix error return code in add_port()
...
This patch adds code to log SDMA errors for supportability purposes.
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The vzalloc()'ed field physshadow is leaked on module unload.
This patch adds vfree after the sibling page shadow is freed.
Reported-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Calling dev_set_name with a single paramter causes it to be handled as a
format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Dell blade chassis got an updated backplane which requires new
transmitter tuning settings.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This adds a seq_file iterator for reporting the QP hash table when the
qp_stats file is read.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds a debugfs stats interface for per kernel contexts
packet counts.
The code uses the opcode stats count and eliminates the counter in the
context.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This fix changes the opcode relative counters for receive to per
context.
Profiling has shown that when mulitple contexts are being used there
is a lot of cache activity associated with these counters.
The code formerly kept these counters per port, but only provided the
interface to read per HCA. This patch converts the read of counters
to per HCA and adds the debugfs hooks to be able to read the file as a
sequence of opcodes.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The current workqueue implemention has the following performance
deficiencies on QDR HCAs:
- The CQ call backs tend to run on the CPUs processing the
receive queues
- The single thread queue isn't optimal for multiple HCAs
This patch adds a dedicated per HCA bound thread to process CQ callbacks.
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver currently selects a HCA based on the algorithm that PSM
chooses, contexts within a HCA or across. The HCA can also be chosen
by the user. Either way, this patch assigns a CPU on the NUMA node
local to the selected HCA. This patch also tries to select the HCA
closest to the NUMA node of the CPU assigned via taskset to PSM
process. If this HCA is unusable then another unit is selected based
on the algorithm that is currently enforced or selected by PSM - round
robin context selection 'within' or 'across' HCA's.
Fixed a bug wherein contexts are setup on the NUMA node on which the
processes are opened (setup_ctxt()) and not on the NUMA node that the
driver recommends the CPU on.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds context relative numa affinity conditioned on the
module parameter numa_aware. The qib_ctxtdata has an additional
node_id member and qib_create_ctxtdata() has an addition node_id
parameter.
The allocations within the hdr queue and eager queue setup routines
now take this additional member and adjust allocations as necesary.
PSM will pass the either current numa node or the node closest to the
HCA depending on numa_aware. Verbs will always use the node closest to
the HCA.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
External PSM repositories have advanced the minor number for a variety
of reasons. The driver needs to increase to avoid warnings.
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Follow Documentation/RCU/rcuref.txt guidance in removing
atomic_inc_not_zero() from QP RCU implementation.
This patch also removes an unneeded synchronize_rcu() in the add path.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds DCA cache warming for systems that support DCA.
The code uses cpu affinity notification to react to an affinity change
from a user mode program like irqbalance and (re-)program the chip
accordingly. This notification avoids reading the current cpu on every
interrupt.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to
build due to undefined struct irq_affinity_notify. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
- XRC transport fixes
- Fix DHCP on IPoIB
- mlx4 preparations for flow steering
- iSER fixes
- miscellaneous other fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRisChAAoJEENa44ZhAt0hHLoP/iX5MxtHJ3X1u5KcARX/7nci
CCH/VnD172re/KavCCg7zkbZQpS4jHCtW/CzLUCSPqBGOaj78HFTUkB3ragvUW7m
ndERwplF8DP/i0x7Kk7Wau2a4RdlH0lqwucOjqTyQnbIdxknkz6w3Jcsb9Ic2lzx
up0T0HaHtTxdVF6lXOB5QOIpUGg3l0Yu4euX2BA61WDoZj+VIYhgeWZewq0iV0D1
rLtarJ+Or7mdwu2rNcDHgrD0lhF4SCBd3rx4lbc4F68Cr8JUz0Xe7liPLNskeLhW
f3NEm3gmkYp9YI1otGsA0X/CyV6wnRk4mT8JMlOb2WNzeq2V13Z54/9ZyF5/gFD2
JgzkQB9Ibf7EmTwXWd6+0+FA40Q6dNvnRnhddRM255dvDVw7nxUr2UzYH4Re/Z9K
rNFjkvix2YUwEmoPjitWocz2kj2reDMqjtiVDmdGy1YbtnicH5GtkQsWkoPg8ON9
m5jORUdzydTD+yBJwTiFP1EuFoG3TdfoZ7zHMJwWy/u8i308xD6WPGms9MTdjh8j
7gjz2TCKr+vpuVRh/p6esCPPOTSsSeWDeowy7Sgpdf3qoqAImXsWXVrl2kXLhtyl
1VIgHU3ztm7oqwmy0gQ/zVCo4CLLdsif2zmEIDpxJPnWaSq+D9LJdyLTcfdBjhQ8
9SjUafe4msT1pIjNb7ND
=1ojD
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- XRC transport fixes
- Fix DHCP on IPoIB
- mlx4 preparations for flow steering
- iSER fixes
- miscellaneous other fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
IB/iser: Add support for iser CM REQ additional info
IB/iser: Return error to upper layers on EAGAIN registration failures
IB/iser: Move informational messages from error to info level
IB/iser: Add module version
mlx4_core: Expose a few helpers to fill DMFS HW strucutures
mlx4_core: Directly expose fields of DMFS HW rule control segment
mlx4_core: Change a few DMFS fields names to match firmare spec
mlx4: Match DMFS promiscuous field names to firmware spec
mlx4_core: Move DMFS HW structs to common header file
IB/mlx4: Set link type for RAW PACKET QPs in the QP context
IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
RDMA/iwcm: Don't touch cmid after dropping reference
IB/qib: Correct qib_verbs_register_sysfs() error handling
IB/ipath: Correct ipath_verbs_register_sysfs() error handling
RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
SRPT: Fix odd use of WARN_ON()
IPoIB: Fix ipoib_hard_header() return value
RDMA: Rename random32() to prandom_u32()
RDMA/cxgb3: Fix uninitialized variable
...
qib_verbs_register_sysfs() never cleans up from a failure.
Additionally, the caller of qib_verbs_register_sysfs() doesn't
return the correct "ret" value.
This patch resolves both of those issues.
Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit e2eed58b4f ("IB/qib: change QLogic to Intel") moved a firmware
file potentially breaking the ABI.
This patch reverts that aspect of the fix as well as reverting the
firmware name as used in qib.
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These changes modify the qib driver as part of acquiring
the InfiniBand assets of QLogic.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Marciniszyn <infinipath@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
remove_qp() can execute concurrently with a qib_lookup_qpn() on
another CPU, which in of itself, is ok, given the RCU locking.
The issue is that remove_qp() NULLs out the qp->next field so that a
qib_lookup_qpn() might fail to find a qp if it occurs after the one
that is being deleted. This is a momentary issue and subsequent
qib_lookup_qpn() calls would find the qp's since the search restarts
from the bucket head. At scale, the issue might causes dropped
packets and unnecessary retransmissions.
The fix just deletes the qp->next NULL assignment to prevent the
remove_qp() from hiding qp's from qib_lookup_qpn().
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 1fb9fed6d4 ("IB/qib: Fix QP RCU sparse warning") broke QP
hash list deletion in qp_remove() badly.
This patch restores the former for loop behavior, while still fixing
the sparse warnings.
Cc: <stable@vger.kernel.org>
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:
| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.
Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
Host bridge hotplug
- Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
- Clear host bridge resource info to avoid issue when releasing (Yinghai Lu)
- Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
- Use standard list ops for acpi_pci_drivers (Jiang Liu)
Device hotplug
- Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu)
- Remove fakephp driver (Bjorn Helgaas)
- Fix VGA ref count in hotplug remove path (Yinghai Lu)
- Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu)
- Implement resume regardless of pciehp_force param (Oliver Neukum)
- Make pci_fixup_irqs() work after init (Thierry Reding)
Miscellaneous
- Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
- Factor out PCI Express Capability accessors (Jiang Liu)
- Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan)
- Make pci_error_handlers const (Stephen Hemminger)
- Cleanup drivers/pci/remove.c (Bjorn Helgaas)
- Improve Vendor-Specific Extended Capability support (Bjorn Helgaas)
- Use standard list ops for bus->devices (Bjorn Helgaas)
- Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
- Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/
RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo
dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i
pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB
5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b
2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v
kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk
dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW
4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc
x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l
KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z
SxkILwY/MFpuCFteKE1t
=haBu
-----END PGP SIGNATURE-----
Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug
- Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
- Clear host bridge resource info to avoid issue when releasing
(Yinghai Lu)
- Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
- Use standard list ops for acpi_pci_drivers (Jiang Liu)
Device hotplug
- Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang
Liu)
- Remove fakephp driver (Bjorn Helgaas)
- Fix VGA ref count in hotplug remove path (Yinghai Lu)
- Allow acpiphp to handle PCIe ports without native hotplug (Jiang
Liu)
- Implement resume regardless of pciehp_force param (Oliver Neukum)
- Make pci_fixup_irqs() work after init (Thierry Reding)
Miscellaneous
- Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
- Factor out PCI Express Capability accessors (Jiang Liu)
- Add pcibios_window_alignment() so powerpc EEH can use generic
resource assignment (Gavin Shan)
- Make pci_error_handlers const (Stephen Hemminger)
- Cleanup drivers/pci/remove.c (Bjorn Helgaas)
- Improve Vendor-Specific Extended Capability support (Bjorn
Helgaas)
- Use standard list ops for bus->devices (Bjorn Helgaas)
- Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
- Reassign invalid bus number ranges (Intel DP43BF workaround)
(Yinghai Lu)"
* tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits)
PCI: acpiphp: Handle PCIe ports without native hotplug capability
PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots
PCI/ACPI: Protect acpi_pci_roots list with mutex
PCI/ACPI: Use acpi_pci_root info rather than looking it up again
PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface
PCI/ACPI: Protect acpi_pci_drivers list with mutex
PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges
PCI/ACPI: Use normal list for struct acpi_pci_driver
PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots
PCI: Fix default vga ref_count
ia64/PCI: Clear host bridge aperture struct resource
x86/PCI: Clear host bridge aperture struct resource
PCI: Stop all children first, before removing all children
Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()"
PCI: Provide a default pcibios_update_irq()
PCI: Discard __init annotations for pci_fixup_irqs() and related functions
PCI: Use correct type when freeing bus resource list
PCI: Check P2P bridge for invalid secondary/subordinate range
PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes
xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot()
...
Commit 8aac4cc3a9 ("IB/qib: RCU locking for MR validation") introduced
a bug that broke user post sends. The proper validation of the MR
was lost in the patch.
This patch corrects that validation.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Commit 3236b2d469 ("IB/qib: MADs with misset M_Keys should return
failure") introduced a return code assignment that unfortunately
introduced an unconditional exit for the routine due to missing braces.
This patch adds the braces to correct the original patch.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* commit 'v3.6-rc5': (1098 commits)
Linux 3.6-rc5
HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
Remove user-triggerable BUG from mpol_to_str
xen/pciback: Fix proper FLR steps.
uml: fix compile error in deliver_alarm()
dj: memory scribble in logi_dj
Fix order of arguments to compat_put_time[spec|val]
xen: Use correct masking in xen_swiotlb_alloc_coherent.
xen: fix logical error in tlb flushing
xen/p2m: Fix one-off error in checking the P2M tree directory.
powerpc: Don't use __put_user() in patch_instruction
powerpc: Make sure IPI handlers see data written by IPI senders
powerpc: Restore correct DSCR in context switch
powerpc: Fix DSCR inheritance in copy_thread()
powerpc: Keep thread.dscr and thread.dscr_inherit in sync
powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
powerpc/powernv: Always go into nap mode when CPU is offline
powerpc: Give hypervisor decrementer interrupts their own handler
powerpc/vphn: Fix arch_update_cpu_topology() return value
ARM: gemini: fix the gemini build
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/rapidio/devices/tsi721.c
* pci/stephen-const:
make drivers with pci error handlers const
scsi: make pci error handlers const
netdev: make pci_error_handlers const
PCI: Make pci_error_handlers const
Covers the rest of the uses of pci error handler.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Convert a 0 error return code to a negative one, as returned elsewhere
in the function.
A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@
(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
when != ret = e3
*if (x == NULL || ...)
{
... when != ret = e4
* return ret;
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Correct spelling typos in comments in drivers/infiniband.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") tries to store the value 1984 in a u8, which leads to
truncation. Fix this by making the member big enough.
This bug was detected by a smatch warning.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>