Commit Graph

9996 Commits

Author SHA1 Message Date
Linus Torvalds
237f83dfbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Some highlights from this development cycle:

   1) Big refactoring of ipv6 route and neigh handling to support
      nexthop objects configurable as units from userspace. From David
      Ahern.

   2) Convert explored_states in BPF verifier into a hash table,
      significantly decreased state held for programs with bpf2bpf
      calls, from Alexei Starovoitov.

   3) Implement bpf_send_signal() helper, from Yonghong Song.

   4) Various classifier enhancements to mvpp2 driver, from Maxime
      Chevallier.

   5) Add aRFS support to hns3 driver, from Jian Shen.

   6) Fix use after free in inet frags by allocating fqdirs dynamically
      and reworking how rhashtable dismantle occurs, from Eric Dumazet.

   7) Add act_ctinfo packet classifier action, from Kevin
      Darbyshire-Bryant.

   8) Add TFO key backup infrastructure, from Jason Baron.

   9) Remove several old and unused ISDN drivers, from Arnd Bergmann.

  10) Add devlink notifications for flash update status to mlxsw driver,
      from Jiri Pirko.

  11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.

  12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.

  13) Various enhancements to ipv6 flow label handling, from Eric
      Dumazet and Willem de Bruijn.

  14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
      der Merwe, and others.

  15) Various improvements to axienet driver including converting it to
      phylink, from Robert Hancock.

  16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.

  17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
      Radulescu.

  18) Add devlink health reporting to mlx5, from Moshe Shemesh.

  19) Convert stmmac over to phylink, from Jose Abreu.

  20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
      Shalom Toledo.

  21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.

  22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.

  23) Track spill/fill of constants in BPF verifier, from Alexei
      Starovoitov.

  24) Support bounded loops in BPF, from Alexei Starovoitov.

  25) Various page_pool API fixes and improvements, from Jesper Dangaard
      Brouer.

  26) Just like ipv4, support ref-countless ipv6 route handling. From
      Wei Wang.

  27) Support VLAN offloading in aquantia driver, from Igor Russkikh.

  28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.

  29) Add flower GRE encap/decap support to nfp driver, from Pieter
      Jansen van Vuuren.

  30) Protect against stack overflow when using act_mirred, from John
      Hurley.

  31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.

  32) Use page_pool API in netsec driver, Ilias Apalodimas.

  33) Add Google gve network driver, from Catherine Sullivan.

  34) More indirect call avoidance, from Paolo Abeni.

  35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.

  36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.

  37) Add MPLS manipulation actions to TC, from John Hurley.

  38) Add sending a packet to connection tracking from TC actions, and
      then allow flower classifier matching on conntrack state. From
      Paul Blakey.

  39) Netfilter hw offload support, from Pablo Neira Ayuso"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
  net/mlx5e: Return in default case statement in tx_post_resync_params
  mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
  net: dsa: add support for BRIDGE_MROUTER attribute
  pkt_sched: Include const.h
  net: netsec: remove static declaration for netsec_set_tx_de()
  net: netsec: remove superfluous if statement
  netfilter: nf_tables: add hardware offload support
  net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
  net: flow_offload: add flow_block_cb_is_busy() and use it
  net: sched: remove tcf block API
  drivers: net: use flow block API
  net: sched: use flow block API
  net: flow_offload: add flow_block_cb_{priv, incref, decref}()
  net: flow_offload: add list handling functions
  net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
  net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
  net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
  net: flow_offload: add flow_block_cb_setup_simple()
  net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
  net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
  ...
2019-07-11 10:55:49 -07:00
Linus Torvalds
8b68150883 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Bug fixes, code clean up, and new features:

   - IMA policy rules can be defined in terms of LSM labels, making the
     IMA policy dependent on LSM policy label changes, in particular LSM
     label deletions. The new environment, in which IMA-appraisal is
     being used, frequently updates the LSM policy and permits LSM label
     deletions.

   - Prevent an mmap'ed shared file opened for write from also being
     mmap'ed execute. In the long term, making this and other similar
     changes at the VFS layer would be preferable.

   - The IMA per policy rule template format support is needed for a
     couple of new/proposed features (eg. kexec boot command line
     measurement, appended signatures, and VFS provided file hashes).

   - Other than the "boot-aggregate" record in the IMA measuremeent
     list, all other measurements are of file data. Measuring and
     storing the kexec boot command line in the IMA measurement list is
     the first buffer based measurement included in the measurement
     list"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  integrity: Introduce struct evm_xattr
  ima: Update MAX_TEMPLATE_NAME_LEN to fit largest reasonable definition
  KEXEC: Call ima_kexec_cmdline to measure the boot command line args
  IMA: Define a new template field buf
  IMA: Define a new hook to measure the kexec boot command line arguments
  IMA: support for per policy rule template formats
  integrity: Fix __integrity_init_keyring() section mismatch
  ima: Use designated initializers for struct ima_event_data
  ima: use the lsm policy update notifier
  LSM: switch to blocking policy update notifiers
  x86/ima: fix the Kconfig dependency for IMA_ARCH_POLICY
  ima: Make arch_policy_entry static
  ima: prevent a file already mmap'ed write to be mmap'ed execute
  x86/ima: check EFI SetupMode too
2019-07-08 20:28:59 -07:00
Linus Torvalds
dad1c12ed8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Remove the unused per rq load array and all its infrastructure, by
   Dietmar Eggemann.

 - Add utilization clamping support by Patrick Bellasi. This is a
   refinement of the energy aware scheduling framework with support for
   boosting of interactive and capping of background workloads: to make
   sure critical GUI threads get maximum frequency ASAP, and to make
   sure background processing doesn't unnecessarily move to cpufreq
   governor to higher frequencies and less energy efficient CPU modes.

 - Add the bare minimum of tracepoints required for LISA EAS regression
   testing, by Qais Yousef - which allows automated testing of various
   power management features, including energy aware scheduling.

 - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
   kernel used to modify the scheduler's CPU affinity logic such as
   migrate_disable() - introduce the task->cpus_ptr value instead of
   taking the address of &task->cpus_allowed directly - by Sebastian
   Andrzej Siewior.

 - Misc optimizations, fixes, cleanups and small enhancements - see the
   Git log for details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  sched/uclamp: Add uclamp support to energy_compute()
  sched/uclamp: Add uclamp_util_with()
  sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
  sched/uclamp: Set default clamps for RT tasks
  sched/uclamp: Reset uclamp values on RESET_ON_FORK
  sched/uclamp: Extend sched_setattr() to support utilization clamping
  sched/core: Allow sched_setattr() to use the current policy
  sched/uclamp: Add system default clamps
  sched/uclamp: Enforce last task's UCLAMP_MAX
  sched/uclamp: Add bucket local max tracking
  sched/uclamp: Add CPU's clamp buckets refcounting
  sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
  sched/debug: Export the newly added tracepoints
  sched/debug: Add sched_overutilized tracepoint
  sched/debug: Add new tracepoint to track PELT at se level
  sched/debug: Add new tracepoints to track PELT at rq level
  sched/debug: Add a new sched_trace_*() helper functions
  sched/autogroup: Make autogroup_path() always available
  sched/wait: Deduplicate code with do-while
  sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
  ...
2019-07-08 16:39:53 -07:00
Linus Torvalds
e192832869 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - rwsem scalability improvements, phase #2, by Waiman Long, which are
     rather impressive:

       "On a 2-socket 40-core 80-thread Skylake system with 40 reader
        and writer locking threads, the min/mean/max locking operations
        done in a 5-second testing window before the patchset were:

         40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
         40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

        After the patchset, they became:

         40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
         40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

     There's a lot of changes to the locking implementation that makes
     it similar to qrwlock, including owner handoff for more fair
     locking.

     Another microbenchmark shows how across the spectrum the
     improvements are:

       "With a locking microbenchmark running on 5.1 based kernel, the
        total locking rates (in kops/s) on a 2-socket Skylake system
        with equal numbers of readers and writers (mixed) before and
        after this patchset were:

        # of Threads   Before Patch      After Patch
        ------------   ------------      -----------
             2            2,618             4,193
             4            1,202             3,726
             8              802             3,622
            16              729             3,359
            32              319             2,826
            64              102             2,744"

     The changes are extensive and the patch-set has been through
     several iterations addressing various locking workloads. There
     might be more regressions, but unless they are pathological I
     believe we want to use this new implementation as the baseline
     going forward.

   - jump-label optimizations by Daniel Bristot de Oliveira: the primary
     motivation was to remove IPI disturbance of isolated RT-workload
     CPUs, which resulted in the implementation of batched jump-label
     updates. Beyond the improvement of the real-time characteristics
     kernel, in one test this patchset improved static key update
     overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
     as well.

   - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
     ~10 years of atomic64_t existence the various types used by the
     APIs only had to be self-consistent within each architecture -
     which means they became wildly inconsistent across architectures.
     Mark puts and end to this by reworking all the atomic64
     implementations to use 's64' as the base type for atomic64_t, and
     to ensure that this type is consistently used for parameters and
     return values in the API, avoiding further problems in this area.

   - A large set of small improvements to lockdep by Yuyang Du: type
     cleanups, output cleanups, function return type and othr cleanups
     all around the place.

   - A set of percpu ops cleanups and fixes by Peter Zijlstra.

   - Misc other changes - please see the Git log for more details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  locking/lockdep: increase size of counters for lockdep statistics
  locking/atomics: Use sed(1) instead of non-standard head(1) option
  locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
  x86/jump_label: Make tp_vec_nr static
  x86/percpu: Optimize raw_cpu_xchg()
  x86/percpu, sched/fair: Avoid local_clock()
  x86/percpu, x86/irq: Relax {set,get}_irq_regs()
  x86/percpu: Relax smp_processor_id()
  x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
  locking/rwsem: Guard against making count negative
  locking/rwsem: Adaptive disabling of reader optimistic spinning
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Make rwsem->owner an atomic_long_t
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Clarify usage of owner's nonspinaable bit
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: More optimal RT task handling of null owner
  locking/rwsem: Always release wait_lock before waking up tasks
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  ...
2019-07-08 16:12:03 -07:00
Saeed Mahameed
e08a976a16 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:

1) Add the required HW definitions and structures for upcoming TLS
   support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
   management on ECPF embedded CPU NIC:
   5.1) Consolidate querying eswitch number of VFs
   5.2) Register event handler at the correct E-Switch init stage
   5.3) Setup PF's inline mode and vlan pop when the ECPF is the
        E-Swtich manager ( the host PF is basically a VF ).
   5.4) Handle Vport UC address changes in switchdev mode.

6) Cleanup the rep and netdev reference when unloading IB rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

i# All conflicts fixed but you are still merging.
2019-07-04 16:42:59 -04:00
Parav Pandit
2752b82316 net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.

Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03 12:50:42 -07:00
Yishai Hadas
4e0e2ea188 net/mlx5: Report EQE data upon CQ completion
Report EQE data upon CQ completion to let upper layers use this data.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 21:00:20 +03:00
Yishai Hadas
38164b7719 net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:59:32 +03:00
Yishai Hadas
b9a7ba5562 net/mlx5: Use event mask based on device capabilities
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.

As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03 20:55:45 +03:00
Thomas Gleixner
3419240495 Merge branch 'timers/vdso' into timers/core
so the hyper-v clocksource update can be applied.
2019-07-03 10:50:21 +02:00
Bodong Wang
f6455de0b0 net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
b8ca123860 RDMA/mlx5: Cleanup rep when doing unload
When an IB rep is loaded, netdev for the same vport is saved for later
reference. However, it's not cleaned up when doing unload. For ECPF,
kernel crashes when driver is referring to the already removed netdev.

Following steps lead to a shown call trace:
1. Create n VFs from host PF
2. Distroy the VFs
3. Run "rdma link" from ARM

Call trace:
  mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib]
  mlx5_query_port_roce+0x268/0x558 [mlx5_ib]
  mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib]
  ib_query_port+0x9c/0xfc [ib_core]
  fill_port_info+0x74/0x28c [ib_core]
  nldev_port_get_doit+0x1a8/0x1e8 [ib_core]
  rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core]
  rdma_nl_rcv+0xe8/0x144 [ib_core]
  netlink_unicast+0x184/0x214
  netlink_sendmsg+0x288/0x354
  sock_sendmsg+0x18/0x2c
  __sys_sendto+0xbc/0x138
  __arm64_sys_sendto+0x28/0x34
  el0_svc_common+0xb0/0x100
  el0_svc_handler+0x6c/0x84
  el0_svc+0x8/0xc

Cleanup the rep and netdev reference when unloading IB rep.

Fixes: 26628e2d58 ("RDMA/mlx5: Move to single device multiport ports in switchdev mode")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang
2f69e591e4 {IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mapping
In the single IB device mode, the mapping between vport number and
rep relies on a counter. However for dynamic vport allocation, it is
desired to keep consistent map of eswitch vport and IB port.

Hence, simplify code to remove the free running counter and instead
use the available vport index during load/unload sequence from the
eswitch.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Saeed Mahameed
4f5d1beadc Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:

1) E-Switch vport metadata support for source vport matching
2) Convert mkey_table to XArray
3) Shared IRQs and to use single IRQ for all async EQs

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-28 16:03:54 -07:00
Jianbo Liu
669ff1e32f RDMA/mlx5: Add vport metadata matching for IB representors
If vport metadata matching is enabled in eswitch, the rule created
must be changed to match on the metadata, instead of source port.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:29 -07:00
Jianbo Liu
bb0ee7dcc4 net/mlx5: Add flow context for flow tag
Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26 12:01:28 -07:00
Matthew Wilcox
792c4e9d0b net/mlx5: Convert mkey_table to XArray
The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24 16:44:40 -07:00
Ingo Molnar
d2abae71eb Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into sched/core, to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24 19:19:53 +02:00
David S. Miller
92ad6325cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor SPDX change conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22 08:59:24 -04:00
Jason A. Donenfeld
9285ec4c8b timekeeping: Use proper clock specifier names in functions
This makes boot uniformly boottime and tai uniformly clocktai, to
address the remaining oversights.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com
2019-06-22 12:11:27 +02:00
Gal Pressman
7a5834e456 RDMA/efa: Handle mmap insertions overflow
When inserting a new mmap entry to the xarray we should check for
'mmap_page' overflow as it is limited to 32 bits.

Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 16:27:24 -04:00
Denis Kirjanov
64d701c608 ipoib: correcly show a VF hardware address
in the case of IPoIB with SRIOV enabled hardware
ip link show command incorrecly prints
0 instead of a VF hardware address.

Before:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
    link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state disable,
trust off, query_rss off
...
After:
11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 256
    link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    vf 0     link/infiniband
80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
checking off, link-state disable, trust off, query_rss off

v1->v2: just copy an address without modifing ifla_vf_mac
v2->v3: update the changelog

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18 10:41:27 -07:00
David S. Miller
13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
Gal Pressman
529254340c RDMA/efa: Fix success return value in case of error
Existing code would mistakenly return success in case of error instead
of a proper return value.

Fixes: e9c6c53730 ("RDMA/efa: Add common command handlers")
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:35:21 -04:00
Mike Marciniszyn
942a899335 IB/hfi1: Handle port down properly in pio
The call to sc_buffer_alloc currently returns NULL (no buffer) or
a buffer descriptor.

There is a third case when the port is down.  Currently that
returns NULL and this prevents the caller from properly handling the
sc_buffer_alloc() failure.  A verbs code link test after the call is
racy so the indication needs to come from the state check inside the allocation
routine to be valid.

Fix by encoding the ECOMM failure like SDMA.   IS_ERR_OR_NULL() tests
are added at all call sites.  For verbs send, this needs to treat any
error by returning a completion without any MMIO copy.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
099a884ba4 IB/hfi1: Handle wakeup of orphaned QPs for pio
Once a send context is taken down due to a link failure, any QPs waiting
for pio credits will stay on the waitlist indefinitely.

Fix by wakeing up all QPs linked to piowait list.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
f972775b1c IB/hfi1: Wakeup QPs orphaned on wait list after flush
Once an SDMA engine is taken down due to a link failure, any waiting QPs
that do not have outstanding descriptors in the ring will stay
on the dmawait list as long as the port is down.

Since there is no timer running, they will stay there for a long time.

The fix is to wake up all iowaits linked to dmawait. The send engine
will build and post packets that get flushed back.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
4bb02e9572 IB/hfi1: Use aborts to trigger RC throttling
SDMA and pio flushes will cause a lot of packets to be transmitted
after a link has gone down, using a lot of CPU to retransmit
packets.

Fix for RC QPs by recognizing the flush status and:
- Forcing a timer start
- Putting the QP into a "send one" mode

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
9755f72496 IB/hfi1: Create inline to get extended headers
This paves the way for another patch that reacts to a
flush sdma completion for RC.

Fixes: 81cd3891f0 ("IB/hfi1: Add support for 16B Management Packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
3230f4a8d4 IB/hfi1: Silence txreq allocation warnings
The following warning can happen when a memory shortage
occurs during txreq allocation:

[10220.939246] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[10220.939246] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[10220.939247]   cache: mnt_cache, object size: 384, buffer size: 384, default order: 2, min order: 0
[10220.939260] Workqueue: hfi0_0 _hfi1_do_send [hfi1]
[10220.939261]   node 0: slabs: 1026568, objs: 43115856, free: 0
[10220.939262] Call Trace:
[10220.939262]   node 1: slabs: 820872, objs: 34476624, free: 0
[10220.939263]  dump_stack+0x5a/0x73
[10220.939265]  warn_alloc+0x103/0x190
[10220.939267]  ? wake_all_kswapds+0x54/0x8b
[10220.939268]  __alloc_pages_slowpath+0x86c/0xa2e
[10220.939270]  ? __alloc_pages_nodemask+0x2fe/0x320
[10220.939271]  __alloc_pages_nodemask+0x2fe/0x320
[10220.939273]  new_slab+0x475/0x550
[10220.939275]  ___slab_alloc+0x36c/0x520
[10220.939287]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939299]  ? __get_txreq+0x54/0x160 [hfi1]
[10220.939310]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939312]  __slab_alloc+0x40/0x61
[10220.939323]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939325]  kmem_cache_alloc+0x181/0x1b0
[10220.939336]  hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939348]  ? hfi1_verbs_send_dma+0x386/0xa10 [hfi1]
[10220.939359]  ? find_prev_entry+0xb0/0xb0 [hfi1]
[10220.939371]  hfi1_do_send+0x1d9/0x3f0 [hfi1]
[10220.939372]  process_one_work+0x171/0x380
[10220.939374]  worker_thread+0x49/0x3f0
[10220.939375]  kthread+0xf8/0x130
[10220.939377]  ? max_active_store+0x80/0x80
[10220.939378]  ? kthread_bind+0x10/0x10
[10220.939379]  ret_from_fork+0x35/0x40
[10220.939381] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)

The shortage is handled properly so the message isn't needed. Silence by
adding the no warn option to the slab allocation.

Fixes: 45842abbb2 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
cf131a8196 IB/hfi1: Avoid hardlockup with flushlist_lock
Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.

Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Ingo Molnar
23da766ab1 Linux 5.2-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Gj1MeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGctkH/0At3+SQPY2JJSy8
 i6+TDeytFx9OggeGLPHChRfehkAlvMb/kd34QHnuEvDqUuCAMU6HZQJFKoK9mvFI
 sDJVayPGDSqpm+iv8qLpMBPShiCXYVnGZeVfOdv36jUswL0k6wHV1pz4avFkDeZa
 1F4pmI6O2XRkNTYQawbUaFkAngWUCBG9ECLnHJnuIY6ohShBvjI4+E2JUaht+8gO
 M2h2b9ieddWmjxV3LTKgsK1v+347RljxdZTWnJ62SCDSEVZvsgSA9W2wnebVhBkJ
 drSmrFLxNiM+W45mkbUFmQixRSmjv++oRR096fxAnodBxMw0TDxE1RiMQWE6rVvG
 N6MC6xA=
 =+B0P
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc5' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17 12:12:27 +02:00
Nikolay Borisov
9ffbe8ac05 locking/lockdep: Rename lockdep_assert_held_exclusive() -> lockdep_assert_held_write()
All callers of lockdep_assert_held_exclusive() use it to verify the
correct locking state of either a semaphore (ldisc_sem in tty,
mmap_sem for perf events, i_rwsem of inode for dax) or rwlock by
apparmor. Thus it makes sense to rename _exclusive to _write since
that's the semantics callers care. Additionally there is already
lockdep_assert_held_read(), which this new naming is more consistent with.

No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17 12:09:24 +02:00
Janne Karhunen
42df744c41 LSM: switch to blocking policy update notifiers
Atomic policy updaters are not very useful as they cannot
usually perform the policy updates on their own. Since it
seems that there is no strict need for the atomicity,
switch to the blocking variant. While doing so, rename
the functions accordingly.

Signed-off-by: Janne Karhunen <janne.karhunen@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-06-14 09:02:42 -04:00
Yuval Avnery
1f8a7bee27 net/mlx5: Add EQ enable/disable API
Previously, EQ joined the chain notifier on creation.
This forced the caller to be ready to handle events before creating
the EQ through eq_create_generic interface.

To help the caller control when the created EQ will be attached to the
IRQ, add enable/disable API.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Ariel Levkovich
81bfa20603 net/mlx5: Use a single IRQ for all async EQs
The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ resulting in more available IRQs for
completion EQs.

The changes are using the support for IRQ sharing and EQ polling budget
that was introduced in previous patches so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll in order to prevent
starvation.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Yuval Avnery
24163189da net/mlx5: Separate IRQ request/free from EQ life cycle
Instead of requesting IRQ with eq creation, IRQs will be requested
before EQ table creation.
Instead of freeing the IRQs after EQ destroy, free IRQs after eq
table destroy.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Yuval Avnery
ca390799c2 net/mlx5: Change interrupt handler to call chain notifier
Multiple EQs may share the same IRQ in subsequent patches.

Instead of calling the IRQ handler directly, the EQ will register
to an atomic chain notfier.

The Linux built-in shared IRQ is not used because it forces the caller
to disable the IRQ and clear affinity before free_irq() can be called.

This patch is the first step in the separation of IRQ and EQ logic.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 10:59:49 -07:00
Mike Marciniszyn
cc78076af1 IB/hfi1: Correct tid qp rcd to match verbs context
The qp priv rcd pointer doesn't match the context being used for verbs
causing issues when 9B and kdeth packets are processed by different
receive contexts and hence different CPUs.

When running on different CPUs the following panic can occur:

 WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
 list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230
 CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.2.3.el7_lustre.x86_64 #1
 Call Trace:
  <IRQ>  [<ffffffffb7b0d78e>] dump_stack+0x19/0x1b
  [<ffffffffb74916d8>] __warn+0xd8/0x100
  [<ffffffffb749175f>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffffb7768671>] __list_del_entry+0xa1/0xd0
  [<ffffffffc0c7a945>] process_rcv_qp_work+0xb5/0x160 [hfi1]
  [<ffffffffc0c7bc2b>] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1]
  [<ffffffffc0c70683>] receive_context_interrupt+0x23/0x40 [hfi1]
  [<ffffffffb7540a94>] __handle_irq_event_percpu+0x44/0x1c0
  [<ffffffffb7540c42>] handle_irq_event_percpu+0x32/0x80
  [<ffffffffb7540ccc>] handle_irq_event+0x3c/0x60
  [<ffffffffb7543a1f>] handle_edge_irq+0x7f/0x150
  [<ffffffffb742d504>] handle_irq+0xe4/0x1a0
  [<ffffffffb7b23f7d>] do_IRQ+0x4d/0xf0
  [<ffffffffb7b16362>] common_interrupt+0x162/0x162
  <EOI>  [<ffffffffb775a326>] ? memcpy+0x6/0x110
  [<ffffffffc109210d>] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs]
  [<ffffffffc10920f0>] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs]
  [<ffffffffc1093257>] abd_iterate_func+0x97/0x120 [zfs]
  [<ffffffffc10934d9>] abd_copy_from_buf_off+0x39/0x60 [zfs]
  [<ffffffffc109b828>] arc_write_ready+0x178/0x300 [zfs]
  [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
  [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
  [<ffffffffc1164d05>] zio_ready+0x65/0x3d0 [zfs]
  [<ffffffffc04d725e>] ? tsd_get_by_thread+0x2e/0x50 [spl]
  [<ffffffffc04d1318>] ? taskq_member+0x18/0x30 [spl]
  [<ffffffffc115ef22>] zio_execute+0xa2/0x100 [zfs]
  [<ffffffffc04d1d2c>] taskq_thread+0x2ac/0x4f0 [spl]
  [<ffffffffb74cee80>] ? wake_up_state+0x20/0x20
  [<ffffffffc115ee80>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
  [<ffffffffc04d1a80>] ? taskq_thread_spawn+0x60/0x60 [spl]
  [<ffffffffb74bae31>] kthread+0xd1/0xe0
  [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40
  [<ffffffffb7b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21
  [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40

Fix by reading the map entry in the same manner as the hardware so that
the kdeth and verbs contexts match.

Cc: <stable@vger.kernel.org>
Fixes: 5190f052a3 ("IB/hfi1: Allow the driver to initialize QP priv struct")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:45 -03:00
Mike Marciniszyn
da9de5f852 IB/hfi1: Close PSM sdma_progress sleep window
The call to sdma_progress() is called outside the wait lock.

In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle.  If that happens, there will be no
more sdma interrupts to cause the wakeup and the user_sdma xmit will hang.

Fix by moving the lock to enclose the sdma_progress() call.

Also, delete busycount. The need for this was removed by:
commit bcad29137a ("IB/hfi1: Serve the most starved iowait entry first")

Cc: <stable@vger.kernel.org>
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:45 -03:00
Kaike Wan
5f90677ed3 IB/hfi1: Validate fault injection opcode user input
The opcode range for fault injection from user should be validated before
it is applied to the fault->opcodes[] bitmap to avoid out-of-bound
error.

Cc: <stable@vger.kernel.org>
Fixes: a74d5307ca ("IB/hfi1: Rework fault injection machinery")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:37 -03:00
Linus Torvalds
9331b6740f SPDX update for 5.2-rc4
Another round of SPDX header file fixes for 5.2-rc4
 
 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
 added, based on the text in the files.  We are slowly chipping away at
 the 700+ different ways people tried to write the license text.  All of
 these were reviewed on the spdx mailing list by a number of different
 people.
 
 We now have over 60% of the kernel files covered with SPDX tags:
 	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
 	Files checked:            64533
 	Files with SPDX:          40392
 	Files with errors:            0
 
 I think the majority of the "easy" fixups are now done, it's now the
 start of the longer-tail of crazy variants to wade through.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
 sxVEiFZo8ZU9v1IoRb1I
 =qU++
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull yet more SPDX updates from Greg KH:
 "Another round of SPDX header file fixes for 5.2-rc4

  These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
  added, based on the text in the files. We are slowly chipping away at
  the 700+ different ways people tried to write the license text. All of
  these were reviewed on the spdx mailing list by a number of different
  people.

  We now have over 60% of the kernel files covered with SPDX tags:
	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
	Files checked:            64533
	Files with SPDX:          40392
	Files with errors:            0

  I think the majority of the "easy" fixups are now done, it's now the
  start of the longer-tail of crazy variants to wade through"

* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
  ...
2019-06-08 12:52:42 -07:00
David S. Miller
a6cdeeb16b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07 11:00:14 -07:00
Linus Torvalds
6e38335dcc 5.2 First rc pull request
The usual driver bug fixes and fixes for a couple of regressions introduced in
 5.2:
 
 - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to
   rename its internal sys files
 
 - Fix a memory leak in hns
 
 - Don't leak resources in efa on certain error unwinds
 
 - Don't panic in certain error unwinds in ib_register_device
 
 - Various small user visible bug fix patches for the hfi and efa drivers
 
 - Fix the 32 bit compilation break
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlz5c5oACgkQOG33FX4g
 mxpEEhAAk64phaKUih9KVT0MpC1zezB1C0EGKg45GKuMOFUFJQ5tZ0g4s6aDEbG3
 374ZE9h82HMgKn4tQ95110AKvCI+VAbbKOS7kzk1rLWE1ruJxk5DNsvp1v5/S3FE
 GXBSws+HtZtdiRAMTYyEOfz0MqpvghFg0vor4PugrmOuqIe2a0bkYPEzYPjYbaNH
 jSctd/q4s/o02n6gfbCrFpXsW0Va3OIaDX5a+Fx5+lWW+GPr/Uzk/3kN95mFbDRp
 XsCE80V+n3ceKSQUp0lYtxU3tm2mT1JpiiZjXuKyjRV8IMUS+xkdJ8scEz0upGcg
 +Jr74mN/xKT3toHaMv7fZ3RGlYgFsSsZcAApm6LrIlTNQXKjJ8hl+2BWdi4nRfYZ
 X89RRWEl3j8i6URu65iH7y7IlfFEhjJGmATUQFdrfECR9hBMJ8VHzBfcz7aYgoac
 Ggi+2Vjm7GQlr9mzW/phXb25PWqP5yVTW6/3BUtMs3oY7kd6vE2n9XzGIy13uBpX
 fzY/tnIMrgZMjphYPPbBAbwl+tBKZCu4k6lpP7cLsVsIwY0NIWS26JCnCdO0efqR
 SnAUPjoAV7nkpG3mMO9Qv7h7yar3HrG7ED15hfmB4VowRNQMfDoTLc8jVWDvGk4/
 aFBSH8dEjszZ5tMO9HL+RXnvpkRcDyQpfVQJttY5adZFQlUOd+0=
 =RmxY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Things are looking pretty quiet here in RDMA, not too many bug fixes
  rolling in right now. The usual driver bug fixes and fixes for a
  couple of regressions introduced in 5.2:

   - Fix a race on bootup with RDMA device renaming and srp. SRP also
     needs to rename its internal sys files

   - Fix a memory leak in hns

   - Don't leak resources in efa on certain error unwinds

   - Don't panic in certain error unwinds in ib_register_device

   - Various small user visible bug fix patches for the hfi and efa
     drivers

   - Fix the 32 bit compilation break"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/efa: Remove MAYEXEC flag check from mmap flow
  mlx5: avoid 64-bit division
  IB/hfi1: Validate page aligned for a given virtual address
  IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
  IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
  IB/rdmavt: Fix alloc_qpn() WARN_ON()
  RDMA/core: Fix panic when port_data isn't initialized
  RDMA/uverbs: Pass udata on uverbs error unwind
  RDMA/core: Clear out the udata before error unwind
  RDMA/hns: Fix PD memory leak for internal allocation
  RDMA/srp: Rename SRP sysfs name after IB device rename trigger
2019-06-07 09:25:27 -07:00
Thomas Gleixner
2025cf9e19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 263 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:36:37 +02:00
David S. Miller
6c018b738a mlx5-updates-2019-05-31
This series provides some updates to mlx5 core and netdevice driver.
 
 1) use __netdev_tx_sent_queue() to improve performance under GSO workload
 
 2) Allow matching only enc_key_id/enc_dst_port for decapsulation action
 
 3) Geneve support:
 This patchset adds support for GENEVE tunnel encap/decap flows offload:
 encapsulating layer 2 Ethernet frames within layer 4 UDP datagrams.
 The driver supports 6081 destination UDP port number, which is the
 default IANA-assigned port.
 
 Encap:
   ConnectX-5 inserts the header (w/ or w/o Geneve TLV options) that is
   provided by the mlx5 driver to the outgoing packet.
 
 Decap:
   Geneve header is matched and the packet is decapsulated.
   Notes about decap flows with Geneve TLV Options:
    - Support offloading of 32-bit options data only
    - At any given time, only one combination of class/type parameters
      can be offloaded, but the same class/type combination can have
      many different flows offloaded with different 32-bit option data
    - Options with value of 0 can't be offloaded
 
 Managing Geneve TLV options:
   Matching (on receive) is done by ConnectX-5 flex parser.
   Geneve TLV options are managed using General Object of type
   “Geneve TLV Options”.
 
   When the first flow with a certain class/type values is requested
   to be offloaded, the driver creates a FW object with FW command
   (Geneve TLV Options general object) and starts counting the number
   of flows using this object.
 
   During this time, any request with a different class/type values
   will fail to be offloaded.
   Once the refcount reaches 0, the driver destroys the TLV options
   general object, and can now offload a flow with any class/type parameters.
 
   Geneve TLV Options object is added to core device.
   It is currently used to manage Geneve TLV options general
   object allocation in FW and its reference counting only.
 
   In the future it will also be used for managing geneve ports
   by registering callbacks for ndo_udp_tunnel_add/del.
 
 TC tunnel code refactoring:
   As a preparation for Geneve code, the TC tunnel code in mlx5
   was rearranged in a modular way, so that it would be easier
   to add future tunnels:
    - Defined tc tunnel object with the fields and callbacks that
      any tunnel must implement.
    - Define tc UDP tunnel object for UDP tunnels, such as VXLAN
    - Move each tunnel code (GRE, VXLAN) to its own separate file
    - Rewrite tc tunnel implementation in a general way – using
      only the objects and their callbacks.
 
 4) Termination tables:
 Actions in tables set with the termination flag are guaranteed to terminate
 the action list. Thus, potential looping functionality (e.g. haripin) can safely be
 executed without potential loops.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAlzxiMsACgkQSD+KveBX
 +j708ggAwjhVpazLCbo4kXfutln1eeQ6uImb2ivBDEIXjri3uK+GN5fWtqZVhg5v
 oRaTwdWAMZJFmEdvFKPOvAaqJwy3l3M1mXIjHYfQXpP8WYXYvteoq5AuSxqfEFcE
 wK127DRe2zcH75Q5Q8ObL1lMBVvYeu6xBnr3EQUaPFDF9hi4np+r5bJvhHwJzt7z
 lxdsGdxdTmqz3hw+rkp/Uuvx2Nniy5Tkm4zuNeQdoCtlYtqEs3dVFUpZqIfYgjdx
 hCZC1GEqKfLpdRU3qCW6HRaO2Yeok6a9QYbb70KUEeCVbwMXDnjohlz+61XJEd+M
 gp92vmf11tjSBruv56O8KfokFBIxUw==
 =oum3
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-05-31

This series provides some updates to mlx5 core and netdevice driver.

1) use __netdev_tx_sent_queue() to improve performance under GSO workload

2) Allow matching only enc_key_id/enc_dst_port for decapsulation action

3) Geneve support:
This patchset adds support for GENEVE tunnel encap/decap flows offload:
encapsulating layer 2 Ethernet frames within layer 4 UDP datagrams.
The driver supports 6081 destination UDP port number, which is the
default IANA-assigned port.

Encap:
  ConnectX-5 inserts the header (w/ or w/o Geneve TLV options) that is
  provided by the mlx5 driver to the outgoing packet.

Decap:
  Geneve header is matched and the packet is decapsulated.
  Notes about decap flows with Geneve TLV Options:
   - Support offloading of 32-bit options data only
   - At any given time, only one combination of class/type parameters
     can be offloaded, but the same class/type combination can have
     many different flows offloaded with different 32-bit option data
   - Options with value of 0 can't be offloaded

Managing Geneve TLV options:
  Matching (on receive) is done by ConnectX-5 flex parser.
  Geneve TLV options are managed using General Object of type
  “Geneve TLV Options”.

  When the first flow with a certain class/type values is requested
  to be offloaded, the driver creates a FW object with FW command
  (Geneve TLV Options general object) and starts counting the number
  of flows using this object.

  During this time, any request with a different class/type values
  will fail to be offloaded.
  Once the refcount reaches 0, the driver destroys the TLV options
  general object, and can now offload a flow with any class/type parameters.

  Geneve TLV Options object is added to core device.
  It is currently used to manage Geneve TLV options general
  object allocation in FW and its reference counting only.

  In the future it will also be used for managing geneve ports
  by registering callbacks for ndo_udp_tunnel_add/del.

TC tunnel code refactoring:
  As a preparation for Geneve code, the TC tunnel code in mlx5
  was rearranged in a modular way, so that it would be easier
  to add future tunnels:
   - Defined tc tunnel object with the fields and callbacks that
     any tunnel must implement.
   - Define tc UDP tunnel object for UDP tunnels, such as VXLAN
   - Move each tunnel code (GRE, VXLAN) to its own separate file
   - Rewrite tc tunnel implementation in a general way – using
     only the objects and their callbacks.

4) Termination tables:
Actions in tables set with the termination flag are guaranteed to terminate
the action list. Thus, potential looping functionality (e.g. haripin) can safely be
executed without potential loops.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 13:42:56 -07:00
Sebastian Andrzej Siewior
3bd3706251 sched/core: Provide a pointer to the valid CPU mask
In commit:

  4b53a3412d ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")

the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to single CPU while the "normal" CPU mask remains untouched.

As an alternative implementation Ingo suggested to use:

	struct task_struct {
		const cpumask_t		*cpus_ptr;
		cpumask_t		cpus_mask;
        };
with
	t->cpus_ptr = &t->cpus_mask;

In -RT we then can switch the cpus_ptr to:

	t->cpus_ptr = &cpumask_of(task_cpu(p));

in a migration disabled region. The rules are simple:

 - Code that 'uses' ->cpus_allowed would use the pointer.
 - Code that 'modifies' ->cpus_allowed would use the direct mask.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-03 11:49:37 +02:00
Florian Westphal
2638eb8b50 net: ipv4: provide __rcu annotation for ifa_list
ifa_list is protected by rcu, yet code doesn't reflect this.

Add the __rcu annotations and fix up all places that are now reported by
sparse.

I've done this in the same commit to not add intermediate patches that
result in new warnings.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 18:08:36 -07:00
Florian Westphal
cb8f1478ce drivers: use in_dev_for_each_ifa_rtnl/rcu
Like previous patches, use the new iterator macros to avoid sparse
warnings once proper __rcu annotations are added.

Compile tested only.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 18:06:26 -07:00
Saeed Mahameed
7fe4d43ecc Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This series provides some low level updates for mlx5 driver needed for
both rdma and netdev trees.

1) Termination flow steering table bits and hardware definitions.

2) Introduce the core dump HW access registers definitions.

3) Refactor and cleans-up VF representors functions handlers.

4) Renames host_params bits to function_changed bits and add the
support for eswitch functions change event in the eswitch general case.
(for both legacy and switchdev modes).

5) Potential error pointer dereference in error handling

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-31 13:04:06 -07:00