Commit Graph

8 Commits

Author SHA1 Message Date
Suganath Prabu
320e77acb3 scsi: mpt3sas: Irq poll to avoid CPU hard lockups
Issue Description:
We have seen cpu lock up issue from fields if system has greater (more than
96) logical cpu count.  SAS3.0 controller (Invader series) supports at max
96 msix vector and SAS3.5 product (Ventura) supports at max 128 msix
vectors.

This may be a generic issue (if PCI device supports completion on multiple
reply queues).  Let me explain it w.r.t to mpt3sas supported h/w just to
simplify the problem and possible changes to handle such issues. IT HBA
(mpt3sas) supports multiple reply queues in completion path. Driver creates
MSI-x vectors for controller as "min of (FW supported Reply queue, Logical
CPUs)". If submitter is not interrupted via completion on same CPU, there
is a loop in the IO path. This behavior can cause hard/soft CPU lockups, IO
timeout, system sluggish etc.

Example - one CPU (e.g. CPU A) is busy submitting the IOs and another CPU
(e.g. CPU B) is busy with processing the corresponding IO's reply
descriptors from reply descriptor queue upon receiving the interrupts from
HBA.  If the CPU A is continuously pumping the IOs then always CPU B (which
is executing the ISR) will see the valid reply descriptors in the reply
descriptor queue and it will be continuously processing those reply
descriptor in a loop without quitting the ISR handler.

Mpt3sas driver will exit ISR handler if it finds unused reply descriptor in
the reply descriptor queue. Since CPU A will be continuously sending the
IOs, CPU B may always see a valid reply descriptor (posted by HBA Firmware
after processing the IO) in the reply descriptor queue. In worst case,
driver will not quit from this loop in the ISR handler. Eventually, CPU
lockup will be detected by watchdog.

Above mentioned behavior is not common if "rq_affinity" set to 2 or
affinity_hint is honored by irqbalance as "exact". If rq_affinity is set
to 2, submitter will be always interrupted via completion on same CPU.  If
irqbalance is using "exact" policy, interrupt will be delivered to
submitter CPU.

If CPU counts to MSI-X vectors (reply descriptor Queues) count ratio is not
1:1, we still have exposure of issue explained above and for that we don't
have any solution.

Exposure of soft/hard lockup if CPU count is more than MSI-x supported by
device.

If CPUs count to MSI-x vectors count ratio is not 1:1, (Other way, if CPU
counts to MSI-x vector count ratio is something like X:1, where X > 1) then
'exact' irqbalance policy OR rq_affinity = 2 won't help to avoid CPU
hard/soft lockups. There won't be any one to one mapping between CPU to
MSI-x vector instead one MSI-x interrupt (or reply descriptor queue) is
shared with group/set of CPUs and there is a possibility of having a loop
in the IO path within that CPU group and may observe lockups.

For example: Consider a system having two NUMA nodes and each node having
four logical CPUs and also consider that number of MSI-x vectors enabled on
the HBA is two, then CPUs count to MSI-x vector count ratio as 4:1.  e.g.
MSIx vector 0 is affinity to CPU 0, CPU 1, CPU 2 & CPU 3 of NUMA node 0 and
MSI-x vector 1 is affinity to CPU 4, CPU 5, CPU 6 & CPU 7 of NUMA node 1.

numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3                 --> MSI-x 0
node 0 size: 65536 MB
node 0 free: 63176 MB
node 1 cpus: 4 5 6 7                 -->MSI-x 1
node 1 size: 65536 MB
node 1 free: 63176 MB

Assume that user started an application which uses all the CPUs of NUMA
node 0 for issuing the IOs.  Only one CPU from affinity list (it can be any
cpu since this behavior depends upon irqbalance) CPU0 will receive the
interrupts from MSIx vector 0 for all the IOs. Eventually, CPU 0 IO
submission percentage will be decreasing and ISR processing percentage will
be increasing as it is more busy with processing the interrupts.  Gradually
IO submission percentage on CPU 0 will be zero and it's ISR processing
percentage will be 100 percentage as IO loop has already formed within the
NUMA node 0, i.e. CPU 1, CPU 2 & CPU 3 will be continuously busy with
submitting the heavy IOs and only CPU 0 is busy in the ISR path as it
always find the valid reply descriptor in the reply descriptor
queue. Eventually, we will observe the hard lockup here.

Chances of occurring of hard/soft lockups are directly proportional to
value of X. If value of X is high, then chances of observing CPU lockups is
high.

Solution: Use IRQ poll interface defined in " irq_poll.c".  mpt3sas driver
will execute ISR routine in Softirq context and it will always quit the
loop based on budget provided in IRQ poll interface.

In these scenarios (i.e. where CPUs count to MSI-X vectors count ratio is
X:1 (where X > 1)), IRQ poll interface will avoid CPU hard lockups due to
voluntary exit from the reply queue processing based on budget.  Note -
Only one MSI-x vector is busy doing processing.

Irqstat output:

IRQs / 1 second(s)
IRQ#  TOTAL  NODE0   NODE1   NODE2   NODE3  NAME
  44    122871   122871   0       0       0  IR-PCI-MSI-edge mpt3sas0-msix0
  45        0              0           0       0       0  IR-PCI-MSI-edge mpt3sas0-msix1

We use this approach only if cpu count is more than FW supported MSI-x
vector

Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-18 17:16:43 -04:00
James Bottomley
3ddda3e4c8 mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility
The non-PCI builds of the O day test project are failing:

On Thu, 2015-12-03 at 05:02 +0800, kbuild test robot wrote:
> warning: (SCSI_MPT2SAS) selects SCSI_MPT3SAS which has unmet direct
> dependencies (SCSI_LOWLEVEL && PCI && SCSI)

The problem is that select and depend don't interact because Kconfig doesn't
have a SAT solver, so depend picks up dependencies and select does onward
selects, but select doesn't pick up dependencies.  To fix this, we need to add
the correct dependencies to the MPT2SAS option like this.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: b840c3627b
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2015-12-03 09:31:23 -08:00
Martin K. Petersen
b840c3627b mpt3sas: Add dummy Kconfig option for backwards compatibility
The mpt2sas driver was recently folded into mpt3sas to reduce code
duplication.

To avoid problems for people that only have CONFIG_SCSI_MPT2SAS in their
.config we introduce a dummy option that will select MPT3SAS if MPT2SAS
was previously enabled.

This is a temporary measure and we will deprecate this config option in
4.6.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Bottomley <James.Bottomley@hansenpartnership.com>
CC: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-30 20:36:08 -05:00
Sreekanth Reddy
c84b06a48c mpt3sas: Single driver module which supports both SAS 2.0 & SAS 3.0 HBAs
Modified the mpt3sas driver to have a single driver module which
supports both SAS 2.0 & SAS 3.0 HBA devices.

* Added SAS 2.0 HBA device IDs to the mpt3sas_pci_table pci table.

* Created two separate SCSI host templates for SAS2 and SAS3 HBAs so
  that, during the driver load time driver can use corresponding host
  template(based the pci device ID) while registering a scsi host
  adapter instance for that pci device.

* Registered two IOCTL devices, mpt2ctl is for SAS2 HBAs & mpt3ctl for
  SAS3 HBAs. Also updated the code to make sure that mpt2ctl device
  processes only those ioctl cmds issued for the SAS2 HBAs and mpt3ctl
  device processes only those ioctl cmds issued for the SAS3 HBAs.

* Added separate indexing for SAS2 and SAS3 HBAs.

* Replaced compile time check 'MPT2SAS_SCSI' to run time check
  'hba_mpi_version_belonged' whereever needed.

* Aliased this merged driver to mpt2sas using MODULE_ALIAS.

* Moved global varaible 'driver_name' to per adapter instance variable.

* Created two raid function template and used corresponding raid
  function templates based on the run time check
  'hba_mpi_version_belonged'.

* Moved mpt2sas_warpdrive.c file from mpt2sas to mpt3sas folder and
  renamed it as mpt3sas_warpdrive.c.

* Also renamed the functions in mpt3sas_warpdrive.c file to follow
  current driver function name convention.

* Updated the Makefile to build mpt3sas_warpdrive.o file for these
  WarpDrive-specific functions.

* Also in function mpt3sas_setup_direct_io(), used sector_div() API
  instead of division operator (which gives compilation errors on 32 bit
  machines).

* Removed mpt2sas files, mpt2sas directory & mpt3sas_module.c file.

* Added module parameter 'hbas_to_enumerate' which permits using this
  merged driver as a legacy mpt2sas driver or as a legacy mpt3sas
  driver.

  Here are the available options for this module parameter:

   0 - Merged driver which enumerates both SAS 2.0 & SAS 3.0 HBAs
   1 - Acts as legacy mpt2sas driver, which enumerates only SAS 2.0 HBAs
   2 - Acts as legacy mpt3sas driver, which enumerates only SAS 3.0 HBAs

* Removed mpt2sas entries from SCSI's Kconfig and Makefile files.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-11 19:50:11 -05:00
Sreekanth Reddy
af0094115b mpt2sas, mpt3sas: Remove SCSI_MPTXSAS_LOGGING entry from Kconfig
Currently there is a logging level option provided for each of our
drivers in the kernel configuration utility. Users can enable this
option to get more verbose information. By default it is enabled.

Only when this option is enabled will the functions which display the
required information get compiled in.

As we are merging the both drivers we can no longer provide this
configuration option. Remove the SCSI_MPTXSAS_LOGGING entry from Kconfig
and unconditionally enable logging (by removing the #ifdef
CONFIG_SCSI_MPT3SAS_LOGGING preprocessor check conditions) so that all
functions which are defined to display more verbose information get
compiled in.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-11 18:31:14 -05:00
Sreekanth Reddy
a4ffce0d63 mpt3sas: Copyright in driver sources is updated for year the 2014.
Copyright in driver sources is updated for year the 2014.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-16 09:14:19 -07:00
Sreekanth Reddy
48e3b9855d [SCSI] mpt3sas: 2013 source code copyright
The Copyright String in all mpt3sas files are changed to 2012-2013.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-07-09 08:34:03 +01:00
Sreekanth Reddy
f92363d123 [SCSI] mpt3sas: add new driver supporting 12GB SAS
These driver files are initially, substantially similar to mpt2sas but,
because mpt2sas is going into maintenance mode and mp3sas will become heavily
developed, we elected to keep the code bases separate.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
Reviewed-by: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-12-01 10:09:17 +00:00