Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).
That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
Make intel_pstate set transition_delay_us to 500.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The schedutil governor reduces frequencies too fast in some
situations which cases undesirable performance drops to
appear.
To address that issue, make schedutil reduce the frequency slower by
setting it to the average of the value chosen during the previous
iteration of governor computations and the new one coming from its
frequency selection formula.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963
Reported-by: John <john.ettedgui@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Use same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable
Gemini Lake.
Signed-off-by: Box, David E <david.e.box@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some computations in intel_pstate_get_min_max() are not necessary
and one of its two callers doesn't even use the full result.
First off, the fixed-point value of cpu->max_perf represents a
non-negative number between 0 and 1 inclusive and cpu->min_perf
cannot be greater than cpu->max_perf. It is not necessary to check
those conditions every time the numbers in question are used.
Moreover, since intel_pstate_max_within_limits() only needs the
upper boundary, it doesn't make sense to compute the lower one in
there and returning min and max from intel_pstate_get_min_max()
via pointers doesn't look particularly nice.
For the above reasons, drop intel_pstate_get_min_max(), add a helper
to get the base P-state for min/max computations and carry out them
directly in the previous callers of intel_pstate_get_min_max().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_pstate_hwp_set() is the only function walking policy->cpus
in intel_pstate. The rest of the code simply assumes one CPU per
policy, including the initialization code.
Therefore it doesn't make sense for intel_pstate_hwp_set() to
walk policy->cpus as it is guaranteed to have only one bit set
for policy->cpu.
For this reason, rearrange intel_pstate_hwp_set() to take the CPU
number as the argument and drop the loop over policy->cpus from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new function pid_in_use() to return the information on whether
or not the PID-based P-state selection algorithm is in use.
That allows a couple of complicated conditions in the code to be
reduced to simple checks against the new function's return value.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpu_defaults structure is redundant, because it only contains
one member of type struct pstate_funcs which can be used directly
instead of struct cpu_defaults.
For this reason, drop struct cpu_defaults, use struct pstate_funcs
directly instead of it where applicable and rename all of the
variables of that type accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the definitions of the cpu_defaults structures after the
definitions of utilization update callback routines to avoid
extra declarations of the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid using extra function pointers during P-state selection by
dropping the get_target_pstate member from struct pstate_funcs,
adding a new update_util callback to it (to be registered with
the CPU scheduler as the utilization update callback in the active
mode) and reworking the utilization update callback routines to
invoke specific P-state selection functions directly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that some overhead in the utilization update callbacks
registered by intel_pstate in the active mode can be avoided if
those callbacks are tailored to specific configurations of the
driver. For example, the utilization update callback for the HWP
enabled case only needs to update the average CPU performance
periodically whereas the utilization update callback for the
PID-based algorithm does not need to take IO-wait boosting into
account and so on.
With that in mind, define three utilization update callbacks for
three different use cases: HWP enabled, the CPU load "powersave"
P-state selection algorithm and the PID-based "powersave" P-state
selection algorithm and modify the driver initialization to
choose the callback matching its current configuration.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
One of the checks in intel_pstate_update_status() implicitly relies
on the information that there are only two struct cpufreq_driver
objects available, but it is better to do it directly against the
value it really is about (to make the code easier to follow if
nothing else).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver_registered variable in intel_pstate is used for checking
whether or not the driver has been registered, but intel_pstate_driver
can be used for that too (with the rule that the driver is not
registered as long as it is NULL).
That is a bit more straightforward and the code may be simplified
a bit this way, so modify the driver accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PID controller parameters only need to be initialized if the
get_target_pstate_use_performance() P-state selection routine
is going to be used. It is not necessary to initialize them
otherwise, so don't do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the HWP enabled case pid_params.sample_rate_ns only needs to be
updated once, because it is global, so do that when setting hwp_active
instead of doing it during the initialization of every CPU.
Moreover, pid_params.sample_rate_ms is never used if HWP is enabled,
so do not update it at all then.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
intel_pstate_busy_pid_reset() is the only caller of pid_reset(),
pid_p_gain_set(), pid_i_gain_set(), and pid_d_gain_set(). Moreover,
it passes constants as two parameters of pid_reset() and all of
the other routines above essentially contain the same code, so
fold all of them into the caller and drop unnecessary computations.
Introduce percent_fp() for converting integer values in percent
to fixed-point fractions and use it in the above code cleanup.
Finally, rename intel_pstate_busy_pid_reset() to
intel_pstate_pid_reset() as it also is used for the
initialization of PID parameters for every CPU and the
meaning of the "busy" part of the name is not particularly
clear.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is only one caller of intel_pstate_reset_all_pid(), which is
pid_param_set() used in the debugfs interface only, and having that
code split does not make it particularly convenient to follow.
For this reason, move the body of intel_pstate_reset_all_pid() into
its caller and drop that function.
Also change the loop from for_each_online_cpu() (which is obviously
racy with respect to CPU offline/online) to for_each_possible_cpu(),
so that all PID parameters are reset for all CPUs regardless of their
online/offline status (to prevent, for example, a previously offline
CPU from going online with a stale set of PID parameters).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that both the existing struct cpu_defaults instances in which
PID parameters are actually initialized use the same values of those
parameters, so it is not really necessary to copy them over to
pid_params dynamically.
Instead, initialize pid_params statically with those values and
drop the unused pid_policy member from struct cpu_defaults along
with copy_pid_params() used for initializing it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The P-state selection algorithm used by intel_pstate for Atom
processors is not based on the PID controller and the initialization
of PID parametrs for those processors is pointless and confusing, so
drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After recent changes the purpose of struct perf_limits is not
particularly clear any more and the code may be made somewhat
easier to follow by eliminating it, so go for that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
set policy->cpuinfo.max_freq depending on the turbo status, but the
updates made by them are discarded by the core, because the policy
object passed to them by the core is temporary and cpuinfo.max_freq
from that object is not copied to the final policy object in
cpufreq_set_policy().
However, cpufreq_set_policy() passes the temporary policy object
to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
actually sees the policy->cpuinfo.max_freq value updated by
intel_pstate_verify_policy() and not the final one. It also
updates policy->max sometimes which basically has no effect after
it returns, because the core discards that update.
To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
entirely and check the maximum frequency explicitly in
intel_pstate_update_perf_limits() instead of relying on the
transiently updated policy->cpuinfo.max_freq value.
Moreover, move the max->policy adjustment carried out in
intel_pstate_set_policy() to a separate function and call that
function from the ->verify driver callbacks to ensure that it will
actually be effective.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The coordination of P-state limits used by intel_pstate in the active
mode (ie. by default) is problematic, because it synchronizes all of
the limits (ie. the global ones and the per-policy ones) so as to use
one common pair of P-state limits (min and max) across all CPUs in
the system. The drawbacks of that are as follows:
- If P-states are coordinated in hardware, it is not necessary
to coordinate them in software on top of that, so in that case
all of the above activity is in vain.
- If P-states are not coordinated in hardware, then the processor
is actually capable of setting different P-states for different
CPUs and coordinating them at the software level simply doesn't
allow that capability to be utilized.
- The coordination works in such a way that setting a per-policy
limit (eg. scaling_max_freq) for one CPU causes the common
effective limit to change (and it will affect all of the other
CPUs too), but subsequent reads from the corresponding sysfs
attributes for the other CPUs will return stale values (which
is confusing).
- Reads from the global P-state limit attributes, min_perf_pct and
max_perf_pct, return the effective common values and not the last
values set through these attributes. However, the last values
set through these attributes become hard limits that cannot be
exceeded by writes to scaling_min_freq and scaling_max_freq,
respectively, and they are not exposed, so essentially users
have to remember what they are.
All of that is painful enough to warrant a change of the management
of P-state limits in the active mode.
To that end, redesign the active mode P-state limits management in
intel_pstate in accordance with the following rules:
(1) All CPUs are affected by the global limits (that is, none of
them can be requested to run faster than the global max and
none of them can be requested to run slower than the global
min).
(2) Each individual CPU is affected by its own per-policy limits
(that is, it cannot be requested to run faster than its own
per-policy max and it cannot be requested to run slower than
its own per-policy min).
(3) The global and per-policy limits can be set independently.
Also, the global maximum and minimum P-state limits will be always
expressed as percentages of the maximum supported turbo P-state.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Extend the set of systems for which intel_pstate will use the
"powersave" P-state selection algorithm based on CPU load in the
active mode by systems with ACPI preferred profile set to "tablet",
"appliance PC", "desktop", or "workstation" (ie. everything with a
specified preferred profile that is not a "server").
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, some processors supporting HWP are only supported by
intel_pstate if HWP is actually going to be used and not supported
otherwise which is confusing.
Specifically, they are not supported if "intel_pstate=no_hwp" is
passed to the kernel in the command line or if the driver is started
in the passive mode ("intel_pstate=passive").
There is no real reason for that, because everything about those
processor is known anyway and the driver can work with them in all
modes, so make that happen, but use the load-based P-state selection
algorithm for the active mode "powersave" policy with them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
sugov_update_commit() calls trace_cpu_frequency() to record the
current CPU frequency if it has not changed in the fast switch case
to prevent utilities from getting confused (they may report that the
CPU is idle if the frequency has not been recorded for too long, for
example).
However, that may cause the tracepoint to be triggered quite often
for no real reason (if the frequency doesn't change, we will not
modify the last update time stamp and governor computations may
run again shortly when that happens), so don't do that (arguably, it
is done to work around a utilities bug anyway).
That allows code duplication in sugov_update_commit() to be reduced
somewhat too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.
That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register. Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again. That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.
That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task. If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away. That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.
This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway. On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.
On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.
To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
The policy->cpuinfo.max_freq and policy->max updates in
intel_cpufreq_turbo_update() are excessive as they are done for no
good reason and may lead to problems in principle, so they should be
dropped. However, after dropping them intel_cpufreq_turbo_update()
becomes almost entirely pointless, because the check made by it is
made again down the road in intel_pstate_prepare_request(). The
only thing in it that still needs to be done is the call to
update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
and make its callers invoke update_turbo_state() directly instead of
it.
In addition to that, fix intel_cpufreq_verify_policy() so that it
checks global.no_turbo in addition to global.turbo_disabled when
updating policy->cpuinfo.max_freq to make it consistent with
intel_pstate_verify_policy().
Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
sugov_start() only initializes struct sugov_cpu per-CPU structures
for shared policies, but it should do that for single-CPU policies too.
That in particular makes the IO-wait boost mechanism work in the
cases when cpufreq policies correspond to individual CPUs.
Fixes: 21ca6d2c52 (cpufreq: schedutil: Add iowait boosting)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
This BUG_ON() triggered for me once at shutdown, and I don't see a
reason for the check. The code correctly checks whether the swap slot
cache is usable or not, so an uninitialized swap slot cache is not
actually problematic afaik.
I've temporarily just switched the BUG_ON() to a WARN_ON_ONCE(), since
I'm not sure why that seemingly pointless check was there. I suspect
the real fix is to just remove it entirely, but for now we'll warn about
it but not bring the machine down.
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Wire up statx() syscall
- Don't print a warning on memory hotplug when HPT resizing isn't available
Thanks to:
David Gibson, Chandan Rajendra.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYzjnLAAoJEFHr6jzI4aWAwcAQAIeOYvCaoxQg6IEFoWJVc+hv
YfMy86TotkWFojQDV6OUddlz7S14AypjniegVq9aWOPIVwGqmDfp+W8qRfhUh3ab
S+uorqmXzAFhIUpzVpVpX/nnK6C3eFQdaKCLOxM0Ev413AWUMu70cPmXlCR+x0Kx
GxeV+nfIfBdyIL4yhK/aDHSItfUwNTmTPSyaUj17/cwLu7DwMjcwKWjrCYCzgBZW
1hQeWo+yrfvJ1U4yMEGdnDfuTPnWQZsHMw3qSuPzPCnVMgV6YS3HC7/ZEUvBSSBJ
Bwr6vVEsniLkHDgyFu3665y4abfvqw2iojXKuTpUQMp40T3RCZlT1FQoMvIoJQGC
FSMUsqnEudjliURi92zq4ImSbPbezB2bG3EsK1jKOOQ9gGbQa2qjXc2CSJiCtZwE
zsUCcwRFo8Pl1D/KYMTl52nQG3oOMQV/2ceX73HxajsdShXcyJxFEpMPv6aEKi1E
YT5i2KUXq3sfFPNWe/4gtfOYX9j+tSqi+CRvnwDNZG+a3XLUk/VRptCeIyntH965
8GYs28CxLz1qljBeAA7czo+1gpNhl1h6FNl1nhqyWRM6jdcLucMeEml0RYCYPmnk
+jLO6CkYYpIraM0mRqAolM7fYAtm6dOphaeeezCJIMvD/NTdk+pH86JoA64GQZvE
v1HXCc4lYyUFlvASrCNl
=TfiW
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc fixes from Michael Ellerman:
"A couple of minor powerpc fixes for 4.11:
- wire up statx() syscall
- don't print a warning on memory hotplug when HPT resizing isn't
available
Thanks to: David Gibson, Chandan Rajendra"
* tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Don't give a warning when HPT resizing isn't available
powerpc: Wire up statx() syscall
Pull parisc fixes from Helge Deller:
- Mikulas Patocka added support for R_PARISC_SECREL32 relocations in
modules with CONFIG_MODVERSIONS.
- Dave Anglin optimized the cache flushing for vmap ranges.
- Arvind Yadav provided a fix for a potential NULL pointer dereference
in the parisc perf code (and some code cleanups).
- I wired up the new statx system call, fixed some compiler warnings
with the access_ok() macro and fixed shutdown code to really halt a
system at shutdown instead of crashing & rebooting.
* 'parisc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix system shutdown halt
parisc: perf: Fix potential NULL pointer dereference
parisc: Avoid compiler warnings with access_ok()
parisc: Wire up statx system call
parisc: Optimize flush_kernel_vmap_range and invalidate_kernel_vmap_range
parisc: support R_PARISC_SECREL32 relocation in modules
Pull SCSI target fixes from Nicholas Bellinger:
"The bulk of the changes are in qla2xxx target driver code to address
various issues found during Cavium/QLogic's internal testing (stable
CC's included), along with a few other stability and smaller
miscellaneous improvements.
There are also a couple of different patch sets from Mike Christie,
which have been a result of his work to use target-core ALUA logic
together with tcm-user backend driver.
Finally, a patch to address some long standing issues with
pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices,
which will make folks using physical (or virtual) magnetic tape happy"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits)
qla2xxx: Update driver version to 9.00.00.00-k
qla2xxx: Fix delayed response to command for loop mode/direct connect.
qla2xxx: Change scsi host lookup method.
qla2xxx: Add DebugFS node to display Port Database
qla2xxx: Use IOCB interface to submit non-critical MBX.
qla2xxx: Add async new target notification
qla2xxx: Export DIF stats via debugfs
qla2xxx: Improve T10-DIF/PI handling in driver.
qla2xxx: Allow relogin to proceed if remote login did not finish
qla2xxx: Fix sess_lock & hardware_lock lock order problem.
qla2xxx: Fix inadequate lock protection for ABTS.
qla2xxx: Fix request queue corruption.
qla2xxx: Fix memory leak for abts processing
qla2xxx: Allow vref count to timeout on vport delete.
tcmu: Convert cmd_time_out into backend device attribute
tcmu: make cmd timeout configurable
tcmu: add helper to check if dev was configured
target: fix race during implicit transition work flushes
target: allow userspace to set state to transitioning
target: fix ALUA transition timeout handling
...
Pull device-dax fixes from Dan Williams:
"The device-dax driver was not being careful to handle falling back to
smaller fault-granularity sizes.
The driver already fails fault attempts that are smaller than the
device's alignment, but it also needs to handle the cases where a
larger page mapping could be established. For simplicity of the
immediate fix the implementation just signals VM_FAULT_FALLBACK until
fault-size == device-alignment.
One fix is for -stable to address pmd-to-pte fallback from the
original implementation, another fix is for the new (introduced in
4.11-rc1) pud-to-pmd regression, and a typo fix comes along for the
ride.
These have received a build success notification from the kbuild
robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: fix debug output typo
device-dax: fix pud fault fallback handling
device-dax: fix pmd/pte fault fallback handling
Current driver wait for FW to be in the ready state before
processing in-coming commands. For Arbitrated Loop or
Point-to- Point (not switch), FW Ready state can take a while.
FW will transition to ready state after all Nports have been
logged in. In the mean time, certain initiators have completed
the login and starts IO. Driver needs to start processing all
queues if FW is already started.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
For target mode, when new scsi command arrive, driver first performs
a look up of the SCSI Host. The current look up method is based on
the ALPA portion of the NPort ID. For Cisco switch, the ALPA can
not be used as the index. Instead, the new search method is based
on the full value of the Nport_ID via btree lib.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The Mailbox interface is currently over subscribed. We like
to reserve the Mailbox interface for the chip managment and
link initialization. Any non essential Mailbox command will
be routed through the IOCB interface. The IOCB interface is
able to absorb more commands.
Following commands are being routed through IOCB interface
- Get ID List (007Ch)
- Get Port DB (0064h)
- Get Link Priv Stats (006Dh)
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
If the remote port have started the login process, then the
PLOGI and PRLI should be back to back. Driver will allow
the remote port to complete the process. For the case where
the remote port decide to back off from sending PRLI, this
local port sets an expiration timer for the PRLI. Once the
expiration time passes, the relogin retry logic is allowed
to go through and perform login with the remote port.
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The main lock that needs to be held for CMD or TMR submission
to upper layer is the sess_lock. The sess_lock is used to
serialize cmd submission and session deletion. The addition
of hardware_lock being held is not necessary. This patch removes
hardware_lock dependency from CMD/TMR submission.
Use hardware_lock only for error response in this case.
Path1
CPU0 CPU1
---- ----
lock(&(&ha->tgt.sess_lock)->rlock);
lock(&(&ha->hardware_lock)->rlock);
lock(&(&ha->tgt.sess_lock)->rlock);
lock(&(&ha->hardware_lock)->rlock);
Path2/deadlock
*** DEADLOCK ***
Call Trace:
dump_stack+0x85/0xc2
print_circular_bug+0x1e3/0x250
__lock_acquire+0x1425/0x1620
lock_acquire+0xbf/0x210
_raw_spin_lock_irqsave+0x53/0x70
qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
process_one_work+0x1f4/0x6e0
Cc: <stable@vger.kernel.org>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reported-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Normally, ABTS is sent to Target Core as Task MGMT command.
In the case of error, qla2xxx needs to send response, hardware_lock
is required to prevent request queue corruption.
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
When FW notify driver or driver detects low FW resource,
driver tries to send out Busy SCSI Status to tell Initiator
side to back off. During the send process, the lock was not held.
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>