The port_scan work was scheduled to the work_queue provided by the
kernel. This resulted on SMP systems to a likely situation that more
than one scan_work were processed in parallel. This is not required
and openes the possibility of race conditions between the removal of
invalid ports and the enqueue of just scanned ports. This patch
synchronizes the scan_work tasks by scheduling them to adapter local
work_queue.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The flag ZFCP_STATUS_COMMON_REMOVE was used to indicate that a
resource is not ready to be used or about to be removed from the
system. This is now better done by an improved list handling
and therefore the additional indicator is not required anymore.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Replace the local reference counting by already available mechanisms
offered by kref. Where possible existing device structures were used,
including the same functionality.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The global config_lock was used to protect the configuration organized
in independent lists. It is not necessary to have a lock on driver
level for this purpose. This patch replaces the global config_lock
with a set of local list locks.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
For ports, zfcp gets the DID from the FC nameserver and tries to open
the port. If the open succeeds, zfcp compares the WWPN from the
nameserver with the WWPN in the PLOGI payload. In case of a mismatch,
zfcp assumes that the DID of the port just changed and we opened the
wrong port. This means that zfcp has to forget the DID, lookup the DID
again and retry.
This error case had a problem that zfcp forgets the DID, but never
looks up a new one, stalling the ERP in this case. Fix this by
triggering the DID lookup and properly exit from the ERP. The DID
lookup will trigger a new ERP action.
Also ensure when trying to open the port again with the new DID, first
close the open port, even in the NOESC case.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
So far, zfcp allocated all resources required for FCP
adapters/subchannels when the device was discovered in the ccw_probe
callback. If there are lots of unused FCP subchannels attached to a
system, this is a waste of resources. To alleviate this, defer the
resource allocation to the first call to ccw_set_online. To avoid
disruptions during possible following calls to ccw_set_offline and
then ccw_set_online, keep the adapter resources until the device is
finally being removed via ccw_remove. While doing this, also manage
the zfcp erp thread together with all other adapter resources in
zfcp_adapter_enqueue/dequeue.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Switch the creation of the zfcp erp thread from the deprecated
kernel_thread API to the kthread API. This allows also the removal of
some flags in zfcp since the kthread API handles thread creation and
shutdown internally. To allow the usage of the kthread_stop function,
replace the erp ready semaphore with a waitqueue for waiting until erp
actions arrive on the ready queue.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Change the dbf data and functions to use the zfcp_dbf prefix
throughout the code. Also change the calls to dbf to use zfcp_dbf
instead of zfcp_adapter.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Don't let the erp wait for gid_pn requests to complete. Instead, queue
the gid_pn work, exit erp and let the finished gid_pn work trigger a
new port reopen.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The zfcp_adapter structure was growing over time to a size of almost
one memory page. To reduce the size of the data structure and to
seperate different layers, put all qdio related data in the new
zfcp_qdio data structure.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Remove the global driver work queue and replace it with a workqueue
local to the adapter. The usage of this workqueue makes this the
correct place for the structure. In addition multiple adapters won't
block each other due to the serialization of the queued work.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The combination wait_queue/wakeup in conjunction with the flag
ZFCP_STATUS_FSFREQ_COMPLETED to signal the completion of an fsfreq
was not race-safe and can be better solved by a completion.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
An adapter shutdown implicitly closes all open ports. Make sure to
mark all WKA ports as offline, not only the directory server. Also
make sure that no pending wka port work is running when the adapter is
being removed.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
In a LOWMEM condition an ERP notification would have been sent twice
causing an unpredictable behaviour of the ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If an action fails, retry it until the erp count exceeds the
threshold. If there is something fundamentally wrong, the FSF layer
will trigger a more appropriate action depending on the FSF status
codes.
The followup for successful actions is a different followup than
retrying failed actions, so split the code two functions to make this
clear.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
After closing the port, we want it to be "not open" to consider the
action to be successful.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
zfcp_erp_notify uses the ZFCP_ERP_STATUS_* flags, so it is
ZFCP_STATUS_ERP_LOWMEM instead of ZFCP_ERP_NOMEM. Signalling
ZFCP_ERP_FAILED is not necessary, the missing d_id will show that the
nameserver did not return the d_id.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Provide the ability to do fibre channel requests from the userspace to
our zfcp driver. Patch builds upon extension to the fibre channel
tranport class by James Smart and Seokmann Ju. See here
http://marc.info/?l=linux-scsi&m=123808882309133&w=2
Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If the destination ID (D_ID) of a remote storage port changed, e.g.
re-plugged cable on the switch in a different switch port, the port
was never (re-)attached within Linux. This patch fixes the broken
mapping between the WWPN and the D_ID.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add comments where there is a deliberate fall through in switch/case
statements. This makes some code checkers happy and makes it clear
that there is no missing break statement.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ERP thread is performing a task before it is executing the
corresponding down on the semaphore. The response handler of the
just started exchange config should wait for the completion by
performing a down on this semaphore. Since this semaphore is still
positive from the ERP enqueue the handler won't wait and therefore
the exchange config will always fail leaving the adapter in error.
The problem can be solved by performing the down on the semaphore
before starting an ERP task. This is the logically correct order.
Only walk the ERP loop if there is a task to perform.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The nameserver port might be in state online when the adapter is
offlined. On adapter reactivation the nameserver port is not
re-opened due to the PORT_ONLINE status. This results in an
unsuccessful recovery. In forcing the nameserver port status
to offline on all adapter offline events this issue is prevented.
Waiting for the reference count to drop to zero in
zfcp_wka_port_offline is not required, so remove it.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When running the scsi_scan from the zfcp workqueue and the target
device does not respond, the zfcp workqueue can block until the
scsi_scan hits a timeout. Move the work to the scsi host workqueue,
since this one is also used for the scan from the SCSI midlayer.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Ensure the refcounting is correct even if we were not able to
schedule a work. In addition we have to make sure no scheduled
work is pending while we're dequeing the adapter from the
systems environment.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Use the I/O blocking mechanism in the FC transport class to allow
faster failovers for multipathing:
- Call fc_remote_port_delete early to set the rport to BLOCKED.
- Check the rport status in queuecommand with fc_remote_portchkready
to no longer accept new I/O for this port and fail the I/O with the
appropriate scsi_cmnd result.
- Implement the terminate_rport_io handler to abort all pending I/O
requests
- Return SCSI commands with DID_TRANSPORT_DISRUPTED while erp is
running.
- When updating the remote port status, check for late changes and
update the remote ports status accordingly.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The current number based id ERP logging is replaced by a string
based tag version. The benefit is an easier location of the code in
question and the removal of the lengthy array referencing the
individual messages.
The string (7 bytes) based version does not use more space since those
bytes were "used" anyway due to the alignment of the structure.
The encoding of the 7 byte string is as follows
[0-1] = filename
[2-5] = task/function
[6] = section
Due to the character of this string (fixed length) a string
termination is not required here.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
An adapter close was always performed whether it was required,
(e.g. in an error scenario) or not (e.g. initial open).
This patch is changing the process in only doing an
adapter close when it is required.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Use the device pointer in zfcp_unit for tracking if we have a
registered SCSI device. With this approach, the flag
ZFCP_STATUS_UNIT_REGISTERED is only redundant and can be removed.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
PORT_PHYS_CLOSING is only set and cleared, but not actually used
for status checking.
PORT_INVALID_WWPN is set when the GID_PN request does not return
a d_id for a remote port, e.g. when a remote port has been
unplugged. For this case, the d_id is zero. In the erp we can
check the d_id and use the normal escalation procedure that gives
up after three retries and remove the special case.
PORT_NO_WWPN is unused: Each port in the remote port list has a
valid wwpn. The WKA ports are now tracked outside the port
list. Remove the PORT_NO_WWPN flag, since this is no longer set
for any port.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The port flag DID_DID indicates whether we know the current id of the
port. This is always set in parallel. Since the id 0 is invalid
(because the port id 0 is invalid) we can remove the DID_DID flag:
d_id of 0 indicates an invalid d_id != 0 is a valid one.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Felix Beck <felix@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Get rid of this one:
drivers/s390/scsi/zfcp_erp.c: In function 'zfcp_erp_thread':
drivers/s390/scsi/zfcp_erp.c:1400: warning: ignoring return value of
'down_interruptible', declared with attribute warn_unused_result
zfcp_erp_thread is a kernel thread which can't receive any signals.
So introduce a dummy variable and get rid of the warning.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Register zfcp with the new /proc/service_level interface to report the
FCP microcode level. When the adapter goes offline or a channel path
disappears, zfcp unregisters, since the microcode version might change
and zfcp does not know about it.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Waiting for the ERP to be finished in a task running in the global
kernel work-queue is a bad idea, especially if the ERP needs to run
another job in this work-queue before it can finish. -> deadlock.
This patch removes the necessity to wait for a finished ERP from the
scan task and moves the job scheduling to the end of the ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Prevent a SCSI target scan for a rport which have turned invalid
in the meantime.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If an open port fsf request times out (in erp) the
corresponding erp_action member of the fsf
request need to set to NULL. If the port structure
will be removed later-on there will be still a
reference in the fsf request to the non existing
erp_action otherwise.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some further bus_id -> dev_name() conversions in s390 code.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Trace ids 107 and 3 are used twice, fix this to have unique ids for
the erp triggers.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Due to the character of a scheduled work we cannot guarantee the
LUN register to be finished before an initial device tries to use it.
Therefor we have to wait for PENDING_SCSI_WORK flag to be cleared
before proceeding.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The zfcp_erp_thread was using the nolock version of the dbf function.
This resulted in a list access while other tasks could modifying the
list. The symptom was an erp thread running at 100% CPU and never
returning from the dbf function.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In case of an adapter reopen all rports have to be deleted from the
environment. This should only happen for already registered rports
otherwise fc_remote_port_delete is called with a NULL pointer.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Each adapter reopen trigger automatically a scan_port task which
is waiting for the ERP to be finished before further processing.
Since the initial device setup enqueues adapter, port and LUN which
are individual ERP actions, this process would start after
everything is done. Unfortunately the port_reopen requires another
scheduled work to be finished which is queued after the automatic
scan_port -> deadlock !
This fix creates an own work queue for ERP based nameserver requests.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reduce the size of zfcp data structures by removing unused and
redundant members. scsi_lun is only the mangled version of the
fcp_lun. So, remove the redundant field and use the fcp_lun instead.
Since the queue lock and the pci_batch indicator are only used in the
request queue, move them from the common queue struct to the adapter
struct.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Changing the zfcp behaviour from always having the nameserver port
open to an on-demand strategy. This strategy reduces the use of
limited resources like port connections. The patch provides a common
infrastructure which could be used for all WKA ports in future.
Also reduce the number of nameserver lookups by changing the zfcp
behaviour of always querying the nameserver for the corresponding
destination ID of the remote port. If the destination ID has changed
during the reopen process we will be informed and then trigger a
nameserver query on demand.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- Remove unused references and declarations, including one instance
of the FC ls_adisc struct that has been defined twice.
- Also remove the flags COMMON_OPENING, COMMON_CLOSING,
ADAPTER_REGISTERED and XPORT_OK that are only set and cleared, but
not checked anywhere.
- Remove the zfcp specific atomic_test_mask makro. Simply use
atomic_read directly instead.
- Remove the zfcp internal sg helper functions and switch the places
where it is still used to call sg_virt directly.
- With the update of the QDIO code, the QDIO data structures no
longer use the volatile type qualifier. Now we can also remove the
volatile qualifiers from the zfcp code.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Update the kernel messages in zfcp with input from the message review
and remove some messages that have been identified as redundant.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cleanup the code in zfcp_erp.c, move erp internal definititions to
this file and move FSF timeout handling to the FSF layer.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Automatically attach the remote ports in zfcp when the adapter is set
online. This is done by querying all available ports from the FC
namesever. The scan for remote ports is also triggered by RSCNs and
can be triggered manually with the sysfs attribute 'port_rescan'.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>