Beware of SCSI device conflicts: the device \Device\Scsi\*** did not respond within the timeout period

  

A case solved by switching to an add-in SCSI card:

As soon as I arrived at the customer site, Xiao Liu, the customer's network administrator, pulled me into the machine room. The customer turned out to be a fairly large foreign-funded supermarket chain in the area, which relies on this back-office system for trading, accounting, and settlement. One morning the business staff suddenly reported that the checkout terminals could not be used. Xiao Liu found that the database on the server could not be accessed, so he had no choice but to bring up a standby server to keep the services running.

However, the standby server's performance fell far short of the load created by frequent data reads. It crashed most often exactly when the supermarket was busiest, and the finance staff had to work overtime to re-enter the newly generated data into the original system by hand to complete the daily and monthly settlement. All the business staff complained to Xiao Liu, and for the past two days even the usually easygoing manager had been giving him a dark look.

After listening to Xiao Liu's complaints, I examined the customer's environment in detail. The supermarket ran a two-node cluster: two IA (Intel-architecture) servers sharing one SCSI disk array. The hosts ran Windows 2000 Advanced Server and a supermarket-specific accounting application with SQL Server 2000 as the back end.


The two servers used MSCS (Microsoft Cluster Service), which ships with Windows, in Active-Passive (master-slave) mode. MSCS monitors, manages, and fails over the cluster's resource groups so that the SQL Server service stays available. Because the supermarket keeps long hours, the server started work at 8:30 a.m., serving the database and the other applications that access the data; at 20:00 the checkout clients closed and service requests stopped; at 22:00 SQL Server's built-in management tool backed up the host's data to disk.

When the database became inaccessible, Xiao Liu first switched the accounting system to the standby server and then checked the Event Viewer on the host. Both the host and the backup machine had logged errors with IDs 2, 5, and 14, all timestamped 22:01 the previous night, just as the backup started. After the administrator restarted the disk array, the database was accessible again. Xiao Liu switched the business system back and thought the problem was solved, but the same failure recurred the next day. Restarting the disk array again restored access, yet the failure came back day after day.

After a preliminary assessment, I concluded that the problem lay in the data connection, so I worked by elimination: I replaced, in turn, the internal and external cables from the servers to the array, the SCSI terminators, the disk array controllers, and finally the entire disk array. The problem persisted. Only when I switched to an add-in SCSI card did the system run reliably, and I could finally return to Beijing and report back to my boss.

Testing: where is the root cause?

The matter should have been closed, but all the way back to Beijing I could not stop thinking about it: why did replacing the onboard SCSI controller with an add-in SCSI card solve the problem? Looking back, my judgment at the time was based more on intuition than on analysis.

After returning to Beijing, I reproduced the customer's environment in the lab. The disk array was divided into four slices mapped to LUN1, LUN2, LUN3, and LUN4, and SQL Server 2000 was configured. I then ran the following tests:

1. Failed the services over, and copied large amounts of data (about 2 GB) between partitions through shared folders in My Network Places.
2. Ran IOmeter on each partition for 24 hours.
3. Forcibly disabled the private network, so that the public network carried the heartbeat signal alongside the external service traffic.
4. Forcibly unplugged Server1's SCSI cable. The affected server essentially stopped serving, and the cluster failed its services over normally.

--------------------------------------------------------------------------------
The driver detected a controller error on \Device\Scsi\adpu160m2.
Data:
0000: 0f 00 10 00 01 00 6a 00   ......j.
0008: 00 00 00 00 0b 00 04 c0   .......À
0010: 50 50 00 c1 00 00 00 00   PP.Á....
0018: 49 00 00 00 00 00 00 00   I.......
0020: 00 00 00 00 00 00 00 00   ........
0028: 00 00 00 00 00 00 00 00   ........

Date: ××-××-××   Time: ××:××:××   User: N/A   Computer: NT2
Description: An error was detected on device \Device\Harddisk3\DR3 during a paging operation.
Data:
0000: 04 00 22 00 01 00 72 00   .."...r.
0008: 00 00 00 00 33 00 04 80   ....3..
0010: 2d 01 00 00 00 00 00 00   -.......
0018: 00 00 00 00 00 00 00 00   ........
--------------------------------------------------------------------------------

"\Device\Scsi\adpu160m2 did not respond within the timeout period" and "The driver detected a controller error on \Device\Scsi\adpu160m2": these two messages indicate that commands issued by Windows never reached the target external disk, and that, because the connection was interrupted, Windows itself began to stop responding.
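The Data rows in these events can be decoded to confirm which messages were logged. The sketch below is a minimal Python decoder that assumes the bytes follow the IO_ERROR_LOG_PACKET layout from the Windows driver kit (MajorFunctionCode, RetryCount, DumpDataSize, NumberOfStrings, StringOffset, EventCategory, then a 4-byte ErrorCode at offset 0x0C); the names parse_dump and decode_header are hypothetical. Under that assumption, the first dump carries ErrorCode 0xC004000B (low word 11, severity "error") and the second 0x80040033 (low word 51, severity "warning"), which line up with the controller-error and paging-error messages quoted above.

```python
import struct

# Severity is stored in the top two bits of an NTSTATUS-style error code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def parse_dump(rows):
    """Turn Event Viewer 'Data:' rows such as
    '0008: 00 00 00 00 0b 00 04 c0 .......' into raw bytes.
    Only the first 8 tokens after the offset are treated as hex; the ASCII
    column that follows them is ignored."""
    data = bytearray()
    for row in rows:
        _, rest = row.split(":", 1)
        data.extend(int(tok, 16) for tok in rest.split()[:8])
    return bytes(data)


def decode_header(data):
    """Decode the fixed header, ASSUMING the IO_ERROR_LOG_PACKET layout:
    UCHAR, UCHAR, four USHORTs, 2 bytes of alignment padding, then ErrorCode."""
    (major, retry, dump_size, n_strings,
     str_offset, category, error_code) = struct.unpack_from("<BBHHHH2xI", data, 0)
    return {
        "major_function": major,  # e.g. 0x0f = IRP_MJ_INTERNAL_DEVICE_CONTROL
        "retry_count": retry,
        "dump_data_size": dump_size,
        "number_of_strings": n_strings,
        "string_offset": str_offset,
        "event_category": category,
        "error_code": error_code,
        "severity": SEVERITY[error_code >> 30],
        "facility": (error_code >> 16) & 0x0FFF,
        "event_id": error_code & 0xFFFF,  # low word = event ID shown by Event Viewer
    }


controller_error = decode_header(parse_dump([
    "0000: 0f 00 10 00 01 00 6a 00",
    "0008: 00 00 00 00 0b 00 04 c0",
]))
paging_error = decode_header(parse_dump([
    "0000: 04 00 22 00 01 00 72 00",
    "0008: 00 00 00 00 33 00 04 80",
]))
print(hex(controller_error["error_code"]), controller_error["event_id"])  # 0xc004000b 11
print(hex(paging_error["error_code"]), paging_error["event_id"])  # 0x80040033 51
```

Only the first 16 bytes are needed to read the header, so the sketch feeds in just the first two rows of each dump; the remaining rows hold driver-specific dump data whose meaning depends on the driver.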

Analysis of results: beware of SCSI device conflicts

The test results showed that neither the servers nor the disk array were at fault; the problem was concentrated in the connection between the servers and the disk array. This matched my judgment on site exactly.

So what was wrong with the connection?

In terms of how the SCSI bus works, the SCSI physical layer carries the SCSI signals and helps protect them from interference. The physical layer comprises terminators, cables, adapter cards and motherboard traces, and connectors, together with specifications such as conductor impedance, connector spacing, and cable length.

Each time a system boots or the SCSI bus is reset, the devices on the bus reinitialize. At the same time, every SCSI initiator on the bus begins to locate and negotiate with all target devices on the bus. These negotiations fix the exact parameters used for subsequent data transfers. The bus terminators mark the two physical ends of the bus and absorb the signal there, so that it is not reflected back toward the controller. The bus must be terminated at both physical ends for the SCSI bus to work.
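The negotiation step can be pictured with a small model. This is a simplified sketch, not the real message protocol: it assumes each side advertises a transfer period, a synchronous offset, and a bus width, and that the agreed result is whatever both sides can handle. TransferSettings, negotiate, peak_mb_per_s, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferSettings:
    """Hypothetical, simplified view of what an initiator and target negotiate.
    Real SCSI messages encode the period as a 'transfer period factor';
    nanoseconds are used here only for readability."""
    period_ns: float  # time per data transfer (shorter = faster)
    offset: int       # synchronous REQ/ACK offset; 0 would mean asynchronous
    width_bits: int   # 8 = narrow, 16 = wide


def negotiate(initiator: TransferSettings, target: TransferSettings) -> TransferSettings:
    """Neither side may be pushed beyond its own limits, so the agreed settings
    are the slower period, the smaller offset, and the narrower width."""
    return TransferSettings(
        period_ns=max(initiator.period_ns, target.period_ns),
        offset=min(initiator.offset, target.offset),
        width_bits=min(initiator.width_bits, target.width_bits),
    )


def peak_mb_per_s(settings: TransferSettings) -> float:
    """Assumes synchronous transfers: bytes per transfer times transfers
    per microsecond gives (decimal) MB/s."""
    return (settings.width_bits / 8) * (1000 / settings.period_ns)


# Illustrative values: an Ultra160-class host adapter and an Ultra2 wide target.
host = TransferSettings(period_ns=12.5, offset=127, width_bits=16)
disk = TransferSettings(period_ns=25.0, offset=31, width_bits=16)
agreed = negotiate(host, disk)
print(agreed, peak_mb_per_s(agreed))  # 25 ns, 16-bit -> 80.0 MB/s
```

The model also hints at why faster buses are less forgiving: at 12.5 ns per transfer there is far less time for a distorted or reflected signal to settle than at the 200 ns per byte of a 5 MB/s narrow bus.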

A physical bus terminator is a hardware connector and comes in two types, active and passive. An active terminator uses a voltage regulator, whereas a passive terminator relies directly on the power supplied over the bus. Active terminators are more accurate than passive ones. A self-terminating cable can replace a physical bus terminator; it is also hardware and is often used to connect two hosts to the same physical device. Most SCSI devices have a built-in terminator that is switched on or off with a jumper.

Terminators look simple, but in practice most problems come from them. Passive terminators can be used with the SCSI-1 and SCSI-2 specifications, and before SCSI-2, because data transfer rates were low (5 MB/s or less), you might even find that the bus seemed to work without a terminator in some cases. From Fast SCSI onward, however, transfer rates rose quickly: termination must be set correctly at both ends of the SCSI bus, and active terminators must be used. Otherwise data transfer errors occur, and the initiator may even fail to recognize the SCSI device (that is, the device cannot be found). In our experience, when a SCSI hard drive cannot be installed smoothly, the cause is often not the drive itself but a poor-quality cable or a misplaced terminator, which degrades the signal.

By checking SCSI IDs and bus termination, we can resolve most conflict symptoms, and every SCSI user must pay attention to this. As we have seen, drivers, interfaces, and cables anywhere on the SCSI bus can cause problems. At the customer site, the cables, terminators, the disk array itself, and the onboard controller were all replaced in turn. Unless the motherboard manufacturer recommends it, an onboard SCSI controller should not be used in a cluster environment with external disk devices attached. Because it is integrated into the motherboard, it is affected by other components, and its design inevitably involves trade-offs, which can make the SCSI bus unstable in this environment. That is why switching to an add-in SCSI card resolved the customer's SCSI device conflict.
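The two checks recommended here, unique SCSI IDs and termination at both physical ends and nowhere else, are easy to express as a configuration audit. The following is a hypothetical Python sketch: BusDevice and check_bus are illustrative names, devices must be listed in physical cable order, and the ID range (0 to 7 for a narrow bus, 0 to 15 for a wide bus) is the only bus rule it assumes beyond what the text states. It checks configuration only; a marginal onboard controller like the one in this case would pass it.

```python
from dataclasses import dataclass


@dataclass
class BusDevice:
    name: str
    scsi_id: int
    terminator_on: bool  # built-in terminator jumper ON, or external terminator at this position


def check_bus(devices, wide=True):
    """Audit a SCSI bus described in physical cable order.
    Returns a list of human-readable problems (empty list = none found)."""
    problems = []
    max_id = 15 if wide else 7
    owners = {}
    # Rule 1: every device needs a valid, unique SCSI ID.
    for dev in devices:
        if not 0 <= dev.scsi_id <= max_id:
            problems.append(f"{dev.name}: ID {dev.scsi_id} outside 0-{max_id}")
        elif dev.scsi_id in owners:
            problems.append(f"{dev.name}: ID {dev.scsi_id} conflicts with {owners[dev.scsi_id]}")
        else:
            owners[dev.scsi_id] = dev.name
    # Rule 2: terminate both physical ends, and only the ends.
    last = len(devices) - 1
    for i, dev in enumerate(devices):
        at_end = i in (0, last)
        if at_end and not dev.terminator_on:
            problems.append(f"{dev.name}: bus end is not terminated")
        elif not at_end and dev.terminator_on:
            problems.append(f"{dev.name}: terminator enabled in the middle of the bus")
    return problems


# Hypothetical shared bus in a two-node cluster: one adapter at each end, array in the middle.
good = [BusDevice("Server1 adapter", 7, True),
        BusDevice("Disk array", 0, False),
        BusDevice("Server2 adapter", 6, True)]
bad = [BusDevice("Server1 adapter", 7, True),
       BusDevice("Disk array", 0, True),
       BusDevice("Server2 adapter", 7, False)]
print(check_bus(good))  # []
for problem in check_bus(bad):
    print(problem)
```

For the "bad" layout the audit reports three problems: the two host adapters share ID 7, the array's terminator is enabled mid-bus, and Server2's end of the bus is unterminated.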
Reprinted from the web.

Copyright © Windows knowledge All Rights Reserved