Monday, August 2, 2010

Error 1117: The request could not be performed because of an I/O device error.

Our backups were failing under these conditions:

Scenario 1: The system databases plus a few user databases are on local disk, and a few user databases are on LUNs.

Scenario 2: The system and user databases are entirely on LUNs.

The backups would run for a good amount of time and then fail with Error 1117. I know that taking backups over the network is not supported, but I was racking my brain over this error (1117) to find the reason behind it. After running a few tests on my machine using external HDDs, my understanding of this error is:


-> Error 1117 is ERROR_IO_DEVICE. That's fine. But I was curious about the situations in which this error can occur and what exactly it means. Does ERROR_IO_DEVICE mean that the hardware is faulty? I found that several lower-level status codes all map to this error, each with its own reason:

STATUS_FT_MISSING_MEMBER
ERROR_IO_DEVICE

An attempt was made to explicitly access the secondary copy of information via a device control to the fault tolerance driver and the secondary copy is not present in the system.


STATUS_FT_ORPHANING
ERROR_IO_DEVICE
{FT Orphaning} A disk that is part of a fault-tolerant volume can no longer be accessed.


STATUS_DATA_OVERRUN
ERROR_IO_DEVICE
{Data Overrun} A data overrun error occurred.

STATUS_DATA_LATE_ERROR
ERROR_IO_DEVICE
{Data Late} A data late error occurred.


STATUS_IO_DEVICE_ERROR
ERROR_IO_DEVICE
The I/O device reported an I/O error.

STATUS_DEVICE_PROTOCOL_ERROR
ERROR_IO_DEVICE
A protocol error was detected between the driver and the device.


STATUS_DRIVER_INTERNAL_ERROR
ERROR_IO_DEVICE
An error was detected between two drivers or within an I/O driver.
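
The mapping above can be written down as a small lookup table. This is only a sketch in Python: the names NTSTATUS_TO_1117 and explain are made up for illustration, and the table holds only the seven statuses quoted in this post, not necessarily everything Windows maps to ERROR_IO_DEVICE.

```python
# Win32 error code discussed in this post.
ERROR_IO_DEVICE = 1117

# Hypothetical lookup table: status name -> description, as quoted above.
NTSTATUS_TO_1117 = {
    "STATUS_FT_MISSING_MEMBER": (
        "An attempt was made to explicitly access the secondary copy of "
        "information via a device control to the fault tolerance driver "
        "and the secondary copy is not present in the system."
    ),
    "STATUS_FT_ORPHANING": (
        "A disk that is part of a fault-tolerant volume can no longer be accessed."
    ),
    "STATUS_DATA_OVERRUN": "A data overrun error occurred.",
    "STATUS_DATA_LATE_ERROR": "A data late error occurred.",
    "STATUS_IO_DEVICE_ERROR": "The I/O device reported an I/O error.",
    "STATUS_DEVICE_PROTOCOL_ERROR": (
        "A protocol error was detected between the driver and the device."
    ),
    "STATUS_DRIVER_INTERNAL_ERROR": (
        "An error was detected between two drivers or within an I/O driver."
    ),
}


def explain(status_name):
    """Return (win32_code, description) for a status listed here, else None."""
    description = NTSTATUS_TO_1117.get(status_name)
    if description is None:
        return None
    return ERROR_IO_DEVICE, description


if __name__ == "__main__":
    print(explain("STATUS_DATA_LATE_ERROR"))
```

The point of the table is that seven different low-level conditions collapse into one number, so 1117 on its own does not tell you which one you hit.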


So this mapping says that error 1117 is raised if any of these conditions is met. In my situation we were falling into STATUS_DATA_LATE_ERROR, since we were also getting these entries in the SQL Server error logs: "x I/O requests are pending for more than 15 secs ............filename.mdf"
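
To see how often these slow I/O entries show up, the error log can be scanned with a short script. This is a rough sketch: the entry quoted above is a paraphrase, so the pattern assumes the wording SQL Server normally uses for this warning (message 833) and may need adjusting for your log. The names SLOW_IO and find_slow_io are illustrative, and the sample line is made up, not taken from our logs.

```python
import re

# Assumed wording of the SQL Server slow-I/O warning (message 833).
SLOW_IO = re.compile(
    r"encountered (\d+) occurrence\(s\) of I/O requests taking longer than "
    r"(\d+) seconds to complete on file \[(.+?)\]"
)


def find_slow_io(lines):
    """Yield (occurrences, seconds, file_path) for each slow-I/O warning."""
    for line in lines:
        match = SLOW_IO.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        "SQL Server has encountered 3 occurrence(s) of I/O requests taking "
        "longer than 15 seconds to complete on file [X:\\data\\example.mdf] "
        "in database [example] (5).",
        "Some unrelated log line.",
    ]
    for hit in find_slow_io(sample):
        print(hit)
```

When reading a real error log file, open it with the encoding it was written in (it is often UTF-16) and pass the file object to find_slow_io.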

If you are running backup jobs you might also get error -1073548784.
This is a common error and may appear when the query you are running remotely is incorrect, or when the table you are trying to drop does not exist. Try to export a table that already exists in another database and you will reproduce this OLE DB error. So we need not worry about finding the message identifier for this number.
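
If you do want to look the number up anyway, it helps to know that large negative codes like this are usually a 32-bit value printed as a signed integer. A small sketch (the function name to_unsigned_hex is just illustrative) that converts it to the unsigned hexadecimal form such codes are usually listed under:

```python
def to_unsigned_hex(code):
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return "0x{:08X}".format(code & 0xFFFFFFFF)


if __name__ == "__main__":
    print(to_unsigned_hex(-1073548784))  # prints 0xC002F210
```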


Action plan:
-----------------
-- Try to back up another database that is located remotely and is about the same size, around 20 GB.

-- Run chkdsk on this drive, or ask someone to do so, and see whether consistency errors come up.

-- Create a similar database on another external drive like this one and take a backup.


Conclusion:
---------------
I am quite certain that the issue is with the drive and/or the network. The 15-second I/O delay messages in the error logs suggest the same. But as you can see, this error also occurs when data is late in reaching the destination (STATUS_DATA_LATE_ERROR), so I suspect that the network might also be a bit slow and contributing to the backup failure.

Now the ball is in your court as to how you explain this to the client :)

Happy Learning
