In the past few days I had two incidents and one outage that lasted only a few minutes. Still, any outage in a production environment translates directly into cost. The server went down because it failed over and then failed back four or five times within 15 minutes. I was holding the pager, so I was pulled into investigating the root cause of the failovers and failbacks. The events in the SQL Server error logs gave me no clue about what was happening or why, so I looked at the System log in the Windows Event Viewer. I thought, “Maybe I have something there!”
There were two events that came to my attention:
Event Type: Error
Event Source: EventLog
Event Category: None
Event ID: 6008
Time: 1:14:12 AM
The previous system shutdown at 1:00:31 AM on 7/24/2014 was unexpected.
Event Type: Information
Event Source: Server Agents
Event Category: Events
Event ID: 1090
Time: 1:15:16 AM
System Information Agent: Health: The server is operational again. The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again.
These events are closely related to a feature called Automatic Server Recovery (ASR), which is configured on the server and comes with the hardware; in our case, an HP ProLiant blade server. There are already resources and threads that discuss similar issues. Most hardware vendors provide software with similar functionality for their servers.
My understanding was that the firmware might be out of date and need updating, or that the servers were aging. I sent my findings to the customer in an incident report. Within a couple of hours I had a reply, and the feedback was just what I expected: the hardware was aged. This may be the case for you too if you see a message in Event Viewer that reads “System Information Agent: Health: The server is operational again. The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again.” Go check with your system administrator. The root cause of this unexpected shutdown may not be SQL Server at all, but the system itself. Please keep in mind that this is one possible reason, and certainly not the only one.
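If you have exported the System log as text in the same layout as the entries quoted above, the check I did by eye can be roughly sketched in Python. This is only an illustration: the names Event, parse_events and hardware_restart_suspected are made up for this post, the parser assumes exactly the field labels shown above, and the ASR match relies on the message wording from an HP agent, which may differ for other vendors.

```python
import re
from dataclasses import dataclass

# Sample text copied from the two events quoted in the post.
SAMPLE = """\
Event Type: Error
Event Source: EventLog
Event Category: None
Event ID: 6008
Time: 1:14:12 AM
The previous system shutdown at 1:00:31 AM on 7/24/2014 was unexpected.
Event Type: Information
Event Source: Server Agents
Event Category: Events
Event ID: 1090
Time: 1:15:16 AM
System Information Agent: Health: The server is operational again. The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again.
"""

# Field labels as they appear in the quoted entries (an assumption about the export format).
HEADER = re.compile(r"(Event Type|Event Source|Event Category|Event ID|Time):\s*(.*)")


@dataclass
class Event:
    source: str
    event_id: int
    message: str


def parse_events(text):
    """Parse entries where header fields are followed by one message line."""
    events = []
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
        elif "Event ID" in fields:
            # First non-header line after the headers is treated as the message.
            events.append(
                Event(fields.get("Event Source", ""), int(fields["Event ID"]), line)
            )
            fields = {}
    return events


def hardware_restart_suspected(events):
    """True when an unexpected shutdown (EventLog 6008) appears together with an ASR message."""
    unexpected = any(e.source == "EventLog" and e.event_id == 6008 for e in events)
    asr = any("Automatic Server Recovery" in e.message for e in events)
    return unexpected and asr


if __name__ == "__main__":
    print(hardware_restart_suspected(parse_events(SAMPLE)))
```

A positive result only tells you to take the question to the system administrator or hardware team; it does not prove that the hardware is at fault.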
Automatic System Recovery