Can a bad LUN cause all nodes of a cluster to blue screen?
I need help determining the cause of a failure of our cluster nodes.
This is a 2-node cluster running approximately ten clustered SQL Server instances. The clustered LUNs are served from a NetApp device via Fibre Channel. The cluster had been running fine for a year when the entire cluster crashed. We are unaware of any changes on either node.
It started with a SQL instance failing back and forth a few times, for reasons unknown. Then one of the nodes blue screened, and a few minutes later the other one blue screened as well.
Attempts to restart either node kept resulting in blue screens. The dump indicated that the blue screen was occurring in clusres.dll on both nodes.
We repaired the cluster by offlining the LUNs, removing them from cluster storage, onlining the LUNs, and then adding them back to the cluster one by one. We found that several had become corrupt and could not be mounted. We had to reformat the drives, rebuild the SQL instances, and restore the databases.
Whether the corrupt volumes were the cause of the failure or a result of it is not known. Unfortunately, we can't look at the SQL Server logs for the instances those disks belonged to, since the logs were on the disks themselves.
Eight days later, the exact same thing happened again!
Can a corrupt LUN that is part of a SQL Server resource group cause this? If so, it seems to me a huge vulnerability in clustering if one failed SQL instance can bring the entire cluster down and cause corruption in other cluster storage volumes.
In the first incident, one of the corrupt volumes was the quorum disk. I can understand that causing the cluster to go down, but should it cause both servers to blue screen? The other two corrupt volumes belonged to SQL Server resource groups.
In the second incident, the quorum disk was not corrupt, and I'm still not sure of the root cause. I do know that a snapshot of a single LUN failed on the NetApp. That LUN was the cluster disk for one SQL Server instance, and we were able to bring the instance online after first deleting the snapshot. Again, it appears that a single failed resource group somehow caused every node of the cluster to blue screen.
Chuck
In short, yes... issues with connectivity to a LUN can cause the cluster to blue screen with a Stop 0x9E. Actually, issues with any clustered resource can cause the box to blue screen in Windows Server 2008 clustering.
I'd recommend reviewing the following; this behavior is configurable if you don't want the box to BSOD: