2 Node - SQL Cluster fails suddenly without much information
Hey,
We have a 2-node SQL Server 2008 R2 cluster that failed mysteriously, without any apparent reason. Looking at the logs on one of the nodes, I see:
Log Name: System
Source: Tcpip
Date: 08/10/2012 00:44:46
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: server-db4.local
Description:
The system detected an address conflict for IP address x.x.x.x with the system having network hardware address aa-11-bb-22-cc-44. Network operations on this system may be disrupted as a result.
The other error:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 08/10/2012 00:44:43
Event ID: 1135
Task Category: Node Mgr
Level: Critical
Keywords:
User: SYSTEM
Computer: server-db4.local
Description:
Cluster node 'server-db3' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
Just to add: the cluster is not a new setup; it has been running for a year now.
Looking at the cluster log, I've found:
000004e0.000020e8::2012/10/08-00:35:43.540 warn [res] physical disk <quorum>: pr reserve failed, status 170
000004e0.000020e8::2012/10/08-00:35:43.540 info [res] physical disk: validatereservations: size of reservations 16
000004e0.000020e8::2012/10/08-00:35:43.540 info [res] physical disk: key: 1c9f7466734d, type 5 scope 0
000004e0.000020e8::2012/10/08-00:35:43.540 info [res] physical disk: sleeping 6 secs
00001ad0.00001f48::2012/10/08-00:35:43.789 err [quorum] node 1: death timer expired after 20 seconds (death timer started @ 2012/10/08-00:35:23.368). lost quorum.
00001ad0.00001f48::2012/10/08-00:35:43.789 err lost quorum (status = 5925)
00001ad0.00001f48::2012/10/08-00:35:43.789 err lost quorum (status = 5925), executing onstop
00001ad0.00001f48::2012/10/08-00:35:43.789 info [dm]: shutting down, unloading cluster database.
00001ad0.00001f48::2012/10/08-00:35:43.789 info [dm] shutting down, unloading cluster database (waitforlock: false).
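In case it helps anyone lining up these entries against the event log, here is a minimal Python sketch for pulling the timestamp, level, and message out of lines shaped like the excerpt above, and for checking how long the death timer actually ran. The function names (`parse_line`, `death_timer_elapsed`) and the regular expressions are my own assumptions based only on the lines pasted here; this is not an official parser, and other cluster log lines may not match the pattern.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt only:
# <process id>.<thread id>::<yyyy/mm/dd-hh:mm:ss.mmm> <level> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+(?P<msg>.*)$",
    re.IGNORECASE,
)
TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"

# Matches the "death timer started @ <timestamp>)" fragment seen in the excerpt.
DEATH_TIMER_RE = re.compile(r"death timer started @ ([\d/:.\-]+)\)", re.IGNORECASE)


def parse_line(line):
    """Return a dict with pid, tid, ts (datetime), level, msg, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], TS_FORMAT)
    return entry


def death_timer_elapsed(entry):
    """Seconds between the death timer start and the entry's own timestamp, or None."""
    match = DEATH_TIMER_RE.search(entry["msg"])
    if match is None:
        return None
    started = datetime.strptime(match.group(1), TS_FORMAT)
    return (entry["ts"] - started).total_seconds()


if __name__ == "__main__":
    sample = (
        "00001ad0.00001f48::2012/10/08-00:35:43.789 err [quorum] node 1: "
        "death timer expired after 20 seconds "
        "(death timer started @ 2012/10/08-00:35:23.368). lost quorum."
    )
    entry = parse_line(sample)
    print(entry["level"], entry["ts"], death_timer_elapsed(entry))
```

Sorting the parsed entries by `ts` makes it easier to see what was logged in the seconds before quorum was lost, for example the failed reserve on the quorum disk at 00:35:43.540.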
Geoff N. Hiten, Principal Consultant, Microsoft SQL Server MVP