Tuesday, May 28, 2013

Secondary Replica of AlwaysOn Availability Group Is in "Resolving" State After Failover

While testing AlwaysOn Availability Groups, I failed over from the primary to the secondary a couple of times. The third time, however, the secondary did not come up immediately and stayed in the “Resolving” state.

The problem was the maximum failures threshold of the clustered resource. By default it is set to 2 failures in 6 hours.

Because I failed over the AG more than 2 times within an hour, I tripped the maximum failures threshold for this clustered resource, and the replica came up in the “Resolving” state.
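To make the threshold behaviour concrete, here is a minimal Python sketch of a sliding-window check. It is only a simplified model for illustration, not how the cluster service actually implements it. The function name `failover_allowed` and its parameters are hypothetical; the values 2 and 6 hours are the ones from my test described above.

```python
from datetime import datetime, timedelta

def failover_allowed(previous_failovers, now, max_failures=2,
                     period=timedelta(hours=6)):
    """Return True if another failover still fits under the threshold.

    Simplified model (an assumption): count the failovers that happened
    inside the last `period` and allow a new one only while that count
    is below `max_failures`.
    """
    window_start = now - period
    recent = [t for t in previous_failovers if window_start < t <= now]
    return len(recent) < max_failures

# Three failover attempts, 20 minutes apart, as in my test.
start = datetime(2013, 5, 28, 9, 0)
attempts = [start + timedelta(minutes=20 * i) for i in range(3)]

history = []   # failovers that actually went through
results = []   # outcome of each attempt
for t in attempts:
    ok = failover_allowed(history, t)
    results.append(ok)
    if ok:
        history.append(t)

print(results)  # [True, True, False]

# Six hours after the first failover it has aged out of the window,
# so the model allows a failover again.
allowed_later = failover_allowed(history, start + timedelta(hours=6))
print(allowed_later)  # True
```

In this model the third attempt is the one that gets refused, which matches what I saw. It also shows why waiting out the period clears the condition without any configuration change.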

Solution 1: Wait for the default period of 6 hours to elapse.
Solution 2: Change the threshold and fail back to the original primary.

The failback setting on my clustered resource was set to Immediate. I would also recommend setting it to Prevent Failback.

Here is how I have configured my availability group clustered resource

The following KB also covers other scenarios


Please note that I used these configurations only for testing. You may need to reconsider them for your production setup, depending on your failover needs.

