One of the main goals in architecting a Disaster Recovery (DR) solution is to make a DR failover transparent to the end users. Too often, users must reboot their desktops, clear their browser cache and the JInitiator jar cache, and so on, even when we have made sure that the post-failover URL of the 11i instance is the same. Only if users can keep working after a failover of an 11i instance from the primary site to the DR site, without changing anything on their desktops, can we say that the goal is achieved.
In most cases the culprits are forgetting the DNS setup for the hostnames of the middle tiers (or of the load balancer, if one is used), and the caching of DNS entries at different levels of the network. A quick look at the caching section of Wikipedia’s page on DNS gives some idea of what I’m talking about. Because of the default settings, the old IP address gets cached on the user’s desktop and in caching DNS servers in the network. As a result, the user’s desktop is still trying to reach the old server, which is now offline.
The best fix for this kind of DNS side effect is to change the TTL (Time To Live) parameter of the DNS entry for the hostname from the default value to a smaller one. I prefer setting it to a value a little smaller than the time you take to fail over. That is, if you take 60 minutes to fail over from the primary to the secondary datacenter, then set the TTL to 50 minutes.
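The rule of thumb above is simple arithmetic. Here is a minimal sketch, assuming a hypothetical helper named `ttl_for_failover` and a default margin of 10 minutes; the margin is borrowed from the 60-to-50-minute example and is not a fixed rule.

```python
def ttl_for_failover(failover_minutes: int, margin_minutes: int = 10) -> int:
    """Return a TTL in seconds that is a little shorter than the failover time.

    Hypothetical helper. The 10-minute default margin is an assumption taken
    from the 60 -> 50 minute example in the text.
    """
    if margin_minutes < 0 or failover_minutes <= margin_minutes:
        raise ValueError("failover time must be longer than the margin")
    return (failover_minutes - margin_minutes) * 60


print(ttl_for_failover(60))  # 3000
```

The result is in seconds because that is the unit DNS records use for the TTL field.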
Let’s take an example. Say our 11i instance has the URL https://apps.example.com:8000, with the primary site being windsor and the secondary ottawa. We have two load balancers, one at the primary site and one at the secondary, with hostnames lb.windsor.example.com and lb.ottawa.example.com respectively. If the DNS is set up with default values, it will look like this:
hostname                 TTL    Type   value
----------------------------------------------
apps.example.com         86400  CNAME  lb.windsor.example.com
lb.windsor.example.com   86400  A      192.168.1.100
lb.ottawa.example.com    86400  A      192.168.2.100
apps.example.com is an alias for lb.windsor.example.com, and the TTL value is set to 86400 seconds, i.e., 24 hours. That means this record can stay cached for up to 24 hours on the user’s desktop and on any caching DNS servers used by the client. So at the time of failover, even though we change the DNS record of apps.example.com to point to the ottawa load balancer instead of windsor, the TTL is set to a very high value of 24 hours, so the user’s browser will keep trying to reach the primary-site load balancer, which may remain cached on their desktop for up to the next 24 hours.
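To make the caching effect concrete, here is a toy model of a caching client. It is a sketch under stated assumptions, not a real resolver: the names `resolve` and `CachingClient` are hypothetical, time is a plain number of seconds, and the whole answer is cached with the smallest TTL seen along the alias chain, which simplifies how real resolvers cache each record.

```python
def resolve(name, zone, max_hops=8):
    """Follow CNAME records to an A record; return (ip, smallest TTL seen)."""
    min_ttl = None
    for _ in range(max_hops):
        ttl, rtype, value = zone[name]
        min_ttl = ttl if min_ttl is None else min(min_ttl, ttl)
        if rtype == "A":
            return value, min_ttl
        name = value  # CNAME: continue with the alias target
    raise RuntimeError("CNAME chain too long")


class CachingClient:
    """Toy stand-in for a desktop or caching DNS server (hypothetical)."""

    def __init__(self, zone):
        self.zone = zone
        self.cache = {}  # name -> (ip, expires_at)

    def lookup(self, name, now):
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # entry has not expired, the zone is not consulted
        ip, ttl = resolve(name, self.zone)
        self.cache[name] = (ip, now + ttl)
        return ip


# Records from the example: name -> (ttl_seconds, type, value)
zone = {
    "apps.example.com": (86400, "CNAME", "lb.windsor.example.com"),
    "lb.windsor.example.com": (86400, "A", "192.168.1.100"),
    "lb.ottawa.example.com": (86400, "A", "192.168.2.100"),
}

client = CachingClient(zone)
print(client.lookup("apps.example.com", now=0))      # 192.168.1.100

# Failover: the alias is repointed, but the client still holds the old answer.
zone["apps.example.com"] = (86400, "CNAME", "lb.ottawa.example.com")
print(client.lookup("apps.example.com", now=3600))   # 192.168.1.100 (cached)
print(client.lookup("apps.example.com", now=86400))  # 192.168.2.100 (expired)
```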
As I suggested earlier, if we set the TTL of apps.example.com to 50 minutes (3000 seconds) and make the DNS change as the first step in the failover procedure, then by the time we finish (which is supposed to take 60 minutes), the old DNS records in the user’s desktop cache and the caching DNS server will have expired, and they will start seeing the new alias for apps.example.com:
hostname                 TTL    Type   value
----------------------------------------------
apps.example.com         3000   CNAME  lb.ottawa.example.com
lb.windsor.example.com   86400  A      192.168.1.100
lb.ottawa.example.com    86400  A      192.168.2.100
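The difference between the two TTL settings can be worked out in a few lines. This is a rough sketch with a hypothetical function name. It assumes the DNS record is changed as the first step of a 60-minute failover, that the worst-placed client cached the old record just before the change, and that the lower TTL was already in effect long enough beforehand for entries cached with the old 24-hour TTL to have expired. It ignores clients or browsers that hold on to addresses longer than the TTL allows.

```python
def worst_case_stale_seconds(ttl_seconds: int, failover_seconds: int) -> int:
    """Seconds after failover completes during which a client may still
    hold the old address, for a record changed when the failover starts."""
    return max(0, ttl_seconds - failover_seconds)


FAILOVER_SECONDS = 60 * 60  # the 60-minute failover from the example

for ttl in (86400, 3000):
    stale = worst_case_stale_seconds(ttl, FAILOVER_SECONDS)
    print(f"TTL {ttl:>5}s -> stale for up to {stale}s after failover completes")
```

With the default TTL, the worst-placed client keeps the old address for up to 82800 seconds (23 hours) after the failover is finished; with a TTL of 3000 seconds, the window is zero.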
Some of you might already be thinking: why not set it to an even lower value, like 5 minutes? The main problem with such a low value is that it increases the load on the DNS server. If you have a single DNS server and very low TTL values, any kind of outage on the DNS server will affect your users almost immediately, as their desktops will be making DNS lookups much more frequently than before. So if you use low TTL settings, make sure you have at least two DNS servers at two different locations.
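A back-of-the-envelope estimate shows how the lookup rate grows as the TTL shrinks. The sketch below uses a hypothetical helper and an assumed population of 1000 desktops, and it treats every desktop as re-querying the moment its cached entry expires, so the figures are upper bounds rather than measurements.

```python
SECONDS_PER_DAY = 86400


def lookups_per_day(ttl_seconds: int, clients: int = 1) -> float:
    """Upper bound on daily lookups for one hostname, assuming each client
    queries again as soon as its cached entry expires."""
    return clients * SECONDS_PER_DAY / ttl_seconds


# 1000 desktops is an assumed figure, used only for illustration.
for ttl in (86400, 3000, 300):
    print(ttl, lookups_per_day(ttl, clients=1000))
```

Under these assumptions, going from a 24-hour TTL to a 5-minute TTL multiplies the lookups for that hostname by 288, while a 50-minute TTL multiplies them by about 29.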
Please feel free to post your experiences related to DNS in the comments section. Any comments or suggestions are welcome!
>> One of the main goals in architecting a Disaster Recovery (DR) solution is to make a DR failover transparent to the end users.
I doubt that an 11i instance failover can be made transparent to the users. Is there such a technology…? You will at least have to bounce the middle tier services.
You are right. We cannot make 11i failover 100% transparent; there is some downtime involved. My point was more in the context of the URL that users use to log in to 11i after failover. I have seen clients who use a completely different URL for the DR failover instance. This makes it difficult for end users, who often bookmark the 11i URL in their browsers.
I appreciate your effort in posting the DNS setup for effective 11i DR failover. I am implementing a DNS-level failover for our 11i applications. The same server also hosts other third-party applications, such as appxwors for scheduling jobs, and it does not use any load balancers as of now. My concern is whether, with DNS failover, end users can access the secondary server with the same URL or need to change the URL.
I appreciate your patience in presenting the article. Do you have any best-practice documents available for implementing DNS for 11i failover?
Appreciate your effort in clarifying the below thing.
I want to have a failover DNS setup where one entry can point to two different IP addresses, but at any one time points only to the primary server.
If the primary server is down or unavailable, the other server should take over, and the DNS entry should automatically change to point to the failover server.
1. I want to know whether simplefailover can do this or not.
2. Is it free software?
3. How about maintenance? Is it easy to maintain?
You can make your DNS entry point to two IP addresses at the same time; the browser will connect to whichever server is listening on the web port. Have Apache running only on the server that is active. Make sure to test it before implementing.
review the below link for better understanding of DNS
thanks for the information
What are the things we need to take care of on the Oracle Applications and database side for DNS-level failover?
My point was more in the context of the URLs that users use to log in to 11i after failover. Yes, you are correct to make sure of maintaining at least two DNS servers.