Adding Networks to Exadata: Fun with Policy Routing

Posted in: Technical Track

I’ve noticed that Exadata servers are now configured to use Linux policy routing. A look at My Oracle Support shows that note 1306154.1 describes this configuration in more detail. It’s apparently delivered by default with factory build images 11.2.2.3.0 and later. The note explains that this configuration was implemented because of asymmetric routing problems associated with the management network:

Database servers are deployed with 3 logical network interfaces configured: management network (typically eth0), client access network (typically bond1 or bondeth0), and private network (typically bond0 or bondib0). The default route for the system uses the client access network and the gateway for that network. All outbound traffic that is not destined for an IP address on the management or private networks is sent out via the client access network. This poses a problem for some connections to the management network in some customer environments.


It goes on to mention a bug where this was reported:

@ BUG:11725389 – TRACK112230: MARTIAN SOURCE REPORTED ON DB NODES BONDETH0 INTERFACE

The bug is not public, but the title does show the type of error message that appears when a packet arrives on an interface with a source address the kernel does not expect there.

This configuration is implemented using Red Hat/Oracle Linux-style /etc/sysconfig/network-scripts files, with matching rule- and route- files for each interface.

A sample configuration, where the management network is in the 10.10.10.0/24 subnet, is:

[root@… network-scripts]# cat rule-eth0
from 10.10.10.93 table 220
to 10.10.10.93 table 220
[root@… network-scripts]# cat route-eth0
10.10.10.0/24 dev eth0 table 220
default via 10.10.10.1 dev eth0 table 220

These rules direct traffic originating from 10.10.10.93 (the management interface IP on this particular machine), as well as traffic destined to that address, away from the regular system routing tables and into a dedicated routing table, 220. The route-eth0 file then populates table 220 with two routes: one for the local subnet and a default route via the 10.10.10.1 gateway.
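Once the rule- and route- files are in place and the interface has been restarted, the result can be checked with the iproute2 tools. A minimal sketch, using the table number and addresses from the example above (the exact output will of course differ per system):

# List the policy rules; the two entries from rule-eth0 should show up
# alongside the standard local/main/default rules.
ip rule show

# Show the routes that route-eth0 loaded into the dedicated table.
ip route show table 220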

This contrasts with the default gateway of the machine itself:

[root@… network-scripts]# grep GATEWAY /etc/sysconfig/network
GATEWAYDEV=bondeth0
GATEWAY=10.50.50.1

The difference between this type of policy routing and regular routing is that traffic with the _source_ address of 10.10.10.93 will automatically go through default gateway 10.10.10.1, regardless of the destination. (The bible for Linux routing configuration is the Linux Advanced Routing and Traffic Control HOWTO, for those looking for more details.)
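A handy way to see this source-based behaviour is ip route get, which reports the route the kernel would choose for a given destination and, optionally, source address. A small sketch using the management IP from the example above and 192.0.2.50 as an arbitrary off-subnet destination (a documentation address, purely illustrative):

# Without a source address, the lookup falls through to the main table
# and follows the system default route via bondeth0.
ip route get 192.0.2.50

# With the management IP as the source, the "from 10.10.10.93" rule matches
# and the lookup uses table 220, so the default route via 10.10.10.1 wins.
ip route get 192.0.2.50 from 10.10.10.93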

I ran into an issue with this configuration when adding a second external network on the bondeth1 interface. I set up the additional interface configuration for a network, 10.50.52.0/24:

[root@… network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1

I also added rule and route entries:

[root@… network-scripts]# cat rule-bondeth1
from 10.50.52.104 table 211
to 10.50.52.104 table 211
[root@… network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211

This was a dedicated Data Guard network to a remote server with IP 10.100.52.10.

The problem with this configuration was that it didn’t work. Using tcpdump, I could see incoming requests arrive on the bondeth1 interface, but the replies went out the system default route on bondeth0 and never reached their destination. After some digging, I found the cause of the problem: in order to choose a packet’s source IP, the kernel looks the destination up in the main routing table, and the route for the 10.100.52.0 network sat in the non-default table 211. So the packet followed the default route instead, got a source address on the client-access network, and never matched any of the routing rules for the Data Guard network.
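Before the fix, this behaviour can be confirmed with ip route get. A sketch, assuming the addresses from this example (the source address reported will be whatever bondeth0 carries on the client-access network):

# The lookup for the Data Guard peer matches no policy rule, falls through to
# the main table's default route via bondeth0, and picks a client-access
# source address instead of 10.50.52.104.
ip route get 10.100.52.10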

The solution ended up being rather simple: take the “table 211” qualifier off the Data Guard network route, effectively putting that route in the main routing table:

[root@… network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1

And then we ran into a second issue: the main interface IP could now be reached, but not the virtual IP (VIP). This was because the rule configuration, taken from the samples, didn’t list the VIP address at all. To avoid this issue, and to handle VIP addresses migrating over from other cluster nodes, we put the whole subnet in the rule file, making every address in the Data Guard network use this routing rule:

[root@… network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211
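With the subnet-wide rule in place, any local address in 10.50.52.0/24, including a VIP that has just migrated to this node, matches the policy. A quick check, using a hypothetical VIP of 10.50.52.110 (substitute your own VIP address):

# Once the VIP is plumbed on this node, traffic sourced from it should use
# table 211 and leave via bondeth1 rather than the system default route.
ip route get 10.100.52.10 from 10.50.52.110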

So to sum up, when setting up interfaces in a policy-routed Exadata system, remember to:

  • Set up the interface itself and any bonds using ifcfg- files.
  • Create a rule- file for the interface, encompassing every possible address the interface could have; I added the entire IP subnet. Add “from” and “to” lines with a unique routing table number.
  • Create a route- file for the interface, listing a local network route and a default route via the subnet’s default router, all using the table number defined in the previous step.
  • Add to the route- file any static routes required on this interface, but don’t add a table qualifier.
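Once the files are written, a minimal sketch for applying and sanity-checking the configuration looks something like this (interface name, table number, and remote address are taken from the example above; restarting the interface will briefly interrupt its traffic):

# Re-read the ifcfg-, rule-, and route- files for the interface.
ifdown bondeth1 && ifup bondeth1

# Confirm the policy rules and the contents of the dedicated table.
ip rule show
ip route show table 211

# The static route to the Data Guard network should now resolve via bondeth1.
ip route get 10.100.52.10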

The final configuration:

[root@… network-scripts]# cat ifcfg-eth8
DEVICE=eth8
HOTPLUG=no
IPV6INIT=no
HWADDR=00:1b:21:xx:xx:xx
ONBOOT=yes
MASTER=bondeth1
SLAVE=yes
BOOTPROTO=none
[root@… network-scripts]# cat ifcfg-eth12
DEVICE=eth12
HOTPLUG=no
IPV6INIT=no
HWADDR=00:1b:21:xx:xx:xx
ONBOOT=yes
MASTER=bondeth1
SLAVE=yes
BOOTPROTO=none
[root@… network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1
[root@… network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211
[root@… network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1

About the Author

Marc is a passionate and creative problem solver, drawing on deep understanding of the full enterprise application stack to identify the root cause of problems and to deploy sustainable solutions. Marc has a strong background in performance tuning and high availability, developing many of the tools and processes used to monitor and manage critical production databases at Pythian. He is proud to be the very first DataStax Platinum Certified Administrator for Apache Cassandra.

6 Comments

Routing Problems w/ Oracle linux-Exadata (Solved)
January 15, 2013 9:27 am

[…] basically specify that traffic go back out whatever interface it came in on. For more info, see https://www.pythian.com/news/36747/ad…olicy-routing/ Good luck and I hope this helps someone. # cat rule-bondeth0 from 10.22.102.0/23 table 210 to […]

Vagelis Nisyraios
May 11, 2013 5:09 am

Hello Marc,

Sorry to bother you, but an attempt to configure Linux advanced routing (policy routing) brought me here after reading hundreds of other sites and opening an SR in MOS. Three months have passed without any useful information from their side, but this is not surprising based on my experience with MOS. The key point you make here (among others) is the following:
“…•Add to the route- file any static routes required on this interface, but DON’T ADD A TABLE QUALIFIER…”.
This specific advice appears only in your blog (as far as I’ve seen).

The problem is that although your way of adding extra static routes to the route-ethX file works just fine (the route goes into the main routing table), according to the relevant Red Hat knowledgebase article, “How to make routing rules persistent, when I want packets to leave the same interface they came in?”, this is the wrong setup, in the sense that you must add the table qualifier to the static route entry! I quote from the knowledgebase article (an example from the article’s route-eth0 setup):
…..
# cat /etc/sysconfig/network-scripts/route-eth0
default via <gateway> dev eth0 table 1
# to add additional static routes
<network> via <gateway> dev eth0 table 1

As you can see on the last line, he adds the table qualifier to the additional static route and, by the way, removes the DEFAULT GATEWAY from the relevant files (/etc/sysconfig/network, ifcfg-eth* files). The problem, of course, is that when I add the table qualifier to my static route, although it shows up in that table’s routes, it is totally ignored, so I cannot access the corresponding network for outgoing-initiated connections (I get “network unreachable…”).
So, bottom line, if you are not bored already: besides the very important fact that your way works, do you have any official article or document from Oracle or Red Hat that backs this type of setup for extra static routes in advanced routing configurations?
I hope that you will find the time to clarify things if possible.

Thank you in advance,
Best Regards,
Vagelis Nisyraios
Athens, Greece


Dear Marc

I have one question. Can we run two CRS installations on an Exadata X4-2 machine?

Marc Fielding
May 21, 2014 10:18 am

Hi Jatin,

I assume you’re asking if you can run two different RAC clusters in the same physical Exadata rack. I don’t see it done often, but it’s possible to physically partition your rack between two clusters, each with its own compute and storage servers. This so-called “split rack” configuration would separate everything but the InfiniBand and power distribution infrastructure. To maintain high availability and quorum, you would want a minimum of 2 compute and 3 storage servers for each part of your split rack.

This isn’t particularly well documented, so I’d suggest engaging professional services for assistance if you’re considering this type of configuration.

Marc

Gerardo Arceri
June 12, 2016 6:06 pm

I faced a similar issue today, but this particular server had an extra interface with an IP configured on the same subnet as the external interface. I got the “martian source” messages in dmesg, and in this case the server was unable to talk to the outside world (or resolve the names of the rest of the nodes).
The solution was to disable rp_filter on the bondeth0 interface with sysctl net.ipv4.conf.bondeth0.rp_filter=0 (add it to /etc/sysctl.conf if you want it to persist across reboots).


Thanks for the article, been fighting this one for a while. It was really helpful!

