Couchbase Smart Client Failure Scenarios

Posted in: Technical Track

The Couchbase Java smart client communicates directly with the cluster to maintain awareness of data location. To do so, it gathers information about the cluster architecture from a manually maintained configuration listing all the nodes. For the smart client, this configuration is done within the Java code and has no pre-designated file, while the Moxi configuration is generally installed at /opt/moxi/etc/moxi-cluster.cfg.
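As a rough illustration of what "configuration within the Java code" can look like, here is a minimal sketch that builds the list of bootstrap URIs from a hand-maintained host list. The class name, the host names, and the assumption that each node serves its REST endpoint on port 8091 under /pools are illustrative and should be checked against your own cluster.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any Couchbase library): turns a
// hand-maintained host list into the bootstrap URIs a smart client is given.
class BootstrapNodes {

    // Assumption: every node exposes its REST endpoint at http://<host>:8091/pools.
    static List<URI> bootstrapUris(List<String> hosts) {
        List<URI> uris = new ArrayList<>();
        for (String host : hosts) {
            uris.add(URI.create("http://" + host + ":8091/pools"));
        }
        return uris;
    }

    public static void main(String[] args) {
        // Placeholder host names; replace with the nodes of your cluster.
        List<URI> uris = bootstrapUris(List.of("cb1.example.com", "cb2.example.com"));

        // With a 1.x Java client, a list like this would typically be passed to
        // the client constructor, e.g. new CouchbaseClient(uris, "default", "").
        // That call is left as a comment so this sketch runs without the SDK.
        uris.forEach(System.out::println);
    }
}
```

Listing more than one node here means the client can still bootstrap if the first node in the list is down.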

Assuming the smart client is on a separate server from the affected node, there are two situations where communication between the client and a specific node might be interrupted.

In the first scenario, a node may fail. If so, the rest of the cluster will detect the failure through the standard heartbeat checks built into Couchbase and, once the node is failed over, map its data to the replica nodes. The smart client is informed of the remappings and should be able to find all identified data again. There are known bugs with some client versions (e.g. 1.0.3); if you experience timeouts with the client, be sure you're using the latest build. We also recommend that you use auto-failover and that you test your email alerts. You must manually rebalance after recovery; this does not happen on its own.

In the second and more common scenario, a network or DNS outage has occurred. If a node is unreachable by one or more clients, yet all other nodes can still talk to that node, there is no built-in mechanism for the cluster to remap data from that node to other nodes.

Additionally, there is no built-in mechanism for the smart client to reroute traffic, so you will experience timeouts in this situation. When the network issue resolves, the client should stop presenting errors.
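Since the timeouts cannot be avoided in this scenario, the application can at least bound how long each read waits and fall back cleanly. The sketch below uses only the Java standard library; BoundedRead and readWithTimeout are hypothetical names, and the Supplier stands in for whatever call your code makes against the client.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical wrapper: bounds how long the caller waits for a read.
class BoundedRead {

    // Returns the read result, or an empty Optional if the read times out,
    // fails, or returns null.
    static <T> Optional<T> readWithTimeout(Supplier<T> read, long timeoutMillis) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(read);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Marks the future as cancelled; the underlying task is not
            // interrupted and may keep running in the background.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The read itself threw; treat it like a miss.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the calling thread.
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

A caller might treat an empty result as a cache miss and go to the system of record, rather than letting request threads pile up behind an unreachable node.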

Consider scripting a heartbeat check that runs on your app servers, uses the Couchbase CLI, and specifies failover procedures.
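One possible shape for such a check is sketched below: probe each node over TCP from the app server, count consecutive failures, and prepare a failover command once a threshold is reached. NodeHeartbeat and its methods are hypothetical, the 8091 port is an assumption, and the couchbase-cli arguments follow the older `failover -c ... --server-failover=...` form, which you should verify against your installed CLI version before using.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heartbeat helper intended to run on an app server.
class NodeHeartbeat {
    private final int failureThreshold;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    NodeHeartbeat(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    // TCP connect probe: true if a connection opens within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Records one probe result. Returns true only on the probe where the
    // node reaches the failure threshold, so the action fires once.
    boolean record(String node, boolean reachable) {
        if (reachable) {
            consecutiveFailures.remove(node);
            return false;
        }
        int count = consecutiveFailures.merge(node, 1, Integer::sum);
        return count == failureThreshold;
    }

    // Builds (but does not run) a failover command. The flag syntax is an
    // assumption based on older couchbase-cli releases; check your version.
    static List<String> failoverCommand(String clusterNode, String failedNode,
                                        String user, String password) {
        return List.of("couchbase-cli", "failover",
                "-c", clusterNode + ":8091",
                "-u", user, "-p", password,
                "--server-failover=" + failedNode + ":8091");
    }
}
```

In use, a scheduled job would call isReachable for each node, feed the result to record, and on a true return either alert an operator or hand the command to a ProcessBuilder. Because a client-side outage can also be the app server's own network problem, requiring several consecutive failures (and ideally agreement from more than one app server) before failing over is the safer choice.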


