Debugging Kibana using Chrome developer tools


Amazon Elasticsearch Service is a managed service for running Elasticsearch in AWS. The underlying instances are managed by AWS, and you interact with the service through its API and the AWS console.

Amazon Elasticsearch Service also comes with Kibana integrated. We came across an issue that caused Kibana4 to show the following error message when searching for *:

Courier Fetch: 10 of 60 shards failed.

The error is not very descriptive.

Because Amazon Elasticsearch Service exposes only an endpoint, we do not have direct access to the instances, and only a limited set of API operations is available.

We decided to see what we could find from the Chrome browser.

Chrome Developer Tools (DevTools) offers many useful debugging features.

DevTools can be opened in several ways:

1. Right-click on the page and select Inspect.
2. From the Chrome menu, select More Tools -> Developer Tools.
3. Press F12.

The Network tab in DevTools can be used to debug a wide variety of issues. It records every request made while a web page is loading and captures a wide range of information about each one, such as the HTTP method, the status and the time taken to complete the request.

Clicking on any of the requested resources shows more information about that request.

In this case, the interesting part was under the Preview tab, which shows the data Chrome received in response to the search, parsed into expandable objects.
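Instead of clicking through the Preview tab, the Network tab's recording can also be exported as a HAR file (right-click the request list and pick one of the "Save all as HAR" options; the exact wording varies between Chrome versions). The following Node.js sketch, assuming such an export saved under the hypothetical name kibana.har, lists the _msearch requests with their HTTP status and decoded response body. The field names follow the HAR 1.2 format; findMsearchEntries is an illustrative helper, not part of Kibana or Chrome.

```javascript
// Minimal sketch: list Kibana _msearch requests from a HAR file exported from
// the DevTools Network tab. "kibana.har" is a hypothetical file name.
const fs = require("fs");

// Return { url, status, body } for each recorded request whose URL contains
// "_msearch". Field names follow the HAR 1.2 format.
function findMsearchEntries(har) {
  return har.log.entries
    .filter((entry) => entry.request.url.includes("_msearch"))
    .map((entry) => {
      const content = entry.response.content || {};
      let body = content.text || ""; // text is absent if the body was not saved
      if (content.encoding === "base64") {
        body = Buffer.from(body, "base64").toString("utf8");
      }
      return { url: entry.request.url, status: entry.response.status, body };
    });
}

// Usage: print status and URL, then the start of each response body.
if (fs.existsSync("kibana.har")) {
  const har = JSON.parse(fs.readFileSync("kibana.har", "utf8"));
  for (const e of findMsearchEntries(har)) {
    console.log(e.status, e.url);
    console.log(e.body.slice(0, 200));
  }
}
```

Note that Kibana can return HTTP 200 for an _msearch call even when some shards failed, so the body, not the status, is what needs inspecting.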

A successful query looks like the image below, captured from Kibana3 on the public website logstash.openstack.org.

[Image: kibana-es (DevTools Preview of a successful Kibana3 query)]

We checked the "_msearch?timeout=3000.." request and found the following error messages under the nested values (for example "responses" -> "0" -> "_shards" -> "failures" -> "0"):

{index: "logstash-2016.02.24", shard: 1, status: 500, …}
  index: "logstash-2016.02.24"
  reason: "RemoteTransportException[[Leech][inet[/10.212.25.251:9300]][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [5143501209/4.7gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [5143501209/4.7gb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [5143501209/4.7gb]]; "
  shard: 1
  status: 500
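Once a response body is available (for example via Copy > Copy response on the request in DevTools), the nested path above can be walked in code rather than by hand. Below is a minimal sketch, assuming the Elasticsearch 1.x response layout and error wording shown above; collectShardFailures and parseBreakerReason are hypothetical helper names.

```javascript
// Minimal sketch: collect shard failures from a parsed _msearch response body,
// walking responses -> N -> _shards -> failures -> M as in the DevTools Preview.
function collectShardFailures(msearchResponse) {
  const failures = [];
  (msearchResponse.responses || []).forEach((response, i) => {
    const shardFailures = (response._shards && response._shards.failures) || [];
    for (const f of shardFailures) {
      failures.push({ response: i, index: f.index, shard: f.shard, status: f.status, reason: f.reason });
    }
  });
  return failures;
}

// Pull the breaker name, field and limit out of a failure reason string.
// The pattern assumes the Elasticsearch 1.x wording shown above and returns
// null for anything else (including non-string reasons).
function parseBreakerReason(reason) {
  if (typeof reason !== "string") return null;
  const m = /\[(\w+)\] Data too large, data for \[([^\]]+)\] would be larger than limit of \[(\d+)\/([^\]]+)\]/.exec(reason);
  if (!m) return null;
  return { breaker: m[1], field: m[2], limitBytes: Number(m[3]), limitHuman: m[4] };
}
```

Applied to the failure above, parseBreakerReason reports the FIELDDATA breaker on @timestamp with a limit of 5143501209 bytes (4.7gb). Later Elasticsearch versions report shard failures differently, so the pattern would need adjusting there.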

So the issue is clear: loading fielddata for the @timestamp field would push fielddata usage above the circuit breaker limit.

According to the Amazon documentation:

Field Data Breaker –
Percentage of JVM heap memory allowed to load a single data field into memory. The default value is 60%. We recommend raising this limit if you are uploading data with large fields.
indices.breaker.fielddata.limit
For more information, see Field data in the Elasticsearch documentation.
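A quick worked check of the numbers may help. The limit in the error, 5143501209 bytes, is about 4.79 GiB, which Elasticsearch displays as 4.7gb. If the 60% default was in force (an assumption; the post does not state the heap size or breaker setting at the time), that limit implies a JVM heap of roughly 8 GiB per data node, and an 89% setting on the same heap would allow about 7.1 GiB. The sketch below does that arithmetic; breakerLimitBytes and impliedHeapBytes are illustrative names only.

```javascript
// Worked arithmetic for the fielddata breaker limit reported in the error.
// Assumption: the 60% default quoted above was in force when the error fired.
const GIB = 1024 ** 3;

// Breaker limit in bytes for a given heap size and percentage setting.
function breakerLimitBytes(heapBytes, percent) {
  return Math.floor(heapBytes * (percent / 100));
}

// Heap size implied by an observed limit, given the percentage in force.
function impliedHeapBytes(limitBytes, percent) {
  return limitBytes / (percent / 100);
}

const observedLimit = 5143501209; // bytes, from "[5143501209/4.7gb]"
const heap = impliedHeapBytes(observedLimit, 60);

console.log((observedLimit / GIB).toFixed(2)); // 4.79 (GiB)
console.log((heap / GIB).toFixed(2)); // 7.98 (GiB)
console.log((breakerLimitBytes(heap, 89) / GIB).toFixed(2)); // 7.11 (GiB)
```

The figures are approximate: the maximum heap the JVM reports may differ slightly from the configured heap size.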

The following URL documents the Elasticsearch operations supported by Amazon Elasticsearch Service:

https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg-supported-operations.html

Checking the current heap usage of the nodes (the heap.percent column) shows that heap usage on the data nodes is very high:

$ curl -XGET 'https://elasticsearch.abc.com/_cat/nodes?v'
host ip heap.percent ram.percent load node.role master name
x.x.x.x   10   85   0.00   -   m   Drax the Destroyer
x.x.x.x    7   85   0.00   -   *   H.E.R.B.I.E.
x.x.x.x   78   64   1.08   d   -   Black Cat
x.x.x.x   80   62   1.41   d   -   Leech
x.x.x.x    7   85   0.00   -   m   Alex
x.x.x.x   78   63   0.27   d   -   Saint Anna
x.x.x.x   80   63   0.28   d   -   Martinex
x.x.x.x   78   63   0.59   d   -   Scorpio
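Spotting the hot nodes by eye is easy with eight nodes; on a larger cluster a small filter helps. Below is a minimal sketch, assuming the output was requested with only the needed columns through the _cat h parameter (curl -s 'https://elasticsearch.abc.com/_cat/nodes?v&h=heap.percent,node.role,name'), so that the node name, which may contain spaces, is the last column. parseCatNodes and hotDataNodes are hypothetical helpers, and the threshold is a choice left to the reader.

```javascript
// Minimal sketch: parse `_cat/nodes?v` text output (header row included) and
// flag data nodes with high heap usage. The last column may contain spaces.
function parseCatNodes(text) {
  const [headerLine, ...rows] = text.trim().split("\n");
  const columns = headerLine.trim().split(/\s+/);
  return rows
    .filter((row) => row.trim() !== "")
    .map((row) => {
      const tokens = row.trim().split(/\s+/);
      const node = {};
      // Every column except the last takes one token; the last takes the rest.
      columns.forEach((col, i) => {
        node[col] = i < columns.length - 1 ? tokens[i] : tokens.slice(i).join(" ");
      });
      return node;
    });
}

// Names of data nodes ("d" in node.role) at or above the heap threshold.
function hotDataNodes(nodes, thresholdPercent) {
  return nodes
    .filter((n) => n["node.role"].includes("d"))
    .filter((n) => Number(n["heap.percent"]) >= thresholdPercent)
    .map((n) => n.name);
}
```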

The following command raises the indices.breaker.fielddata.limit value and can be used as a workaround:

$ curl -XPUT 'https://elasticsearch.abc.com/_cluster/settings' -H 'Content-Type: application/json' -d '{ "persistent" : { "indices.breaker.fielddata.limit" : "89%" } }'

Running the command allowed the Kibana search to complete without errors and display the data.
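To confirm the new limit was applied, and to watch how close each node gets to it, the Elasticsearch nodes stats API reports per-node breaker figures at _nodes/stats/breaker. Below is a minimal sketch that summarises the fielddata breaker from that JSON. It assumes the limit_size_in_bytes, estimated_size_in_bytes and tripped field names of the Elasticsearch nodes stats API (worth checking against the version in use) and that Amazon Elasticsearch Service exposes this endpoint; fielddataBreakerSummary is a hypothetical helper.

```javascript
// Minimal sketch: summarise the fielddata circuit breaker for each node from
// the JSON returned by GET _nodes/stats/breaker.
function fielddataBreakerSummary(nodesStats) {
  return Object.values(nodesStats.nodes).map((node) => {
    const fd = node.breakers.fielddata;
    return {
      name: node.name,
      limitBytes: fd.limit_size_in_bytes,
      estimatedBytes: fd.estimated_size_in_bytes,
      // Share of the breaker limit currently estimated to be in use.
      usedPercent: (100 * fd.estimated_size_in_bytes) / fd.limit_size_in_bytes,
      // Older versions may not report a trip counter; treat missing as 0.
      tripped: fd.tripped ?? 0,
    };
  });
}
```

After the change, each node's limitBytes should reflect 89% of its heap rather than 60%.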

The real solution is to increase the number of nodes, or to reduce the amount of fielddata that needs to be loaded, for example by limiting the number of indices.

AWS Lambda can be used to run a script that cleans up old indices as a scheduled event.
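The selection step of such a cleanup script could look like the following minimal sketch. It assumes the daily logstash-YYYY.MM.DD index naming seen in the error above and a retention period of the reader's choosing; indicesToDelete is a hypothetical helper. The index list could come from _cat/indices, and the scheduled Lambda function (Node.js is one of the runtimes Lambda supports) would then send a DELETE request for each returned index name.

```javascript
// Minimal sketch: pick daily logstash-YYYY.MM.DD indices older than the
// retention period. Anything not matching the pattern (e.g. .kibana) is kept.
function indicesToDelete(indexNames, now, retentionDays) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  // Cutoff is midnight UTC of today minus the retention period.
  const cutoff =
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()) -
    retentionDays * DAY_MS;
  const pattern = /^logstash-(\d{4})\.(\d{2})\.(\d{2})$/;
  return indexNames
    .filter((name) => {
      const m = pattern.exec(name);
      if (!m) return false;
      const day = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
      return day < cutoff;
    })
    .sort();
}
```

Keeping the selection as a pure function makes it easy to dry-run (print the names) before letting the scheduled job delete anything.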


About the Author

DevOps Engineer
Minto Joseph is an expert in open source technologies with a deep understanding of Linux. This allows him to troubleshoot issues from the kernel to the application layer. He also has extensive experience in debugging Linux performance issues. Minto uses his skills to architect, implement and debug enterprise environments.

2 Comments

Jared Still
March 9, 2016 7:47 am

Interesting article.

How did you know to search on timeout=3000?

Minto Joseph
March 21, 2016 4:13 am

Hi Jared,

Thanks for the comment.

"_msearch?timeout=300000" is the complete request.

Instead of clicking "_search?search_type=count" as shown in the image, I clicked on "_msearch?timeout=300000". The image is from Kibana3, while I was working with Kibana4, which explains the difference.

In the Kibana4 source, you can see that _msearch is one of the method names (alongside _bulk) that Kibana4 recognises when validating requests, and that 300000 is the default request timeout set in the Kibana4 configuration.

src/server/lib/validateRequest.js

// methods that accept bulk bodies
var maybeBulk = ('_bulk' === maybeMethod && add && bulkBody);
var maybeMsearch = ('_msearch' === maybeMethod && add && bulkBody);

./src/server/config/kibana.yml

# Time in milliseconds to wait for responses from the back end or elasticsearch.
# This must be > 0
request_timeout: 300000

