Fishing for Insight in a Sea of Logs
Like any reasonably sized engineering organization, we run systems and applications that generate thousands of log entries every second. To make sense of the noise, we aggregate those logs in a central location for search and analysis. We call the stack we use for this purpose “FluEK”. It lets us examine traffic flowing through a complex system with many moving parts and diagnose issues quickly and effectively by correlating data across multiple systems and applications.
Give someone a fish
FluEK, pronounced like fluke, is an acronym I was surprised to find didn’t exist yet. It stands for Fluentd, Elasticsearch, and Kibana.
Fluentd
Fluentd is the mechanism we use to get log entries from our applications into Elasticsearch. The fluentd client runs on all of our hosts to read and parse system and application logs. Each client ships its logs to a cluster of servers running fluentd in the aggregator role, which collect the parsed entries and publish them to Elasticsearch.
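The parsing step is what turns a raw line into fields we can actually search on. fluentd handles this with its own configurable parser plugins; the Python sketch below only illustrates the idea, assuming a simplified version of the common Apache/nginx "combined" access-log format. The pattern, the `parse_line` name, and the field names are illustrative assumptions, not our real configuration.

```python
import re
from datetime import datetime
from typing import Optional

# Simplified pattern for the "combined" access-log format (illustrative only), e.g.
# 203.0.113.7 - - [12/Jan/2017:10:15:32 +0000] "GET /click?c=42 HTTP/1.1" 504 0 "-" "curl/7.47.0"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw access-log line into a structured record, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        # A real pipeline would tag or reroute unparsed lines instead of silently dropping them.
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Normalize the timestamp to ISO 8601 so it can be mapped as a date downstream.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
    return record


line = '203.0.113.7 - - [12/Jan/2017:10:15:32 +0000] "GET /click?c=42 HTTP/1.1" 504 0 "-" "curl/7.47.0"'
print(parse_line(line))
```

Emitting typed fields (an integer status, an ISO timestamp) at the edge means every downstream consumer can filter and aggregate on them without re-parsing text.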
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine. While we once ran our own Elasticsearch cluster, we currently use Amazon’s Elasticsearch Service.
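Because Elasticsearch is driven over HTTP with JSON, publishing parsed entries boils down to building request bodies. Assuming the aggregators use an Elasticsearch output plugin that batches writes through the `_bulk` API, the sketch below shows the newline-delimited body that endpoint accepts: one action line followed by one document line per record. The function name and index name are hypothetical, and actually sending the request is left out.

```python
import json
from typing import Iterable, Optional


def to_bulk_body(records: Iterable[dict], index: str, doc_type: Optional[str] = None) -> str:
    """Serialize records into a newline-delimited body for Elasticsearch's _bulk endpoint."""
    lines = []
    for record in records:
        action = {"_index": index}
        if doc_type is not None:
            # Elasticsearch 2.x and 5.x also expect a mapping type for each document.
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))  # action/metadata line
        lines.append(json.dumps(record))             # the document itself
    # Every line, including the last, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


print(to_bulk_body([{"status": 504, "path": "/click"}], "logs-2017.01.12"), end="")
```

Batching many records into one request keeps the number of HTTP round trips low, which matters when thousands of entries arrive every second.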
Kibana
Kibana sits in front of Elasticsearch and gives us a web interface for exploring and visualizing our log data.
Teach someone to fish
Once we’ve collected logs from across the various components in our environment, we’re able to search for specific events, visualize activity, and investigate anomalies. FluEK enables us to pinpoint which campaigns are experiencing timeouts, identify which specific URLs within those campaigns are causing the most timeouts, or find a specific request to answer a client’s question.
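Questions like "which campaigns are timing out, and which URLs within them are the worst offenders" map naturally onto Elasticsearch's nested terms aggregations, which is roughly what a Kibana visualization issues behind the scenes. The sketch below builds such a search body in Python. The field names (`status`, `campaign_id`, `url`, `@timestamp`) and the assumption that timeouts are logged as HTTP 504 are hypothetical; on 5.x, a dynamically mapped string field would need its `.keyword` subfield for a terms aggregation.

```python
import json


def timeouts_by_campaign_query(hours: int = 24, top_campaigns: int = 10, top_urls: int = 5) -> dict:
    """Build a search body counting timeouts per campaign, with the worst URLs inside each one."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": 504}},  # hypothetical: timeouts logged as HTTP 504
                    {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
                ]
            }
        },
        "aggs": {
            "campaigns": {
                "terms": {"field": "campaign_id", "size": top_campaigns},
                # Sub-aggregation: the top URLs within each campaign bucket.
                "aggs": {"urls": {"terms": {"field": "url", "size": top_urls}}},
            }
        },
    }


print(json.dumps(timeouts_by_campaign_query(hours=6), indent=2))
```

Finding a single request for a client is the opposite case: drop the aggregations and filter on whatever identifies that request, returning the matching hits instead.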
We’re gonna need a bigger boat
When we moved to the Elasticsearch Service, we initially deployed a version 2.3 cluster, the latest available at the time. Eventually, the lure of the latest and greatest became too strong to resist, and we got caught up in the excitement of version 5.1. Of all the improvements from 2.3 to 5.1, the killer feature for us was the fix for a particularly annoying UI bug that affected the older version of Kibana in Chrome.