Recently I’ve been working in an environment comprising nearly 60 Docker containers. When things go wrong, as they sometimes do, you can make educated guesses about which logs to look at, but sometimes you’ll guess wrong. Or you know which ones to look at, but want to watch several at once without having eight PuTTY windows open.
You need something more fit for purpose. Enter ELK. ELK stands for Elasticsearch, Logstash, and Kibana. Rather than explain them in the order they appear in the acronym, I’ll do it in the order in which they are used. And rather than attempt to explain them myself and screw it up, the following summaries are from this helpful O’Reilly article.
Logstash “is responsible for giving structure to your data (like parsing unstructured logs) and sending it to Elasticsearch.”
Elasticsearch “is the search and analysis system. It is the place where your data is finally stored, from where it is fetched, and is responsible for providing all the search and analysis results.”
Kibana “allows you to build pretty graphs and dashboards to help understand the data so you don’t have to work with the raw data Elasticsearch returns.”
What am I building here?
OK, so what does all that look like? It’s pretty simple. You configure your logs to write to Logstash. Logstash then filters the log messages, turning the unstructured messages into structured data. That structured data is output to Elastic, where it is indexed. Kibana then allows you to query that indexed structured data to discover useful information and visualise it.
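To make that concrete, here’s a rough sketch of the transformation: a made-up raw syslog line on the way in, and the kind of structured document that ends up in Elastic once Logstash has filtered it. The field names are illustrative, borrowed from the grok later in this post, rather than exact output.

# Raw, unstructured message arriving at Logstash:
<30>Sep 11 10:18:58 my_container_name[1234]: WARN Something odd happened

# Roughly the structured document that gets indexed:
{
  "priority": "30",
  "syslogtimestamp": "Sep 11 10:18:58",
  "container": "my_container_name",
  "loglevel": "WARN",
  "logmessage": "Something odd happened"
}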
Getting Started
This is the easy part. Follow the instructions here to clone the ELK stack to your machine and start it up. This basically gets you to pull the three containers and fire them up, with some configuration taken care of. Nothing in this life, however, is for free, so there is some work to be done.
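For reference, that boils down to roughly the following; the repository URL is my assumption of the docker-elk project credited below, so adjust it if yours differs.

# Clone the pre-wired ELK stack and bring it up in the background
git clone https://github.com/deviantony/docker-elk.git
cd docker-elk
docker-compose up -d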
[I must offer all glory to Anthony Lapenna for producing such a beautifully wrapped ELK package]
Wiring
I’ve observed before that modern software development is mostly plumbing. Happily, the ELK stack is pre-configured such that you don’t need to wire together the bits of ELK. However, with ELK, you do need to;
- Configure your Docker containers to log to a location.
- Configure Logstash to pick them up from that location.
- Configure Logstash to filter the data.
- Configure Logstash to send the data to Elastic.
Configuring container log location
Because I might want to run my containers without sending their logs to ELK, I need a way to turn it on and off without too much messing around.
The way I did this was by using a second docker-compose.yml file. This secondary file – which I called “elk-logging.yml” – contains a section for each of your containers. Each section contains the following;
my_service:
  log_driver: syslog
  log_opt:
    syslog-address: "tcp://localhost:5000"
What this does is tell the container to use syslog, and to send syslog over TCP to port 5000.
So what you need to do is create your secondary YAML file – elk-logging.yml – with one of the above sections for each container you want to log to ELK, replacing “my_service” with each container’s name.
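For example, a two-container elk-logging.yml might look something like this; the service names here are hypothetical, so use the ones from your own docker-compose.yml.

# elk-logging.yml – logging overrides only; the services themselves
# are defined in your main docker-compose.yml
my_api:
  log_driver: syslog
  log_opt:
    syslog-address: "tcp://localhost:5000"
my_database:
  log_driver: syslog
  log_opt:
    syslog-address: "tcp://localhost:5000"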
Configuring Logstash input
The next step is to configure Logstash’s input to listen on that port. Fortunately, the ELK stack you cloned earlier already has this configured. The configuration file is in docker-elk/logstash/pipeline/logstash.conf.
Looking at this file, you’ll see an input section showing that it listens on TCP:5000.
input {
  tcp {
    port => 5000
  }
}
So you don’t need to do anything; this is just for information.
Grokking in Fullness: configuring Logstash filters
This is the most fiddly part of the exercise, because you need to mess about with grok regexes. When you’re dealing with this many containers, the chances of them all using the same log output syntax is remote. There are a couple of approaches you could take;
- Specify multiple filters, and fall through each until a working filter is found (there’s a sketch of this after the list).
- Specify a single filter that is powerful enough to recognise optional components as and when they appear.
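For the first approach, the grok filter will accept a list of patterns and try each one in turn, stopping at the first match. This is just a sketch with simplified placeholder patterns, not my real config;

filter {
  grok {
    # Patterns are tried in order; grok stops at the first one that matches
    match => {
      "message" => [
        "<%{NUMBER:priority}>%{SYSLOGTIMESTAMP:syslogtimestamp} %{GREEDYDATA:container}\[%{POSINT:containerprocessid}\]: %{GREEDYDATA:logmessage}",
        "%{LOGLEVEL:loglevel} %{GREEDYDATA:logmessage}"
      ]
    }
  }
}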
I tried both of these approaches, but struggled to get the syntax in the logstash.conf file right, so I eventually settled on the One Grok To Rule Them All. And this is pretty much what it looks like;
<%{NUMBER:priority}>%{SYSLOGTIMESTAMP:syslogtimestamp}\s%{GREEDYDATA:container}\[%{POSINT:containerprocessid}\]\:\s*((?<springtimestamp>%{YEAR}[\-\/]%{MONTHNUM2}[\-\/]%{MONTHDAY}\s%{TIME}))?\s*((\[)?%{LOGLEVEL:loglevel}(\])?)?\s*(%{POSINT:processid})?\s*(---)?\s*(\[\s*(?<thread>[A-Za-z0-9\-]*)\])?\s*((?<loggingfunction>[A-Za-z0-9\.\$\[\]\/]*)\s*\:)?\s*%{GREEDYDATA:logmessage}
Impressive looking, right? I won’t attempt to explain every component, but will try to summarise. Firstly, my log messages seem to broadly be “Syslog format followed by Spring Boot format”, e.g.
<30>Sep 11 10:18:58 my_container_name[1234]: 2017-09-11 14:18:58.328 WARN 14 --- [ thread_name] a.b.c.d.CallingFunction : Oops, something went a little strange, but nothing to worry about.
Everything up to the colon after the container process ID (“[1234]:”) is syslog; everything after it is Spring Boot. Here’s a map of each bit of the message to the part of the grok that extracts it;
- <%{NUMBER:priority}> → <30>
- %{SYSLOGTIMESTAMP:syslogtimestamp} → Sep 11 10:18:58
- %{GREEDYDATA:container} → my_container_name
- \[%{POSINT:containerprocessid}\] → [1234]
- ((?<springtimestamp>%{YEAR}[\-\/]%{MONTHNUM2}[\-\/]%{MONTHDAY}\s%{TIME}))? → 2017-09-11 14:18:58.328
- ((\[)?%{LOGLEVEL:loglevel}(\])?)? → WARN
- (%{POSINT:processid})? → 14
- (\[\s*(?<thread>[A-Za-z0-9\-]*)\])? → [ thread_name]
- ((?<loggingfunction>[A-Za-z0-9\.\$\[\]\/]*)\s*\:)? → a.b.c.d.CallingFunction
- %{GREEDYDATA:logmessage} → Oops, something went a little strange, but nothing to worry about.
The names after the colons (priority, syslogtimestamp, container, and so on) are the fields that we’re extracting. The %{...} components are built-in grok patterns; you can find them all here. Everything else is hardcore grok!
Everything after containerprocessid is wrapped inside “()?”, which indicates that everything inside the brackets is optional. This is because, from container to container, message to message, the various components of the Spring Boot log message weren’t always present, and so I need to tell the grok that.
Like it or not, you’re going to have to figure this bit out for yourself. It involves trial and error, and learning each aspect of the regex / grok syntax as you go. When I started this process I had used regexes a bit over the years, but was by no means a guru. Once you understand the syntax, it’s quite quick to get something powerful working. Plus, it looks super hard, so you can impress all your coworkers who don’t grok.
My advice is: start small, and figure out how to extract each part of your log message one bit at a time. The good news is that there are lots of online grok checkers you can use. Kibana has one built in, and this one was really useful. A list of grok patterns is also available here.
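As an illustration of starting small, a first cut might just pull out the syslog prefix and lump everything else into one field, like this sketch (the “rest” field name is just my placeholder);

filter {
  grok {
    # First cut: just the syslog prefix; everything after the colon goes into "rest"
    match => { "message" => "<%{NUMBER:priority}>%{SYSLOGTIMESTAMP:syslogtimestamp}\s%{GREEDYDATA:container}\[%{POSINT:containerprocessid}\]\:\s%{GREEDYDATA:rest}" }
  }
}

Once that matches cleanly in a grok checker, start carving the Spring Boot fields out of “rest” one at a time until you end up with something like the full grok above.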
Configuring Logstash output
Finally, you need to configure Logstash to send the data to Elastic. As with the input section, this is already pre-configured as part of the ELK stack you cloned at the start.
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
This shows that Logstash is sending the logs out to elasticsearch on TCP:9200.
OK, how do I get all this working together?
It’s pretty easy. There are only 3 steps to it.
1. Update the logstash.conf with your filter grok
Your final logstash.conf file should look something like;
input {
  tcp {
    port => 5000
  }
}

filter {
  grok {
    match => { "message" => "<<your grok here>>" }
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
You just need to copy your logstash file over the top of the existing one in /docker-elk/logstash/pipeline. You can copy the original aside if you want, but move it out of that folder as logstash can get confused if it finds two possible config files.
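That copy step might look something like this, run from wherever your new config lives (the filenames here are hypothetical);

# Keep the original outside the pipeline folder, then drop in your own
mv docker-elk/logstash/pipeline/logstash.conf ./logstash.conf.original
cp my-logstash.conf docker-elk/logstash/pipeline/logstash.conf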
If you need more help on the format of the logstash.conf file, this page is useful.
2. Restart logstash to pick up the changes to the config file
Restarting the logstash container is simple. Make sure you’re in the /docker-elk folder, then simply;
docker-compose restart logstash
Because it’s easy to make a formatting or syntax error in the file, especially in the grok, make sure you check that your logstash container is running and that it isn’t complaining about syntax errors. To check whether the container is running;
docker ps | grep logstash
If your logstash container is there, that’s good. Then check the logstash log to see if any errors are visible;
docker logs -f dockerelk_logstash_1
At the end of the log, you should see a message showing that Logstash started successfully. If the container wasn’t running when you ran “docker ps”, there’s probably an error in the log showing where the syntax error is.
3. Restart your application containers with the ELK override file
Finally, to restart your application containers and get them to log to logstash, navigate to where your application docker-compose.yml and your elk-logging.yml files live, and;
docker-compose stop
docker-compose -f docker-compose.yml -f elk-logging.yml up -d
This tells docker to start all the containers detailed in the docker-compose.yml, with the additional logging parameters detailed in elk-logging.yml, creating them if necessary, and doing it in detached mode (so you can disconnect and the containers will keep running).
Configuring Kibana
We’re into the final stretch. The last piece of configuration we need to do is tell Kibana the name of the Elastic index that we want to search.
Again, the pre-configuration sorts most of this for you. All you need to do is navigate to http://<your server>:5601. Kibana will load and show a page that asks you to select an index pattern and a time field.
Now, between Logstash and Elastic (full disclosure: I couldn’t figure out which was responsible for this), a new Elastic index is created every day, with names following the convention “logstash-yyyy.mm.dd”.
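If you want to check that those daily indexes are actually being created, you can ask Elasticsearch directly, assuming its port 9200 is reachable from your machine;

# List the indexes and look for the daily logstash-* ones
curl "http://<your server>:9200/_cat/indices?v" | grep logstash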
The Kibana page will be pre-populated with an Index pattern of “logstash-*”, which means it will pick up every logstash-related index in Elastic.
It will also be populated with a default time field. You can override this with one of the timestamps your grok should have extracted, but you may as well keep the default.
All you then do is accept these settings. Kibana will then connect to Elastic and return a list of fields stored in the indexes that match the “logstash-*” pattern, which should include all the fields your grok extracted (the field names from earlier on).
Can I look at my data now?
Yes! Go to Kibana’s Discover page and your data should be there. You can explore to your heart’s content and start messing around with filters and visualisations to extract all the hidden value that, without ELK, you would find very difficult to get at.
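For example, once the fields are indexed, a query like this in the Discover search bar should pull out just the warnings from one container (the field and container names come from my grok and example log line, so substitute your own);

loglevel:WARN AND container:my_container_name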
Enjoy!
If you find any errors in my instructions, or any part of the above is unclear, or you want / need more information, comment below and I promise I will update it.
Helpful pages
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://discuss.elastic.co/t/grok-multiple-match-logstash/27870
https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns