Ben Koller

Monitoring.

A while back I was introduced to my current employer, Talentry. They, and thus I, aim to make employee recommendations the #1 recruiting source. It’s a mission that is easy to align with, as I strongly believe in it. Luckily, so do other companies, as the customer base proves. In recent months Talentry has grown quickly, and the platform’s infrastructure became due for an overhaul. This is where I step in. Along the way I’ll shed some light on projects and organisational developments. Parts might be left out and parts will be intentionally vague; nonetheless, the lessons learned should come through.

Monitoring

“Knowledge is power.” - Sir Francis Bacon, “Meditationes Sacrae”

It’s impossible to operate a successful platform in this day and age without solid knowledge of key performance metrics. These include financial data, analytics on user engagement and in-app behaviour, and marketing performance data. Crucial to my work, however, is solid infrastructure monitoring, some of which can be built in-app and some of which can be handled by tools like htop, ncdu or mytop.

Tools, tools, tools

Sooner or later these tools reach their breaking point, as they fail to correlate data from various sources over time. The open source ecosystem provides great tooling to do monitoring properly. Nagios with its notoriously steep learning curve comes to mind, as well as self-built solutions around Grafana. Prometheus, originally built at SoundCloud and now hosted by the Cloud Native Computing Foundation, is one of the most interesting production-ready tools on the market right now. However, at a scale of roughly 20 employees in total, only three of whom are engineers, you need to be selective about the tools you choose to operate yourself. Monitoring, to my mind, is not one of them.

Datadog

This is where Datadog steps up to the plate. The first five hosts are free and it’s filled to the brim with integrations for all the shiny tools you fancy (Apache, MySQL, Postgres, Docker and Elasticsearch, to name a few). Even better, it sports a solid API for events and metrics at no extra charge, clearing the way for in-app metrics as well as custom or even third-party integrations. Since we run a fairly common tech stack, not many custom integrations were necessary to get up and running quickly. However, I fancy thorough pagespeed monitoring. With Datadog, a free Pingdom integration is available, but I was looking for even more insights.
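To give an idea of what an in-app metric looks like on the wire, here is a minimal sketch that builds a request body in the shape of Datadog’s v1 series endpoint. The function name buildSeriesPayload, the metric name and the tag are made up for illustration, and actually sending the body (with your API key) is deliberately left out.

```javascript
// Build a request body in the shape of Datadog's v1 "series" endpoint.
// buildSeriesPayload, the metric name and the tag are illustrative assumptions.
function buildSeriesPayload(metric, value, tags, nowMs) {
    // Datadog expects POSIX timestamps in seconds, not milliseconds
    var timestamp = Math.floor((nowMs === undefined ? Date.now() : nowMs) / 1000);
    return {
        series: [{
            metric: metric,
            points: [[timestamp, value]],
            type: "gauge",
            tags: tags || []
        }]
    };
}

// A fixed clock value keeps the example reproducible
var payload = buildSeriesPayload("app.signup.duration_ms", 182, ["env:staging"], 1500000000000);
console.log(JSON.stringify(payload));
```

In practice a client library such as dogapi takes care of this for you; the sketch only shows how little data a single metric point needs.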

Page speed monitoring

Getting reliable insights into your app’s performance requires a real browser engine. To avoid the fallacy of premature optimization I’ll focus on WebKit, conveniently made available by PhantomJS. On top of that sits phantomas, a library exposing PhantomJS metrics as modules for ease of use. The code below is a redacted excerpt from a function I hacked together to continuously monitor key performance metrics and forward them to Datadog via the dogapi library. With only slight adjustments you can run this on AWS Lambda, but I prefer to run it inside a container. Adjust it to monitor your core user processes, or simply hammer away at your landing page.

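// callback(err) is assumed to be supplied by the caller; the redacted
// excerpt referenced it without defining it
function getStats(callback){
    // init Datadog
    var dogapi = require("dogapi");
    var phantomas = require("phantomas");
    dogapi.initialize({
        api_key: process.env.DDOG_KEY,
        app_key: process.env.DDOG_APP
    });
    // Derive metrics and URL from env
    var envMetrics = process.env.METRICS.split(";");
    var URL = process.env.URL;
    // Assumption: region was set outside the excerpt; REGION is a placeholder variable name
    var region = process.env.REGION || "unknown";

    phantomas(URL, {"analyze-css": false}, function(err, json, results) {
        if(err) {
            console.log(err);
            return callback(err);
        }
        var pending = envMetrics.length;
        var failed = false;
        envMetrics.forEach(function(metric) {
            var value = results.getMetric(metric);
            var ddMetric = "psm." + metric + "." + region;
            // dogapi takes tags inside an options object, as key:value strings
            dogapi.metric.send(ddMetric, value, {tags: ["region:" + region]}, function(err){
                if(failed) return;
                if(err){
                    failed = true;
                    return callback(err);
                }
                // Report success only once every metric has been sent
                if(--pending === 0){
                    console.log("success");
                    callback(null);
                }
            });
        });
    });
}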
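
The function above is driven entirely by environment variables, so it helps to see how a METRICS value turns into Datadog metric names. The following is a small sketch of that step in isolation; the helpers parseMetrics and metricName are hypothetical, and the metric names and the region "eu" are only example values.

```javascript
// Split a semicolon-separated METRICS value into clean metric names,
// dropping empty entries left behind by stray separators.
// parseMetrics and metricName are hypothetical helpers, not part of any library.
function parseMetrics(raw) {
    return (raw || "")
        .split(";")
        .map(function (name) { return name.trim(); })
        .filter(function (name) { return name.length > 0; });
}

// Build the metric name scheme used in the post: "psm.<metric>.<region>"
function metricName(metric, region) {
    return "psm." + metric + "." + region;
}

// Example values only; pick the phantomas metrics you care about
var names = parseMetrics("requests; timeToFirstByte;;domComplete").map(function (m) {
    return metricName(m, "eu");
});
console.log(names);
```

Filtering out empty names avoids sending a metric called "psm..eu" when METRICS ends with a trailing semicolon.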

Finishing it up

Being able to correlate pagespeed with backend metrics from MySQL or Apache provides great insight into performance inefficiencies and gives a lot of confidence in continuous deployments. Infrastructure decisions can be based on precise data, and developers can examine the impact of changes in real time. After all, without monitoring you’re effectively blind.
