Collecting usage statistics with DynamoDB and AWS Lambda
Phusion recently launched a new product, the Fuse Panel command center for Passenger. To make educated decisions about what functionality to add next, we implemented simple usage statistics (telemetry) that add context to the responses to our survey. Fuse Panel telemetry is opt-out and completely anonymous.
In a nutshell: information about the Fuse Panel installation is queried periodically, sent to AWS Lambda, and then saved to DynamoDB. In between, we added an API Gateway (with Lambda proxy integration) on a custom domain, so that we are less dependent on Amazon as a service provider in the future.
The Fuse Panel telemetry includes:
- the version number
- number of frontend clients
- number of servers
- number of Passenger instances
- number of registered apps
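To make the shape of one telemetry record concrete, here is a minimal sketch of a snapshot as a Python dataclass. The field names, and the `install_key` field carrying the anonymous per-install key described in the privacy section, are our own illustrative assumptions, not the actual Fuse Panel wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageSnapshot:
    """One anonymous telemetry snapshot; field names are illustrative."""

    install_key: str          # random per-install key (see the privacy section)
    version: str              # Fuse Panel version number
    frontend_clients: int     # connected frontend (web) clients
    servers: int              # connected servers
    passenger_instances: int  # connected Passenger instances
    registered_apps: int      # registered apps

    def to_json(self) -> str:
        # Sorted keys keep the payload stable and easy to compare.
        return json.dumps(asdict(self), sort_keys=True)


snapshot = UsageSnapshot(
    install_key="example-install-key",
    version="1.0.0",
    frontend_clients=2,
    servers=3,
    passenger_instances=4,
    registered_apps=12,
)
print(snapshot.to_json())
```

Note that the record only contains counts and a version string: nothing in it points at a host, an app name or a user.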
Privacy / data protection
Fuse Panel periodically collects usage data and sends it to us. At a fixed interval (currently every 4 hours), a scheduled task collects data about the Fuse Panel at that moment; we only collect snapshot data. Usage tracking can be disabled with the --disable-usage-statistics command line option.
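A minimal sketch of how such an opt-out flag can gate the periodic task, assuming a Python-style backend. The `schedule_every` callback is a hypothetical stand-in for whatever scheduler the backend really uses; only the flag name and the 4-hour interval come from the text above.

```python
import argparse
from typing import Callable, List, Optional

# "Currently every 4 hours", expressed in seconds.
USAGE_INTERVAL_SECONDS = 4 * 60 * 60


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="fuse-panel")
    parser.add_argument(
        "--disable-usage-statistics",
        action="store_true",
        help="do not collect or send anonymous usage statistics",
    )
    return parser.parse_args(argv)


def start_usage_reporting(
    args: argparse.Namespace,
    schedule_every: Callable[[int], None],
) -> bool:
    """Register the periodic snapshot task unless the user opted out.

    `schedule_every` is a hypothetical hook into the backend's task
    scheduler; it receives the interval in seconds.
    """
    if args.disable_usage_statistics:
        return False  # opted out: nothing is scheduled, nothing is sent
    schedule_every(USAGE_INTERVAL_SECONDS)
    return True
```

Checking the flag once, before anything is scheduled, means an opted-out install never even builds a snapshot.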
The first time the panel is started with usage tracking enabled, a unique key is generated and stored in the local database. This key is sent along with every set of usage data. It lets us tell one Fuse Panel installation apart from another, but not which specific installation it is or who is using it.
We currently only track data that is available from within the Fuse Panel backend. In the spirit of our company’s core values, we are determined to make it easy to opt out of usage tracking. Because the backend is started from the command line, we could simply disable usage tracking in the entire backend at the flick of a switch (or in this case, a command line option).
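The generate-once behaviour of the install key can be sketched as follows. We assume SQLite and a hypothetical `settings` table purely for illustration; the Fuse Panel's actual local database and schema may differ. A random UUID is used so the key carries no information about the host or the user.

```python
import sqlite3
import uuid


def get_or_create_install_key(conn: sqlite3.Connection) -> str:
    """Return this installation's key, generating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings ("
        " name TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    row = conn.execute(
        "SELECT value FROM settings WHERE name = 'install_key'"
    ).fetchone()
    if row is not None:
        return row[0]  # key already exists: reuse it for every snapshot
    # uuid4 is random: it is not derived from the hostname, MAC address or user.
    key = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO settings (name, value) VALUES ('install_key', ?)", (key,)
    )
    conn.commit()
    return key
```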
How does it work?
The Fuse Panel uses aggregators to combine the data from the connected Passenger instances, which gives us a single dataset that can be distributed among the connected web clients. Data for usage tracking is collected by querying these aggregators and the websocket server. On startup, three aggregators are created: one for the number of connected servers, one for the number of connected Passenger instances, and one for the number of registered apps. The server itself keeps track of the number of connected websocket clients. Because both the frontend and the Passenger instances connect via websockets, the number of connected frontend clients can be inferred from this information: it is the number of websocket clients minus the number of Passenger instances.
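A minimal sketch of taking one snapshot from the three aggregators plus the websocket client count. Modelling each aggregator as a zero-argument callable and the dictionary keys are our own assumptions; the subtraction assumes every websocket client is either a frontend or a Passenger instance.

```python
from typing import Callable, Dict

Aggregator = Callable[[], int]  # hypothetical: returns the current count


def collect_usage(
    aggregators: Dict[str, Aggregator], websocket_clients: int
) -> Dict[str, int]:
    """Take one snapshot from the three aggregators and the websocket server."""
    servers = aggregators["servers"]()
    passenger_instances = aggregators["passenger_instances"]()
    registered_apps = aggregators["registered_apps"]()
    # Frontends and Passenger instances share the websocket server, so every
    # websocket client that is not a Passenger instance is a frontend client.
    # Clamp at zero in case the two counts were read at slightly different times.
    frontend_clients = max(websocket_clients - passenger_instances, 0)
    return {
        "servers": servers,
        "passenger_instances": passenger_instances,
        "registered_apps": registered_apps,
        "frontend_clients": frontend_clients,
    }
```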
The collected data is POSTed to an AWS Lambda function through an API Gateway [1]. The Lambda function stores the data it receives in DynamoDB, also on AWS. To be less dependent on AWS in the future, the API is located on a custom domain name. This allows us to move to a different solution later without changing the endpoint the Fuse Panel sends its data to.
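A sketch of what a Lambda handler behind an API Gateway proxy integration could look like. With proxy integration the request body arrives as a string in `event["body"]`, and the handler must return a dict with `statusCode`, `headers` and a string `body`. The field list, the `received_at` attribute and the table layout are hypothetical, and the storage call is injected so the sketch runs without AWS.

```python
import base64
import json
import time
from typing import Any, Callable, Dict

FIELDS = (
    "install_key",
    "version",
    "frontend_clients",
    "servers",
    "passenger_instances",
    "registered_apps",
)


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    # Lambda proxy integration expects statusCode, headers and a string body.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(put_item: Callable[..., Any]) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a handler that stores each valid snapshot via `put_item(Item=...)`."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return _response(400, {"error": "body must be JSON"})
        if not isinstance(payload, dict) or any(f not in payload for f in FIELDS):
            return _response(400, {"error": "missing fields"})
        # Only whitelisted fields are stored; anything extra is dropped.
        item = {f: payload[f] for f in FIELDS}
        item["received_at"] = int(time.time())  # hypothetical sort key
        put_item(Item=item)
        return _response(200, {"stored": True})

    return handler
```

In a real deployment, `put_item` could be boto3's `Table.put_item`, for example `make_handler(boto3.resource("dynamodb").Table("usage-statistics").put_item)`, where the table name is a placeholder. Whitelisting fields in the handler keeps the stored data limited to what the client is supposed to send.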
The custom domain name [2] is configured in the AWS API Gateway settings and uses an AWS-issued certificate obtained through AWS Certificate Manager [3]. To validate ownership of the domain, we created a DNS record.
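On the client side, the custom domain means the Fuse Panel only needs to know one URL. A minimal sketch using only the standard library; `telemetry.example.com/usage` is a placeholder, not our real endpoint. Because the AWS-generated API Gateway hostname never appears in the client, the backend behind the domain can be swapped without shipping a new Fuse Panel release.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder endpoint: the client only knows the custom domain, never the
# hostname that AWS generates for the API Gateway stage.
TELEMETRY_ENDPOINT = "https://telemetry.example.com/usage"


def build_usage_request(
    payload: Dict[str, Any], endpoint: str = TELEMETRY_ENDPOINT
) -> urllib.request.Request:
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_usage(payload: Dict[str, Any], timeout: float = 10.0) -> int:
    """POST one snapshot and return the HTTP status code."""
    with urllib.request.urlopen(build_usage_request(payload), timeout=timeout) as resp:
        return resp.status
```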
Why opt for Amazon?
If we don’t want to be dependent on Amazon in the future, why did we put all our eggs in the Amazon basket? There are plenty of ways to process and store usage statistics; it’s not exactly rocket science. A simple Rails API with any kind of database would have sufficed, and, to be honest, it would have made things a lot easier.
One of the questions we got after our short report on the phusionpassenger.com migration was why we didn’t opt for newer technology like Docker Swarm or Kubernetes. We found out the hard way that we were running critical infrastructure there and couldn’t experiment too much. For usage tracking, however, anything goes. This gave us the opportunity to play around with new technologies and platforms.
So, while telemetry could just as easily have been implemented with Firebase, or with a Google Cloud Function and Bigtable storage, the choice for Amazon was not Backed by Science™. There are a lot of parts involved, but it was fun getting to know some of the many, many, many services AWS offers.
What we did not tackle (yet)
Another thing that would be interesting for Fuse Panel development is tracking frontend interactions. This, however, requires us to hook into ReactJS and some of its components, and we haven’t yet taken the time to explore this option in depth.
We’re looking forward to supplementing user feedback (from the Fuse Panel survey and customer support queries) with usage data, so we can make better decisions about which features to implement in the Fuse Panel.
1. https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-create-api-as-simple-proxy-for-lambda.html
2. https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-custom-domains.html
3. https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-custom-domains-prerequisites.html