Knowing your customer, without - you know - knowing your customer?!1!

Nobody needs reminding of the GDPR deadline that passed just a few weeks back, nor the contents and ramifications of the new privacy law. This won’t be that kind of post. Instead, I’d like to explore how any small business trying to sell tech products on the internet can still get a feel for who their customers are, without overstepping any legal boundaries.

Or: how can we do BML after GDPR ASAP without going OMG, WTF, IDK?

This is roughly the content of a talk I gave yesterday at the Amsterdam Ruby Meetup.

What are we iterating on?

What piqued my interest in this topic? Phusion released a new product in March this year, free of charge until further notice (or: until we’ve figured out how to monetise it and which customer segment we can charge).

In order to validate the product idea, we want to figure out how people are using the thing (e.g. what pages they check out, how often, and what they’d like to see us build next). We’ve implemented telemetry (usage data) to supplement the surveys we ask users to fill out and the sound bites we extract from customer support.

People generally aren’t jumping at the chance to spend their limited time on earth filling out surveys, but we desperately need this data to make better decisions about which features to implement. The way we collect data is compliant, but not as insightful as it could be.

GDPR’s unintended outcome

GDPR requires companies to have legitimate reasons for collecting and using consumer data, along with explicit consent for doing so. The reality of GDPR is that it favors the likes of Facebook, Google and Netflix, who can easily gain consent. Businesses like ours, which serve customers indirectly, have a much harder time gaining it.

Consider this: you’re likely to click “Yes” when a GDPR form is put between you and seeing who commented on your recent selfie. Consumers agree to share “personal info” if they receive justifiable rewards or gain access to something they think they cannot live without. We value convenience over privacy when we consent to services we depend on for maintaining relationships, navigation, and birthday notifications… Google and Facebook offer products to consumers directly, which has built that “last mile” to obtaining user consent. Hell if you know why Google Maps needs access to your contact list! They know what they’re doing, amirite?

As a result, Google, Facebook et al. will solidify their monopolies in the digital world. Meanwhile, GDPR has made life harder for businesses trying to sell something on the internet and iteratively make that thing better.

But I digress. In line with GDPR, for the telemetry we:

  • collect only the data we need,
  • adopt a clear consent process.

... fingers crossed that most users will be OK sharing data that improves services they use.
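Those two constraints can be made concrete in code. Here’s a minimal sketch of a consent-gated collector that whitelists only the fields we need; the class, method and field names are hypothetical illustrations, not Passenger’s actual implementation:

```ruby
# Hypothetical sketch: send telemetry only after explicit opt-in,
# and send only the fields we actually need (no user identifiers).
class Telemetry
  ALLOWED_FIELDS = %i[version frontend_clients servers instances apps].freeze

  def initialize(consent_given:)
    @consent_given = consent_given
  end

  # Returns the payload that would be sent, or nil when consent is absent.
  def payload(data)
    return nil unless @consent_given
    data.select { |key, _| ALLOWED_FIELDS.include?(key) }
  end
end

telemetry = Telemetry.new(consent_given: true)
telemetry.payload(version: "5.3.0", servers: 2, email: "x@y.z")
# The :email key is silently dropped -- collect only what you need.
```

The whitelist approach (rather than a blacklist) is the safer default here: a new field has to be explicitly argued for before it’s ever transmitted.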

Model, View, Data Controller

We know very little about Passenger’s customers. I mean, we have a rough idea of our current customer and can assume the purchasing manager shares a desk or department with them. We’re developers selling to IT departments and professionals and can assume we share an interest in software, security and performance.

The telemetry data isn’t much of a crystal ball. All it tracks is the version number of Passenger on instance(s) run, the number of frontend clients, number of servers, number of Passenger instances and number of registered apps.

We don’t know:

  • if the install is by a long-term Passenger customer or user
  • how/where the person found out about the product’s existence
  • how they continue to be informed about its updates
  • if they’re Rubyists, Pythonistas or JavaScripters

We’re in the dark about a lot of things. Plus, at this point we only track data from the backend of the system. Our React frontend has no tracking whatsoever, but we’d like to find out which pages are popular, and how people get there.

Combining assumptions with telemetry

In an attempt to measure anything, we would:

  1. Post a blog post on a new feature in Week 1, allowing us to measure the reach of the blog, our Twitter account and selected channels, with Google Analytics tracking a campaign link.
  2. In Week 2 the newsletter goes out to our blog subscribers and enterprise customers. To be super-duper sure we don’t attribute a spike wrongly, we use a different campaign link.
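Generating a distinct campaign link per channel is easy to do mechanically. Below is a sketch using Ruby’s standard URI library and Google Analytics’ standard UTM parameters; the URL and campaign values are illustrative, not our actual campaign names:

```ruby
require "uri"

# Build a Google Analytics campaign link for a given channel.
# utm_source, utm_medium and utm_campaign are GA's standard
# campaign-tracking query parameters.
def campaign_link(base_url, source:, medium:, campaign:)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(
    utm_source:   source,
    utm_medium:   medium,
    utm_campaign: campaign
  )
  uri.to_s
end

# Week 1: blog post announced on Twitter
campaign_link("https://www.example.com/feature",
              source: "twitter", medium: "social", campaign: "new-feature")

# Week 2: newsletter gets its own link, so a traffic spike
# is attributed to the right channel.
campaign_link("https://www.example.com/feature",
              source: "newsletter", medium: "email", campaign: "new-feature")
```

Because the only variable is the link itself, the same announcement can go out everywhere while Google Analytics keeps the channels apart.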

That way we’re able to find out which channels worked (for which particular news item or feature), and can deduce whether users who came for the announced feature stick around after the install and continue using it over time. Or whether these users have things in common, like more than x apps running on more than x servers, indicating a user type with a specific problem they try to solve with our product. Hence market segment; hence cash. Almost.

We can count on people coming from the newsletter being customers, and people coming from Twitter, the blog directly, and the handful of channels we use to distribute our content (most notably The Changelog, Ruby Weekly, Hacker News and Reddit) being largely open source users, with the occasional exception to the rule.
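That assumption reduces to a one-line classifier keyed on the campaign link’s source. A toy sketch (the source labels are assumptions based on the channels above, and real traffic would need a catch-all for untagged visits):

```ruby
# Hypothetical funnel split: newsletter traffic is almost certainly
# paying customers; every other tagged channel is treated as open source.
def funnel_for(utm_source)
  utm_source == "newsletter" ? :enterprise : :open_source
end

funnel_for("newsletter")  # => :enterprise
funnel_for("twitter")     # => :open_source
```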

We don’t know whether the user of the product is a long-time Passenger customer or user, although we could make an assumption based on the number of Passenger instances: a long-term customer might have more (or older) apps running in production. That’s a lot of assumptions to base anything on, but you take what you can get.

So, we could distinguish between an enterprise and an open source funnel and track those separately on the backend - again: the version number of Passenger instance(s) run, the number of frontend clients, number of servers, number of Passenger instances and number of registered apps - and frontend interactions.

Beyond that, Google Analytics is currently all we have for more detailed information, so we might look at which countries campaign-link requests come from, and further segment per channel to deduce which channels we should write more marketing material for. And, of course, which marketing messages (and features) resonate most.

We'd love to hear from you

GDPR made the Build-Measure-Learn loop a little harder. We can’t ask individual customers about their pain points, or ask them for testimonials or case studies. Because we don’t know who they are, remember? We’ll take from Google Analytics what we can, make assumptions, experiment, and we’ll need to get ‘out there’ again to gather feedback. Which isn’t a bad thing per se.

I’d love to hear how you needed to alter your usage data collection and how you make up for the gaps in your insights!