Using Plino over Akismet for spam filtering

We've been using Frontapp (Front) at Phusion for quite some time now. Even though Front supports multiple communication channels, we mostly use it to handle email. Front is our shared inbox for everything related to sales, account management, support, general inquiries, and so on. It allows multiple Phusioneers to monitor the same inboxes. So far, so good.

Dealing with email inevitably means dealing with spam.

Front does offer a "Mark as spam" option, which deletes the message (moves it to the trash) and puts the sender's email address on a blacklist. Emails from blacklisted addresses are automatically moved to the trash. For spammers who are foolish enough to use the same email address for all their campaigns, this solution would be sufficient. However, spammers randomize their From: addresses, which renders the Front solution practically useless.
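To illustrate why an address blacklist falls short, here is a minimal sketch. It is not Front's implementation; the function names and example addresses are made up for illustration.

```python
# Hypothetical illustration of an exact-match sender blacklist.
# This is NOT how Front implements it; names and addresses are invented.
blacklist = set()


def mark_as_spam(sender: str) -> None:
    """Remember the sender so later mail from the same address is trashed."""
    blacklist.add(sender.strip().lower())


def is_blacklisted(sender: str) -> bool:
    """Return True only if this exact address was marked as spam before."""
    return sender.strip().lower() in blacklist


mark_as_spam("deals@spam.example")

# The same address is caught the next time around.
print(is_blacklisted("deals@spam.example"))

# A freshly randomized From: address slips straight through.
print(is_blacklisted("deals-7f3a@spam.example"))
```

Because the lookup is keyed on the exact address, every new random sender starts with a clean slate.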

In comes the webhook

That is why we built our own spam filter for Front. Applying advanced machine learning techniques, or even a "simple" Bayesian filter, would obviously be the coolest solution. Due to time constraints, we were forced to go with the far less cool option of building a proxy between Front and an external service.

The idea we came up with was to use the 'Rules' functionality in Front to trigger a webhook for incoming messages. Every time we receive a new email in Front, the webhook will be triggered. Our proxy will then retrieve the message contents from Front using their API and contact the external spam service to do the classification. If the message is considered spam, the proxy will tell Front to move the message to a designated spam folder.
[Figure: spam filter request flow]
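The flow described above can be sketched roughly as follows. This is a simplified outline under stated assumptions, not our actual proxy: the payload field `message_id` and the three callables are hypothetical stand-ins for the Front API calls and the external spam service.

```python
from typing import Callable


def handle_webhook(
    payload: dict,
    fetch_message: Callable[[str], str],
    is_spam: Callable[[str], bool],
    move_to_spam: Callable[[str], None],
) -> bool:
    """Process one webhook call and report whether the message was moved.

    All three callables are hypothetical: fetch_message stands in for
    retrieving the message from Front, is_spam for the external
    classification service, and move_to_spam for moving the message to
    the designated spam folder.
    """
    message_id = payload["message_id"]  # assumed field name, for illustration
    body = fetch_message(message_id)
    if is_spam(body):
        move_to_spam(message_id)
        return True
    return False


# Usage with in-memory stand-ins instead of real HTTP calls.
moved = []
result = handle_webhook(
    {"message_id": "msg_1"},
    fetch_message=lambda message_id: "Buy cheap pills now",
    is_spam=lambda body: "cheap pills" in body.lower(),
    move_to_spam=moved.append,
)
print(result, moved)  # True ['msg_1']
```

Passing the three steps in as callables keeps the proxy independent of the spam service, which is what makes swapping one vendor for another cheap.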

Picking a SaaS

Finding a suitable SaaS (Spamfilter as a Service) was no easy task. There are only a handful of these kinds of services. We didn't have that many requirements for the service. It should have an API that can easily be accessed from Ruby, and be relatively cheap. If the API offered methods to "train" the algorithm, by reporting false positives or false negatives, that'd be a nice-to-have.

DatumBox

One of the first services we went with was DatumBox. Their Machine Learning API offers more functionality than just spam detection. There is a DatumBox wrapper for Ruby available on GitHub that made it easy to talk to the API. Unfortunately, you can't train the algorithm through the API. We could run our own instance and train it, but most of its code is written in Java, which makes that less appealing for us.

Nevertheless, we hooked our proxy up to Front and to DatumBox and ran some tests. We were not really happy with the results and decided to try an alternative.

Akismet

If you wanna spamfilter, who you gonna call? Akismet of course. The undisputed king of WordPress spam filtering also offers an API. There are even some official and unofficial gems to interact with their API. As an added bonus, they have API methods that allow you to mark messages as spam or ham. There are some costs involved if you want to use it in a commercial environment, but $5 a month is a reasonable expense.

We replaced the DatumBox integration with Akismet and continued testing. A fair number of messages were still classified incorrectly, but we hoped that training the algorithm would improve this over time. When we failed to see significant improvement during the two weeks of testing, we decided to look for an alternative.

Plino

And so we arrive at our current vendor: Plino. Plino doesn't use a fancy machine learning algorithm; instead, it trains a custom Naive Bayes classifier on the Enron dataset. Just like DatumBox, the classifier is open source. It's written in Python, which is a big plus for us. We're still running tests, and so far it outperforms DatumBox and Akismet.
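To give an idea of what a Naive Bayes spam classifier does, here is a minimal sketch. It is not Plino's code: the class and method names are hypothetical, and the four training sentences are invented stand-ins for a real corpus such as the Enron dataset.

```python
import math
import re
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing.

    Illustrative only; assumes at least one training example per label.
    """

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def score(self, text, label):
        """Log of P(label) times the product of P(word | label)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[label] / total_docs)
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the product.
            log_prob += math.log((count + 1) / (total_words + len(vocab)))
        return log_prob

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.score(text, label))


# Invented training data, far too small for real use.
classifier = NaiveBayes()
classifier.train("win cash now", "spam")
classifier.train("cheap pills, buy now", "spam")
classifier.train("meeting notes for the sales team", "ham")
classifier.train("invoice for your Passenger license", "ham")

print(classifier.classify("buy cheap pills now"))          # spam
print(classifier.classify("notes for the sales meeting"))  # ham
```

The classifier simply picks the label whose word frequencies best explain the message, which is why the quality of the training corpus matters so much.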

What about Google?

We would have loved to use the Gmail spam filter, which we consider to be one of the best in the world. Unfortunately, in our current setup we can't use it to filter spam before it reaches Front. Google also doesn't seem to offer an API or some other service to make their spam filter accessible.

You can connect Front to Gmail, but this seems to work only with (personal) inboxes, i.e. inboxes for users in your G Suite organization. In our G Suite configuration we don't have separate users for shared inboxes. There is no 'sales' user, for instance. G Suite users are priced per year, which makes adding them too expensive for us to consider. Aside from a better spam filter, the benefits are marginal.

Prioritizing spam

Of course, there is nothing more fun than a little side project in which you can tinker with, and possibly improve, the tools you're already using. With Plino we can even consider building our own Spamfilter as a Service, instead of relying on external services. However, it would be even better if Front implemented this themselves.

Thankfully they already have a feature request for this on their roadmap. I invite everybody to take a look at their roadmap and let's spam them with votes!

