Yesterday the September edition of the Amsterdam.rb Ruby meetup took place at WeWork, with insightful talks by WeTransfer's Julik Tarkhanov and GitHub Developer Advocate Don Goodman Wilson.

It's a TIFF. Adventures in file format

Julik shared a fairly new feature with us: image previews on WeTransfer. The feature is enabled by default in most countries while WeTransfer tests the load on its servers. The number of image transfers over the platform runs into the millions per day; Julik is not at liberty to share the exact number.

Julik Tarkhanov

At a certain point 'just download' was not enough, says Julik. WeTransfer wants people to be able to present their work, and wants to offer recipients choices: what should you download if you're pressed for time? What WeTransfer needs to know about a file, without downloading it in full, is the file format, the resolution for images, the composition for archives, the duration for video and audio, and the page count for PDFs.

Before loading an image into your image processing library you want to know how large it's going to be. Assuming 8-bit color channels and RGBA, an uncompressed buffer for a 512x512 image is 1048576 bytes (width_px * height_px * 4). Indeed, file processing can get ugly. It's CPU-intensive, IO-intensive and relies on a multitude of libraries, and libraries always come with a security risk. WeTransfer can't download entire files and run detection on them.
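To make the arithmetic concrete, here is a minimal sketch of that buffer-size estimate. The method name and keyword arguments are made up for this post; only the formula (width_px * height_px * 4) comes from the talk.

```ruby
# Bytes needed for an uncompressed pixel buffer.
# Assumption: 8 bits (1 byte) per channel, and RGBA means 4 channels.
def uncompressed_buffer_bytes(width_px, height_px, channels: 4, bytes_per_channel: 1)
  width_px * height_px * channels * bytes_per_channel
end

puts uncompressed_buffer_bytes(512, 512)   # 1048576
puts uncompressed_buffer_bytes(8000, 6000) # 192000000
```

The second call shows why you want this number up front: a large photo already needs roughly 192 MB of memory once decoded.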

Detecting files reliably is hard, but not impossible. WeTransfer considered using FastImage, Dimensions, imagesize or exiftool for file properties detection, before writing their own tool.

format_parser had the following requirements:

  1. Random access over HTTP using the Range: header, plus support for local files. You don't want to scan a file bit by bit from the start, working through a ton of elsif statements, only to find out the format information sits at the very end,
  2. One isolated module per supported file format, instead of a page-long case statement,
  3. Some protection from crafted malicious payloads.
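The first two requirements can be illustrated with a small sketch. This is not format_parser's actual API: the module names, the detect method and the range helpers are all hypothetical, and a StringIO stands in for a remote file. Each module only reads the few header bytes it needs from a seekable IO; for a remote file, those reads would be translated into HTTP Range requests.

```ruby
require 'stringio'

# Hypothetical sketch, not format_parser's real interface.
# One small module per format; each returns a Hash or nil.
module PNGSketch
  SIGNATURE = "\x89PNG\r\n\x1A\n".b

  def self.call(io)
    io.seek(0)
    header = io.read(24)
    return nil unless header && header.bytesize == 24
    return nil unless header.byteslice(0, 8) == SIGNATURE
    return nil unless header.byteslice(12, 4) == 'IHDR'
    # IHDR starts with width and height as big-endian 32-bit integers.
    width, height = header.byteslice(16, 8).unpack('N2')
    { format: :png, width_px: width, height_px: height }
  end
end

module GIFSketch
  def self.call(io)
    io.seek(0)
    header = io.read(10)
    return nil unless header && header.bytesize == 10
    return nil unless %w[GIF87a GIF89a].include?(header.byteslice(0, 6))
    # Logical screen size: two little-endian 16-bit integers.
    width, height = header.byteslice(6, 4).unpack('v2')
    { format: :gif, width_px: width, height_px: height }
  end
end

# Adding a format means adding a module here, not another elsif.
PARSERS = [PNGSketch, GIFSketch].freeze

def detect(io)
  PARSERS.each do |parser|
    result = parser.call(io)
    return result if result
  end
  nil
end

# Range header values a remote reader could send for the same reads.
def range_header(offset, length)
  "bytes=#{offset}-#{offset + length - 1}"
end

# Suffix range: the last `length` bytes, for formats with metadata at the tail.
def tail_range_header(length)
  "bytes=-#{length}"
end

png = PNGSketch::SIGNATURE + [13].pack('N') + 'IHDR' + [640, 480].pack('N2')
p detect(StringIO.new(png))
p range_header(0, 24)
```

The third requirement, protection from malicious payloads, is only hinted at here by the strict length checks; a real tool would also want to cap the number and size of reads per file.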

Julik claims that format_parser is on par with FastImage in terms of speed, and orders of magnitude faster when the file information is at the tail of the file. WeTransfer even goes as far as having job candidates write a module for format_parser as a kind of litmus test.

Let's build a GitHub App

Don is more used to writing C++ code, but he wrote a Sinatra app for the occasion. Don enjoys 'automating all the things' and to that end created a GitHub app that is essentially an issue triage bot, using NLP (Recast.ai) and Octokit.rb.

Don Goodman Wilson

GitHub apps allow you to do anything you can do as a user, but as an app. You can use GitHub apps to build custom workflows (bespoke or open source), third-party integrations, and native apps.

A little while ago I attended a workshop at GitHub's Amsterdam office to learn how to automatically add labels to issues, using Probot and Node.js. The Ruby app Don showcased works roughly the same way: it receives the issues event, checks whether the action is 'opened', analyzes the issue title, and assigns a label using the Issues API.
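That flow can be sketched in plain Ruby. This is not Don's code: the keyword table is a crude stand-in for the Recast.ai NLP step, and handle_webhook and label_for are hypothetical names. The client argument can be any object that responds to add_labels_to_an_issue(repo, number, labels), which is the call Octokit::Client offers for this.

```ruby
require 'json'

# Hypothetical stand-in for the NLP step: match keywords in the title.
KEYWORD_LABELS = {
  /bug|error|crash/i   => 'bug',
  /feature|request/i   => 'enhancement',
  /question|how do i/i => 'question'
}.freeze

def label_for(title)
  KEYWORD_LABELS.each { |pattern, label| return label if title =~ pattern }
  nil
end

# event_name: value of the X-GitHub-Event webhook header
# body:       raw JSON payload of the webhook
# client:     responds to add_labels_to_an_issue(repo, number, labels)
def handle_webhook(event_name, body, client)
  return :ignored unless event_name == 'issues'

  payload = JSON.parse(body)
  return :ignored unless payload['action'] == 'opened'

  label = label_for(payload.dig('issue', 'title').to_s)
  return :unlabeled unless label

  client.add_labels_to_an_issue(
    payload.dig('repository', 'full_name'),
    payload.dig('issue', 'number'),
    [label]
  )
  :labeled
end
```

In a Sinatra app this method would be called from the route that receives the webhook POST, with the request body and the event header passed in. Keeping it free of Sinatra and Octokit makes it easy to test with a fake client.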

The code for the bot is up on GitHub. Try installing it on a repo and create a new issue to see the bot at work. Or just create an issue on Don's sandbox environment. The training material for the bot is 'remixable' as well.

The October 16 Amsterdam.rb meetup will host 'Hubber' and Rails core member Eileen (@eileencodes) and Fotos Georgiadis, Senior Software Engineer at Catawiki.