With the GDPR deadline quickly approaching (check out our journey through GDPR compliance), we are also working on a new product: a Rails API-only app with a React frontend. For user interface and integration testing we decided to use RSpec, Capybara, Selenium and Chrome Headless. Previously we used Capybara-Webkit, but Chrome Headless seems to have all the momentum nowadays. The PhantomJS maintainer even decided to step down in favor of Chrome Headless. Selenium + Chrome Headless is also the default integration testing setup for Rails 5.2.

The mystery begins

The tests worked well on our Macs, but we ran into problems when running them inside GitLab CI with the Docker executor. We were getting errors such as these:

  1) the login process the front page shows a login form
     Failure/Error: visit '/'
     
     Net::ReadTimeout:
       Net::ReadTimeout
     # ./spec/features/login/login_spec.rb:10:in `block (2 levels) in <main>'
     # ./spec/rails_helper.rb:67:in `block (3 levels) in <top (required)>'
     # ./spec/rails_helper.rb:66:in `block (2 levels) in <top (required)>'

  2) the login process login works
     Failure/Error: visit '/'
     
     Selenium::WebDriver::Error::UnknownError:
       unknown error: Chrome failed to start: exited abnormally
         (Driver info: chromedriver=2.38.552522 (437e6fbedfa8762dec75e2c5b3ddb86763dc9dcb),platform=Linux 4.4.0-121-generic x86_64)
     # ./spec/features/login/login_spec.rb:17:in `block (2 levels) in <main>'
     # ./spec/rails_helper.rb:67:in `block (3 levels) in <top (required)>'
     # ./spec/rails_helper.rb:66:in `block (2 levels) in <top (required)>'

Googling for this problem did not yield useful results. People have reported similar error messages going back to early 2017, but nobody seemed to have figured out the cause. Some claimed that the error is ephemeral and "solved" it by simply retrying. Our errors occurred on every run, so retrying was not an option.

Identifying possible culprits: Linux and Docker

Since the tests worked well on our Macs, we hypothesized that the errors were related to either Linux or the use of Docker. First we tried to run the tests on a Linux server over SSH, and... they passed. That left only Docker as the possible culprit.

We spun up a ruby:2.5 container, installed Chrome and other dependencies in there, and observed that the tests fail the same way. So something about Docker is making Chrome Headless fail, but what?

Googling for "docker headless chrome selenium" yielded a bunch of results suggesting that Chrome should be run through Xvfb, but doing that did not solve our problem.

Diving deeper

Out of curiosity, we tried playing around with Chrome. According to the Chrome Headless website, we should be able to run a command to take a screenshot. So we ran the following under a normal user account in the Docker container:

docker$ google-chrome --headless --disable-gpu --screenshot https://www.chromestatus.com/
Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted
Failed to generate minidump.Illegal instruction (core dumped)

Not good. What happens if we run it as root instead?

# google-chrome --headless --disable-gpu --screenshot https://www.chromestatus.com/
[0524/140735.373325:ERROR:zygote_host_impl_linux.cc(88)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.

Take a look at what the error messages say. "Failed to move to new namespace" suggests that Chrome was trying to use Linux namespaces. It is not surprising that this fails: the container is itself built on namespaces, and an unprivileged Docker container is not allowed by default to create new ones.
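To make this reading of the two messages concrete, here is a minimal Ruby sketch that maps Chrome's stderr output to a likely cause. The method name diagnose_chrome_stderr, the constant CHROME_STARTUP_HINTS and the cause symbols are hypothetical names made up for this illustration; the patterns are copied from the two error messages quoted above and may not cover other Chrome versions.

```ruby
# Hypothetical helper, not part of Chrome, Selenium or Capybara.
# The patterns come from the two error messages quoted in this post.
CHROME_STARTUP_HINTS = {
  /Failed to move to new namespace/ => :namespace_setup_denied,
  /Running as root without --no-sandbox is not supported/ => :root_needs_no_sandbox
}.freeze

# Returns a symbol describing the likely cause, or :unknown when
# none of the known patterns appears in the given stderr text.
def diagnose_chrome_stderr(stderr)
  CHROME_STARTUP_HINTS.each do |pattern, cause|
    return cause if stderr.match?(pattern)
  end
  :unknown
end
```

The stderr text could come from Open3.capture3 in Ruby's standard library, which returns stdout, stderr and the exit status of a command.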

Why would Chrome use Linux namespace features? Most likely for sandboxing, as we can see in the root error message. So what happens if we pass --no-sandbox even when running Chrome as a normal user?

docker$ google-chrome --headless --disable-gpu --screenshot --no-sandbox https://www.chromestatus.com/
...
[0524/140952.392278:INFO:headless_shell.cc(586)] Written to file screenshot.png.

Success! It appears that --no-sandbox did the trick.

Integrating the fix

So we need to somehow tell Capybara and Selenium to run Chrome with --no-sandbox. This is what we inserted in our RSpec initialization file:

Capybara.register_driver :selenium_chrome_headless_docker_friendly do |app|
  Capybara::Selenium::Driver.load_selenium
  browser_options = ::Selenium::WebDriver::Chrome::Options.new
  browser_options.args << '--headless'
  browser_options.args << '--disable-gpu'
  # Sandbox cannot be used inside unprivileged Docker container
  browser_options.args << '--no-sandbox'
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: browser_options)
end

Capybara.javascript_driver = :selenium_chrome_headless_docker_friendly

After this, our GitLab CI job succeeded.
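If the same test suite runs both on developer machines and in CI, one option is to disable the sandbox only when the tests run inside a container. The following is a sketch under assumptions, not what we run in production: the helper names running_in_docker? and headless_chrome_args are made up for this example, and checking for the /.dockerenv file is a common heuristic rather than an official Docker API, so it may not hold for every container runtime.

```ruby
# Hypothetical helpers, not part of Capybara or Selenium.

# Assumption: Docker creates /.dockerenv in the container's root.
# This is a heuristic and may not hold for other container runtimes.
def running_in_docker?(marker: '/.dockerenv')
  File.exist?(marker)
end

# Builds the Chrome argument list, adding --no-sandbox only when
# the tests are believed to run inside a Docker container.
def headless_chrome_args(in_docker: running_in_docker?)
  args = ['--headless', '--disable-gpu']
  args << '--no-sandbox' if in_docker
  args
end
```

Inside the register_driver block, each element of headless_chrome_args could then be appended to browser_options.args instead of the three hard-coded flags, which keeps the sandbox enabled on machines where it works.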

Good luck and happy devving.