This is part 1 of 3 in the series of technical blog posts by Joakim Recht where he dives into the test-driven UI development that happens at Tradeshift.
At Tradeshift, we have what one of my colleagues calls “Lunch-Driven Development”: basically, all features need to be ready before lunch. That is of course only partly correct, but it is true that we’re constantly introducing new functionality and pushing code into production. As usual, this requires a pretty solid test suite. For the Java code we have good coverage, and we try to do TDD/BDD as much as possible.
The frontend part is somewhat different. As I wrote in an earlier article, the early days of Tradeshift had a not so nice Drupal-based frontend and a Java backend. At some point, the frontend was replaced with a Grails-based one, which improved a lot of things. However, in one regard the switch didn’t change much: UI regressions kept popping up without anybody noticing until a bug report came in, and we didn’t have any structured way of preventing this. We had experimented with running tests through Selenium, but those tests were extremely hard to maintain and run, and they broke very often. This is the story of how we improved the quality of our UI through automated tests, and what we’ve learned from writing and running them.
At some point, Geb came along. I don’t really remember who brought it into Tradeshift or how, but what I do remember is my quite deep scepticism. Even though I had seen a very nice presentation at Google I/O about WebDriver, I really hadn’t seen the light yet.
Fast-forward 2 years. We now have a test suite with a growing number of specs (although not nearly as many as we have on the main Java backend), and the specs are run automatically after every push to our code repository. (A quick note: a spec roughly corresponds to a test, just more descriptive. Sorry for the simplification, BDD people.) Of course, they have to be green before anything goes into production. There have been lots of times when they weren’t green, and this has kept us from deploying defective code into our production environment many, many times.
Geb and Spock
First of all, a quick introduction to the technologies involved: Selenium, Spock and Geb.
Writing tests against the Selenium API is possible, but it’s cumbersome. Selenium is written in Java, and you have to write quite a lot of boilerplate code to do anything, and most of the time you’re hitting the wrong abstraction level.
It turns out that a model where the test basically navigates between pages and interacts with elements is very nice. This means that in the test framework you specify which pages you have and which elements are on each page, and each test then uses these page declarations to interact with the actual page in the browser. This gives a nice abstraction on top of the raw HTML/DOM: if you change the location of an element on a page, it doesn’t break everything, and if it does, you only have to change the page declaration to fix the problem.
This is exactly what Geb helps you with. Geb is a Groovy-based layer on top of Selenium which makes it very easy to define pages, elements on pages and inspect the contents of a browser. Actually, with Geb you generally don’t care about the browser – you just instruct Geb to navigate to a page, and this will drive a browser to go to a specific URL. You can then check elements on the page – again without caring about the browser (or the type of browser).
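To make the idea of page declarations concrete, here is a minimal sketch of two Geb pages. The page names, the URL, the title and the selectors are all hypothetical and only serve as illustration; they are not taken from the Tradeshift code base. The sketch assumes Geb is on the classpath.

```groovy
import geb.Page

// Hypothetical login page: names, URL and selectors are illustrative only.
class LoginPage extends Page {
    // Relative URL, resolved against the base URL configured for Geb
    static url = "login"

    // "at" check: used to verify that the browser really is on this page
    static at = { title == "Log in" }

    // Named elements on the page; specs refer to these names, not to selectors
    static content = {
        usernameField { $("input", name: "username") }
        passwordField { $("input", name: "password") }
        // Clicking this element is expected to lead to DashboardPage
        loginButton(to: DashboardPage) { $("button", type: "submit") }
    }
}

// Hypothetical landing page after a successful login
class DashboardPage extends Page {
    static at = { $("h1").text() == "Dashboard" }
}
```

In a spec, `to LoginPage` would drive the browser to the page's URL, after which `usernameField`, `passwordField` and `loginButton` can be used by name. If the markup changes, only the selectors inside `content` need updating.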
This leads us to the final component: Spock. Spock is a framework for writing tests in BDD style, based on Groovy. Tests in Spock are called specifications, or specs, and these make it possible to write understandable and flexible test cases. A Spock spec goes something like this:
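As a minimal sketch, a spec with two when/then pairs might look like the following. The `Invoice` class and the feature name are hypothetical and exist only to keep the example self-contained; the sketch assumes Spock is on the test classpath.

```groovy
import spock.lang.Specification

// Hypothetical domain class, included only to make the sketch self-contained.
class Invoice {
    private final List<BigDecimal> lineAmounts = []
    boolean sent = false

    void addLine(BigDecimal amount) {
        lineAmounts << amount
    }

    BigDecimal getTotal() {
        // Start from zero so that an empty invoice has a total of 0
        lineAmounts.sum(0.0G) as BigDecimal
    }

    void send() {
        if (lineAmounts.empty) {
            throw new IllegalStateException("Cannot send an empty invoice")
        }
        sent = true
    }
}

class InvoiceSpec extends Specification {

    def "an invoice with lines can be sent"() {
        when: "two lines are added to a new invoice"
        def invoice = new Invoice()
        invoice.addLine(100.00G)
        invoice.addLine(25.50G)

        then: "the total is the sum of the lines and nothing has been sent yet"
        invoice.total == 125.50G
        !invoice.sent

        when: "the invoice is sent"
        invoice.send()

        then: "it is marked as sent"
        invoice.sent
    }
}
```

Each line in a `then:` block is a condition; no explicit assert call is needed.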

So basically, a Spock spec is a list of when/then pairs. It can also contain other stuff, but the basic setup is this. Since Spock is based on Groovy, the when/then blocks actually have meaning: the when block sets up a scenario, and the then block makes assertions which must be true (this is implemented using AST transformations, in case you wonder). In good old JUnit, the equivalent of the then block is whatever assert* calls you make. In Spock, all statements in the then block are asserted automatically, and they must evaluate to true if the spec is to succeed. Finally, Spock builds on normal JUnit, so running a Spock spec is as simple as running any JUnit test, and it plugs into CI/build environments just as easily.
The very nice part about Spock is that it enables test-driven development in a very real way. Writing a spec can be done in cooperation with a product owner, a UI designer or any other non-technical person. Of course, there’s a lot of work left to actually finish the spec, but just having the when/then specification helps a lot. We’ll see later how a complete spec could look.
And then back to how we use Geb. Our current setup is actually the 3rd generation of UI integration tests:
- 1st generation had Geb specs just running through a couple of our core flows. The specs weren’t hooked into the build environment, so they weren’t really maintained properly.
- 2nd generation had a much wider range of Geb specs running as part of the frontend build. We configured a machine in the office to show the execution live on a screen so that everybody could follow test progress and spot errors. Over time, the specs did become somewhat messy, and it got hard to figure out how to reuse common functionality. Also, as the number of specs grew, so did the execution time, and build times just went up. On top of that, the process of getting the specs to run in the first place was very fragile.
- 3rd generation is implemented as a separate project, which runs all integration tests – both UI specs and tests validating our API and other integration points. This new project contains a lot of abstractions to enable reuse and get up and running quickly. Since the number of tests has grown to be pretty big, the test execution is parallelized when running from Jenkins. In the following, we’ll take a look at some of the details around our test setup.
Continue reading part 2.