History of Selenium

With web applications becoming the de facto approach to developing end-user applications, a reliable way to test them is needed. This puts more and more emphasis on browser automation frameworks that help check that a site works.

For years, people have been using Selenium IDE and Selenium RC to drive a number of different types of browsers. Selenium, when originally created by Jason Huggins, solved the problem of getting the browser to perform user interactions.

Selenium is a good automation framework; however, because it runs as JavaScript inside the page, it is limited by the JavaScript sandbox in browsers. The JavaScript sandbox enforces security policies while JavaScript is executing, to prevent malicious code from executing on the client machine. The main security policy people come across is the Same Origin Policy, under which two pages share an origin only if their scheme, host, and port all match. If you need to move from HTTP to HTTPS, as you normally would during a login process, the browser blocks the action because the scheme has changed and the page is no longer in the same origin. This was quite infuriating for the average developer!
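The origin comparison can be sketched in a few lines of Python. This is only an illustration of the rule, not how any browser implements it; the helper names `origin` and `same_origin` and the example.com URLs are made up for the example, and the sketch assumes only the default ports for HTTP and HTTPS.

```python
from urllib.parse import urlsplit

# Default ports, used when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    # Same scheme, host, and port: only the path differs.
    print(same_origin("http://example.com/home", "http://example.com/account"))
    # The scheme (and therefore the default port) changes, as in a login redirect.
    print(same_origin("http://example.com/home", "https://example.com/login"))
```

The second comparison is the login case described above: only the scheme changes, yet that is enough for the two pages to count as different origins.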

The Selenium API was originally designed to work from within the web server hosting the application. The developer or tester writing the tests had to do so in HTML, using a three column design based on FIT (the Framework for Integrated Test). You can see how this looks if you open up Selenium IDE: notice the three input boxes that need to be completed for each line that will be executed. Patrick Lightbody and Paul Hammant thought that there must be a better way to drive their tests, one that would let them use their favorite development language. They created Selenium Remote Control, using Java to build a web server that would proxy traffic. It would inject Selenium onto the page, and Selenium would then be used in much the same way as in the three column approach. This approach also led to a procedural development style.

The Selenium RC API for the supported programming languages was designed to fit the original three column syntax. Commonly known as Selenese, it has grown over the life of the project to support the changes that have been happening to web applications. This has had the unfortunate consequence that the API has grown organically so that users can manipulate the browser the way they intend, while still keeping to the original three column syntax. There are around 140 methods available, which makes picking the right method for the job rather difficult.
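To make the three column syntax concrete, here is a minimal sketch in Python that models each test line as a (command, target, value) row and runs the rows one after another. It does not use Selenium itself: `RecordingBrowser`, `run`, the locators, and the login values are all hypothetical stand-ins, and only the command names (`open`, `type`, `clickAndWait`) are borrowed from Selenese.

```python
# Each row mirrors one line of a three column test table: (command, target, value).
LOGIN_TEST = [
    ("open", "/login", ""),
    ("type", "id=username", "alice"),
    ("type", "id=password", "secret"),
    ("clickAndWait", "id=submit", ""),
]


class RecordingBrowser:
    """Hypothetical stand-in for a browser: it only records what it is asked to do."""

    def __init__(self):
        self.log = []

    def do_open(self, target, value):
        self.log.append(f"open {target}")

    def do_type(self, target, value):
        self.log.append(f"type {value!r} into {target}")

    def do_click_and_wait(self, target, value):
        self.log.append(f"click {target} and wait for the page to load")


def run(rows, browser):
    """Execute rows top to bottom; every command takes the same two arguments."""
    commands = {
        "open": browser.do_open,
        "type": browser.do_type,
        "clickAndWait": browser.do_click_and_wait,
    }
    for command, target, value in rows:
        if command not in commands:
            raise ValueError(f"unknown command: {command}")
        commands[command](target, value)


if __name__ == "__main__":
    browser = RecordingBrowser()
    run(LOGIN_TEST, browser)
    for line in browser.log:
        print(line)
```

Because every command is a flat verb with the same target and value slots, supporting a new kind of interaction means adding another command name, which is one way to picture how the method count could grow over time.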

With the move to mobile devices and HTML5, Selenium RC was starting to show that it wasn't able to fulfill its original requirement: browser automation to mimic what the user is doing.

Simon Stewart, having hit a number of these issues, wanted to try a different approach to driving the browser. While working for ThoughtWorks, he started working on the WebDriver project. It started originally as a way to drive HtmlUnit and Internet Explorer, but having learnt lessons from Selenium RC, Simon was able to design the API to fit in with the way most developers think. Developers have been doing object-oriented development for a while, so moving away from the procedural style of Selenium RC was a welcome change for them. For those interested, I suggest reading Simon Stewart's article on Selenium design at http://www.aosabook.org/en/selenium.html.

The next section will go through the basic architecture of WebDriver.