Selenium WebDriver Tutorial: A Comprehensive Guide to WebDriver Automation
Learn about WebDriver, a remote programming interface for controlling a browser locally or remotely, through our detailed tutorial on Selenium WebDriver.
WebDriver is a simple and concise remote programming interface that can be used to control, or in other words, drive, a browser either locally or on a remote machine. It is a platform-neutral and language-neutral wire protocol that can be used to remotely instruct the browser's behavior, such as discovering and manipulating DOM elements and controlling the behavior of user agents.
Now part of the Selenium project, Selenium WebDriver combines the language bindings with the implementations of the code that controls individual browsers; together these are often simply referred to as WebDriver. The tool is especially useful for automated browser testing across various browsers and operating systems.
The online world has evolved quickly, and each new application sets a higher standard for user experience. When developing websites and web apps, it's important to ensure a seamless end-user experience. That's why automated testing is an efficient way to check your product across the many browser and operating system combinations your users have.
Because it offers support for a wide variety of programming languages, including Java, C#, Ruby, JavaScript, and more, Selenium can be an effective tool for large organizations that wish to automate their software testing process.
This WebDriver tutorial explores what WebDriver is, its features, how it works, best practices, and more.
Let's begin!
What Is WebDriver?
WebDriver is a browser automation technology that drives a browser natively, as a user would, either locally or on a remote machine using the Selenium server.
WebDriver is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.
The W3C WebDriver specification provides a set of interfaces to discover and manipulate DOM elements, with a focus on web compatibility. It is primarily intended for the automated testing of user agents, but it may also be used to allow in-browser scripts to control a browser.
What Is Selenium?
Selenium is an open-source test automation framework that allows web apps to be tested across different browsers and operating systems. It supports multiple programming languages, such as Java, JavaScript, Python, and C#, so testers can automate their website testing in a language they are comfortable with.
The Selenium framework lets testers shorten test cycles by automating repetitive test cases. Selenium integrates with CI/CD pipelines and can help build a reliable release pipeline.
What Is Selenium WebDriver?
Selenium WebDriver refers to both the language bindings and the implementations of the browser-controlling code, and is commonly called just WebDriver. It is a framework that lets you run tests across multiple browsers and automate the process of checking whether your web application performs as expected, writing the test scripts in a programming language of your choice.
At the time of writing, Selenium 4, the latest version of Selenium, has been the talk of the town since its launch in 2021. Its headline changes include a redesigned Selenium Grid architecture, relative locators, and W3C-standard WebDriver communication.
The Rise of WebDriver Framework
WebDriver is a W3C standard that browser vendors implement in their respective browser drivers, such as ChromeDriver for Chrome and GeckoDriver for Firefox; Edge and other browsers have their own drivers as well. The testing community widely uses WebDriver to automate testing of web applications and, through WebDriver-based tools such as Appium, native mobile applications. Tests written against WebDriver are simple and concise, which is a big reason testers have adopted it for their browser testing needs.
If you're a developer who's passionate about quality assurance, then this is the right place for you.
Whatever your level of WebDriver skill, this Selenium WebDriver tutorial will help you get everything up and running and give you the information you need to create powerful test automation solutions.
Why Use Selenium WebDriver?
- Compatibility: Selenium is a long-lived project with a wide range of expected functionality. It has been designed with care to allow existing users of Selenium WebDriver to avoid unexpected breakages.
- Simplicity: The WebDriver specification is designed to make it easy for automated testing tools to interact with web content. As such, you will find commands that simplify common tasks, such as typing into and clicking elements.
- Extensions: The WebDriver protocol can be extended to automate functionality that cannot be implemented entirely in ECMAScript (JavaScript). This allows other web standards to support the automation of new platform features and allows vendors to expose functionality specific to their browsers.
- Capabilities: WebDriver capabilities describe the features supported by a given implementation. Local endpoints can use capabilities to define which features they require remote endpoints to satisfy when creating a new session. Remote endpoints can use capabilities to describe the full feature set for a session.
WebDriver Nodes
The WebDriver protocol allows for communication between:
- Local End: The local end is the client side of WebDriver's protocol, usually implemented by language-specific libraries that provide an API on top of the wire protocol. Any specifics imposed by this specification do not bind these libraries.
- Remote End: The remote end hosts the server-side portion of the WebDriver protocol. The goal of this specification is to define what a remote end should do in response to messages from the WebDriver protocol.
Remote ends are classified into two broad conformance classes called node types, which are:
- Intermediary Node: An intermediary node acts as a proxy, implementing both the local end and the remote end of the protocol, but it is not expected to implement the remote end steps directly. Any nodes between an intermediary node and the local end are said to be downstream of that node, while any nodes between the intermediary node and the endpoint node are said to be upstream.
- Endpoint Node: An endpoint node is the final remote end in a chain of nodes and is not an intermediary node. It is implemented by a user agent or a similar program, such as a browser driver, and it carries out the commands it receives.
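The intermediary-node idea can be made concrete with a minimal Java sketch of a session multiplexer that sits between local ends and several endpoint nodes. The class name `SessionMultiplexer`, the round-robin policy, and the node URLs are illustrative assumptions, not Selenium Grid's actual design; a real intermediary would also forward each HTTP request upstream and relay the response downstream.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical intermediary node that multiplexes sessions across endpoint
 * nodes. Not thread-safe; a real intermediary would synchronize access and
 * forward the actual HTTP traffic.
 */
final class SessionMultiplexer {
    private final List<String> upstreams;                        // endpoint node base URLs
    private final Map<String, String> sessionToUpstream = new HashMap<>();
    private int nextUpstream = 0;                                 // round-robin cursor

    SessionMultiplexer(List<String> upstreams) {
        if (upstreams.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint node is required");
        }
        this.upstreams = List.copyOf(upstreams);
    }

    /** Picks the endpoint node that should receive the next New Session request. */
    String chooseUpstreamForNewSession() {
        String upstream = upstreams.get(nextUpstream);
        nextUpstream = (nextUpstream + 1) % upstreams.size();
        return upstream;
    }

    /** Remembers which endpoint node created a session (id taken from its response). */
    void register(String sessionId, String upstream) {
        sessionToUpstream.put(sessionId, upstream);
    }

    /** Maps a session-scoped path such as /session/abc/url to its upstream URL. */
    Optional<String> route(String path) {
        // "/session/abc/url".split("/") yields ["", "session", "abc", "url"].
        String[] segments = path.split("/");
        if (segments.length < 3 || !segments[1].equals("session")) {
            return Optional.empty(); // not session-scoped, e.g. /status
        }
        return Optional.ofNullable(sessionToUpstream.get(segments[2]))
            .map(upstream -> upstream + path);
    }
}
```

Keying the table on the session id is what lets one intermediary serve many sessions: every session-scoped URL carries its id, so no other per-request state is needed to pick the right endpoint node.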
WebDriver Protocol
All WebDriver nodes must provide an HTTP-compliant wire protocol in which endpoints map to different commands.
This standard does not constrain how local ends interact with their users. Local ends are only expected to be compatible with the Remote End Protocol; they're not required to expose a user-facing API. WebDriver protocol includes the following:
- Algorithms: This specification is written in terms of algorithms. Steps in these algorithms are not intended to be performed by a human being; they are too detailed and rigorous. Instead, implementors are encouraged to design their systems so that the machine performs these steps automatically.
- Commands: WebDriver is composed of commands defined in this specification. A single HTTP request with a method and template produces a single WebDriver command, which in turn produces a single HTTP response.
- Processing Model: There are two ends of the connection, a client and a server. The server can read requests from the client and send back responses, typically over a TCP socket. This specification covers how these two ends communicate but not how they establish that connection in the first place.
- Routing Requests: Request routing is the series of steps that must be taken to implement a command represented by an HTTP request. WebDriver-defined URLs on a remote end must either have no prefix or be prefixed with the URL prefix associated with that remote end.
- Endpoints: The following table lists each endpoint node command along with the HTTP method and URI template used for it:

| Method | URI Template | Command |
|---|---|---|
| POST | /session | New Session |
| DELETE | /session/{session id} | Delete Session |
| GET | /status | Status |
| GET | /session/{session id}/timeouts | Get Timeouts |
| POST | /session/{session id}/timeouts | Set Timeouts |
| POST | /session/{session id}/url | Navigate To |
| GET | /session/{session id}/url | Get Current URL |
| POST | /session/{session id}/back | Back |
| POST | /session/{session id}/forward | Forward |
| POST | /session/{session id}/refresh | Refresh |
| GET | /session/{session id}/title | Get Title |
| GET | /session/{session id}/window | Get Window Handle |
| DELETE | /session/{session id}/window | Close Window |
| POST | /session/{session id}/window | Switch To Window |
| GET | /session/{session id}/window/handles | Get Window Handles |
| POST | /session/{session id}/window/new | New Window |
| POST | /session/{session id}/frame | Switch To Frame |
| POST | /session/{session id}/frame/parent | Switch To Parent Frame |
| GET | /session/{session id}/window/rect | Get Window Rect |
| POST | /session/{session id}/window/rect | Set Window Rect |
| POST | /session/{session id}/window/maximize | Maximize Window |
| POST | /session/{session id}/window/minimize | Minimize Window |
| POST | /session/{session id}/window/fullscreen | Fullscreen Window |
| GET | /session/{session id}/element/active | Get Active Element |
| GET | /session/{session id}/element/{element id}/shadow | Get Element Shadow Root |
| POST | /session/{session id}/element | Find Element |
| POST | /session/{session id}/elements | Find Elements |
| POST | /session/{session id}/element/{element id}/element | Find Element From Element |
| POST | /session/{session id}/element/{element id}/elements | Find Elements From Element |
| POST | /session/{session id}/shadow/{shadow id}/element | Find Element From Shadow Root |
| POST | /session/{session id}/shadow/{shadow id}/elements | Find Elements From Shadow Root |
| GET | /session/{session id}/element/{element id}/selected | Is Element Selected |
| GET | /session/{session id}/element/{element id}/attribute/{name} | Get Element Attribute |
| GET | /session/{session id}/element/{element id}/property/{name} | Get Element Property |
| GET | /session/{session id}/element/{element id}/css/{property name} | Get Element CSS Value |
| GET | /session/{session id}/element/{element id}/text | Get Element Text |
| GET | /session/{session id}/element/{element id}/name | Get Element Tag Name |
| GET | /session/{session id}/element/{element id}/rect | Get Element Rect |
| GET | /session/{session id}/element/{element id}/enabled | Is Element Enabled |
| GET | /session/{session id}/element/{element id}/computedrole | Get Computed Role |
| GET | /session/{session id}/element/{element id}/computedlabel | Get Computed Label |
| POST | /session/{session id}/element/{element id}/click | Element Click |
| POST | /session/{session id}/element/{element id}/clear | Element Clear |
| POST | /session/{session id}/element/{element id}/value | Element Send Keys |
| GET | /session/{session id}/source | Get Page Source |
| POST | /session/{session id}/execute/sync | Execute Script |
| POST | /session/{session id}/execute/async | Execute Async Script |
| GET | /session/{session id}/cookie | Get All Cookies |
| GET | /session/{session id}/cookie/{name} | Get Named Cookie |
| POST | /session/{session id}/cookie | Add Cookie |
| DELETE | /session/{session id}/cookie/{name} | Delete Cookie |
| DELETE | /session/{session id}/cookie | Delete All Cookies |
| POST | /session/{session id}/actions | Perform Actions |
| DELETE | /session/{session id}/actions | Release Actions |
| POST | /session/{session id}/alert/dismiss | Dismiss Alert |
| POST | /session/{session id}/alert/accept | Accept Alert |
| GET | /session/{session id}/alert/text | Get Alert Text |
| POST | /session/{session id}/alert/text | Send Alert Text |
| GET | /session/{session id}/screenshot | Take Screenshot |
| GET | /session/{session id}/element/{element id}/screenshot | Take Element Screenshot |
| POST | /session/{session id}/print | Print Page |

- Errors: WebDriver errors are represented by an HTTP response with a status in the 4xx or 5xx range and a JSON body. The body's "value" field is an object whose "error" field holds the error code (such as "no such element"), whose "message" field describes the specific error, and whose "stacktrace" field may carry implementation-specific debugging details.
- Extensions: Vendors can define additional commands seamlessly, integrating with the standard protocol, making it easier to access platform-specific features. This also allows other web standards to define commands for automating new platform features. Such commands are called extension commands and are not treated differently than other commands. Each has its own HTTP endpoint and remote endpoint steps.
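The routing and error rules above can be sketched with a small dispatcher. This is a hypothetical Java example, not code from any real driver: it covers only a few endpoints from the table, treats `/wd/hub` as an assumed URL prefix, and returns a string instead of a real HTTP response. It follows the specification's distinction between an unknown path (`unknown command`, HTTP 404) and a known path called with the wrong method (`unknown method`, HTTP 405).

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical remote-end dispatcher: matches an HTTP method and path against
 * a few URI templates from the endpoint table.
 */
final class CommandRouter {
    /** One endpoint: HTTP method, compiled URI template, and command name. */
    private static final class Route {
        final String method;
        final Pattern pattern;
        final String command;

        Route(String method, String uriTemplate, String command) {
            this.method = method;
            // Turn each "{variable}" into a capture group for a single path segment.
            this.pattern = Pattern.compile(uriTemplate.replaceAll("\\{[^}]+\\}", "([^/]+)"));
            this.command = command;
        }
    }

    private static final List<Route> ROUTES = List.of(
        new Route("POST", "/session", "New Session"),
        new Route("DELETE", "/session/{session id}", "Delete Session"),
        new Route("GET", "/status", "Status"),
        new Route("POST", "/session/{session id}/url", "Navigate To"),
        new Route("GET", "/session/{session id}/url", "Get Current URL"),
        new Route("GET", "/session/{session id}/title", "Get Title"),
        new Route("POST", "/session/{session id}/element", "Find Element"),
        new Route("POST", "/session/{session id}/element/{element id}/click", "Element Click"));

    private final String urlPrefix; // e.g. "/wd/hub"; "" means no prefix

    CommandRouter(String urlPrefix) {
        this.urlPrefix = urlPrefix;
    }

    /** Returns the matched command name, or the HTTP status and spec error code. */
    String dispatch(String method, String path) {
        // A request URL may carry no prefix or this remote end's URL prefix.
        if (!urlPrefix.isEmpty() && path.startsWith(urlPrefix + "/")) {
            path = path.substring(urlPrefix.length());
        }
        boolean pathKnown = false;
        for (Route route : ROUTES) {
            if (route.pattern.matcher(path).matches()) {
                pathKnown = true;
                if (route.method.equals(method)) {
                    return route.command;
                }
            }
        }
        return pathKnown ? "405 unknown method" : "404 unknown command";
    }
}
```

Compiling each template into a regular expression keeps the table declarative: adding an extension command is just one more `Route` entry.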
WebDriver Capabilities
WebDriver capabilities are used to communicate what features the implementation supports. The local end can use capabilities to describe the features it requires the remote end to satisfy when creating a new session. Likewise, the remote end can use capabilities to describe its full feature set for a session.
The following table lists the capabilities that every implementation must support. Each implementation may also define its own extension capabilities.
| Capability | Key | Value Type | Description | 
|---|---|---|---|
| Browser name | "browserName" | string | Identifies the user agent. | 
| Browser version | "browserVersion" | string | Identifies the version of the user agent. | 
| Platform name | "platformName" | string | Identifies the operating system of the endpoint node. | 
| Accept insecure TLS certificates | "acceptInsecureCerts" | boolean | Indicates whether untrusted and self-signed TLS certificates are implicitly trusted on navigation for the duration of the session. | 
| Page load strategy | "pageLoadStrategy" | string | Defines the current session's page load strategy. | 
| Proxy configuration | "proxy" | JSON Object | Defines the current session's proxy configuration. | 
| Window dimensioning/positioning | "setWindowRect" | boolean | Indicates whether the remote end supports all of the resizing and repositioning commands. | 
| Session timeouts | "timeouts" | JSON Object | Describes the timeouts imposed on certain session operations. | 
| Strict file interactability | "strictFileInteractability" | boolean | Defines the current session's strict file interactability. | 
| Unhandled prompt behavior | "unhandledPromptBehavior" | string | Describes the current session's user prompt handler. Defaults to the dismiss and notify state. | 
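As a sketch of how these capabilities travel on the wire, the Java class below builds a New Session request body that places shared requirements under `alwaysMatch` and alternative browsers under `firstMatch`, the two keys the W3C specification defines for capability matching. `NewSessionPayload` and its tiny JSON writer are hypothetical helpers written for illustration (they escape only quotes and backslashes); in practice, the Selenium client libraries build this body for you.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper that serializes a New Session body:
 * {"capabilities": {"alwaysMatch": {...}, "firstMatch": [{...}, ...]}}.
 */
final class NewSessionPayload {
    /** Minimal JSON writer for strings, booleans, numbers, maps, and lists only. */
    static String toJson(Object value) {
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Boolean || value instanceof Number) {
            return String.valueOf(value);
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(entry.getKey().toString())).append(':').append(toJson(entry.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) value) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        throw new IllegalArgumentException("unsupported type: " + value);
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Shared requirements go in alwaysMatch; each firstMatch entry is one alternative. */
    static String newSessionBody(Map<String, Object> alwaysMatch, List<Map<String, Object>> firstMatch) {
        Map<String, Object> capabilities = new LinkedHashMap<>();
        capabilities.put("alwaysMatch", alwaysMatch);
        capabilities.put("firstMatch", firstMatch);
        return toJson(Map.of("capabilities", capabilities));
    }
}
```

For example, putting `acceptInsecureCerts` and `pageLoadStrategy` in `alwaysMatch` and listing `{"browserName": "firefox"}` and `{"browserName": "chrome"}` in `firstMatch` asks the remote end for whichever of the two browsers it can provide, with the shared settings applied either way.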
WebDriver Sessions
A session is a single instance of a user agent, including all of its child browsing contexts. WebDriver gives each session a unique identifier that can be used to differentiate one session from another, allowing multiple user agents to be controlled from a single HTTP server and allowing sessions to be routed via a multiplexer (known as an intermediary node).
A WebDriver session is an instance of the connection between a local end and a specific remote end.
New Session
| HTTP Method | URI Template | 
|---|---|
| POST | /session | 
The New Session command asks the endpoint node to create a new WebDriver session. If the session cannot be created, the remote end returns a "session not created" error.
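Under the hood, New Session is a plain HTTP POST. The following Java sketch uses only the JDK's `java.net.http` client and assumes a driver such as ChromeDriver is already listening at a URL you supply, for example `http://localhost:9515` (ChromeDriver's default port). The class name `NewSessionClient` is hypothetical, and a regular expression stands in for a JSON parser when reading `value.sessionId` from the response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical raw-HTTP client for the New Session command. */
final class NewSessionClient {
    // Simplification: a regex instead of a JSON parser to pull out "sessionId".
    private static final Pattern SESSION_ID = Pattern.compile("\"sessionId\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds POST {baseUrl}/session with the capabilities JSON as the body. */
    static HttpRequest newSessionRequest(String baseUrl, String capabilitiesJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/session"))
            .header("Content-Type", "application/json; charset=utf-8")
            .POST(HttpRequest.BodyPublishers.ofString(capabilitiesJson))
            .build();
    }

    /** Extracts value.sessionId from a successful New Session response body. */
    static String extractSessionId(String responseBody) {
        Matcher m = SESSION_ID.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalStateException("no sessionId in: " + responseBody);
        }
        return m.group(1);
    }

    /** Sends New Session and returns the new session id, or throws on failure. */
    static String createSession(HttpClient client, String baseUrl, String capabilitiesJson)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
            client.send(newSessionRequest(baseUrl, capabilitiesJson), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Creation failed; the body holds an error object such as "session not created".
            throw new IOException("New Session failed: " + response.body());
        }
        return extractSessionId(response.body());
    }
}
```

With a driver running, `createSession(HttpClient.newHttpClient(), "http://localhost:9515", "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"chrome\"}}}")` would return the id that every later session-scoped URL needs.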
Delete Session
| HTTP Method | URI Template | 
|---|---|
| DELETE | /session/{session id} | 
The remote end steps are:
- Try to close the active session if the current session is an active one.
- Return success with data null.
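A matching Java sketch for Delete Session, again with only the JDK. The class name `DeleteSessionClient` is hypothetical, and the driver URL and session id are assumptions passed in by the caller. Per the remote end steps above, a successful response carries `null` as its `value`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical raw-HTTP client for the Delete Session command. */
final class DeleteSessionClient {
    /** Builds DELETE {baseUrl}/session/{sessionId}. */
    static HttpRequest deleteSessionRequest(String baseUrl, String sessionId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/session/" + sessionId))
            .DELETE()
            .build();
    }

    /** Success is HTTP 200 with {"value": null}; whitespace in the body is ignored. */
    static boolean isSuccess(int statusCode, String body) {
        return statusCode == 200 && body.replaceAll("\\s+", "").equals("{\"value\":null}");
    }

    /** Sends Delete Session and reports whether the remote end confirmed it. */
    static boolean deleteSession(HttpClient client, String baseUrl, String sessionId)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
            client.send(deleteSessionRequest(baseUrl, sessionId), HttpResponse.BodyHandlers.ofString());
        return isSuccess(response.statusCode(), response.body());
    }
}
```

Calling `deleteSession` from a `finally` block is one way to make sure a browser is released even when a test fails partway through.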
Status
| HTTP Method | URI Template | 
|---|---|
| GET | /status | 
The Status command returns information about whether the remote end is ready to create new sessions, and it may additionally include implementation-specific meta information.
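Because `/status` reports whether the remote end can create sessions, it is handy for waiting until a freshly started driver is ready. This Java sketch polls the endpoint with the JDK HTTP client; the class name `StatusProbe`, the 250 ms poll interval, and the regex check for `"ready": true` (in place of a JSON parser) are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

/** Hypothetical readiness probe built on the Status command. */
final class StatusProbe {
    // Simplification: a regex stands in for a JSON parser when reading value.ready.
    private static final Pattern READY = Pattern.compile("\"ready\"\\s*:\\s*true");

    static boolean isReady(String statusBody) {
        return READY.matcher(statusBody).find();
    }

    /** Polls GET {baseUrl}/status until the remote end reports ready or the timeout elapses. */
    static boolean waitUntilReady(String baseUrl, Duration timeout) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/status")).GET().build();
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            try {
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200 && isReady(response.body())) {
                    return true;
                }
            } catch (IOException e) {
                // The driver may still be starting and not yet accepting connections.
            }
            Thread.sleep(250); // short pause between probes
        }
        return false;
    }
}
```

For example, `waitUntilReady("http://localhost:9515", Duration.ofSeconds(10))` returns `true` as soon as a driver on that port answers with `"ready": true`.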
WebDriver Commands
The following table lists some useful and common WebDriver commands and their syntax:
| S.No. | Command and Description | 
|---|---|
| 1. | driver.get("URL"); Navigate to the given URL. | 
| 2. | element.sendKeys("inputtext"); Type text into an input element. | 
| 3. | element.clear(); Clear the contents of an input element. | 
| 4. | select.deselectAll(); Deselect all OPTIONs of the multi-select SELECT element wrapped by the Select object. | 
| 5. | select.selectByVisibleText("some text"); Select the OPTION whose visible text matches the argument. | 
| 6. | driver.switchTo().window("windowName"); Move the focus to another window, identified by its name or window handle. | 
| 7. | driver.switchTo().frame("frameName"); Switch the focus to a frame identified by its name or ID. | 
| 8. | driver.switchTo().alert(); Switch to the currently open alert so it can be accepted, dismissed, or read. | 
| 9. | driver.navigate().to("URL"); Navigate to the URL (equivalent to driver.get). | 
| 10. | driver.navigate().forward(); Navigate forward one page in the browser history. | 
| 11. | driver.navigate().back(); Navigate back one page in the browser history. | 
| 12. | driver.close(); Close the current window, quitting the browser if it is the last open window. | 
| 13. | driver.quit(); Quit the driver and close all windows associated with it. | 
| 14. | driver.navigate().refresh(); Refresh the current page. | 
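Each Java call in the table ends up as one W3C WebDriver command from the endpoint table. The hypothetical `WireCalls` helper below, which is not part of Selenium's API, records the HTTP method, path, and JSON body for a few of them; the body shapes follow the specification, though real client libraries may send additional fields.

```java
/**
 * Hypothetical helpers showing which W3C endpoint a few client calls map to.
 * WireCall and these method names are illustrative only.
 */
final class WireCalls {
    /** HTTP method, path, and JSON body (null when the command sends no body). */
    static final class WireCall {
        final String method;
        final String path;
        final String body;

        WireCall(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + (body == null ? "" : " " + body);
        }
    }

    private static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** driver.get(url) and driver.navigate().to(url) both map to Navigate To. */
    static WireCall navigateTo(String sessionId, String url) {
        return new WireCall("POST", "/session/" + sessionId + "/url", "{\"url\":" + json(url) + "}");
    }

    /** driver.navigate().back(), forward(), and refresh() each send an empty object. */
    static WireCall history(String sessionId, String action) {
        if (!action.equals("back") && !action.equals("forward") && !action.equals("refresh")) {
            throw new IllegalArgumentException("unknown history action: " + action);
        }
        return new WireCall("POST", "/session/" + sessionId + "/" + action, "{}");
    }

    /** element.sendKeys(text) maps to Element Send Keys. */
    static WireCall sendKeys(String sessionId, String elementId, String text) {
        return new WireCall("POST", "/session/" + sessionId + "/element/" + elementId + "/value",
            "{\"text\":" + json(text) + "}");
    }

    /** element.clear() maps to Element Clear. */
    static WireCall clear(String sessionId, String elementId) {
        return new WireCall("POST", "/session/" + sessionId + "/element/" + elementId + "/clear", "{}");
    }

    /** driver.close() maps to Close Window. */
    static WireCall close(String sessionId) {
        return new WireCall("DELETE", "/session/" + sessionId + "/window", null);
    }

    /** driver.quit() maps to Delete Session. */
    static WireCall quit(String sessionId) {
        return new WireCall("DELETE", "/session/" + sessionId, null);
    }
}
```

The mapping also explains the difference between rows 12 and 13: `close()` deletes only the current window, while `quit()` deletes the whole session.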
WebDriver Screen Capture
Screenshots are a great way to provide visual diagnostic information. Screenshots take a snapshot of the initial viewport's frame buffer as a lossless PNG image and return it to the local end as a Base64 encoded string.
The WebDriver's Take Screenshot command captures the top-level browsing context's initial viewport, and the Take Element Screenshot command allows you to capture an element's visible region after it has been scrolled into view.
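On the client side, the `value` returned by either screenshot command is just a Base64 string. The Java sketch below decodes it with `java.util.Base64`, checks the eight-byte PNG signature before trusting the data, and can write the image to disk; the class name `ScreenshotDecoder` and any target path are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper for turning a screenshot response value into a PNG file. */
final class ScreenshotDecoder {
    // Every PNG file starts with these eight bytes: 0x89 'P' 'N' 'G' CR LF 0x1A LF.
    private static final byte[] PNG_SIGNATURE = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    /** Decodes a screenshot's Base64 value and verifies that it is a PNG. */
    static byte[] decode(String base64Png) {
        byte[] bytes = Base64.getDecoder().decode(base64Png); // throws IllegalArgumentException if not Base64
        if (bytes.length < PNG_SIGNATURE.length
                || !Arrays.equals(Arrays.copyOf(bytes, PNG_SIGNATURE.length), PNG_SIGNATURE)) {
            throw new IllegalArgumentException("payload is not a PNG image");
        }
        return bytes;
    }

    /** Writes the decoded image to target, e.g. Path.of("screenshot.png"). */
    static Path save(String base64Png, Path target) throws IOException {
        return Files.write(target, decode(base64Png));
    }
}
```

Checking the signature first means a truncated or error payload fails loudly instead of producing an unreadable image file.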
Take Screenshot
| HTTP Method | URI Template | 
|---|---|
| GET | /session/{session id}/screenshot | 
Remote end steps are:
- If the current top-level browsing context has been closed, return an error with the error code no such window.
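- When the user agent is next ready to run the animation frame callbacks, capture the current top-level browsing context's document element from the framebuffer and encode it as a Base64 PNG string.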
- Return success with a string containing encoded data.
Take Element Screenshot
| HTTP Method | URI Template | 
|---|---|
| GET | /session/{session id}/element/{element id}/screenshot | 
Remote end steps are:
- If the current top-level browsing context has been closed, return an error with the error code no such window.
- Handle any open user prompts, and return an error if handling them fails.
- Let the element be the result of trying to get the known element whose web element reference matches the {element id} URL variable.
- Scroll the element into view.
- When the user agent is next ready to run the animation frame callbacks, capture the element's bounding rectangle from the framebuffer and encode it as a Base64 PNG string.
- Return success with a string containing encoded data.
Selenium WebDriver Architecture
Selenium WebDriver allows us to create cross-browser tests using a programming language of our choice.
The Selenium WebDriver architecture consists of four major components:
- Selenium Client Libraries/Language Bindings
- JSON Wire Protocol
- Browser Drivers
- Real Browsers
Selenium Client Libraries
Selenium developers have built language bindings to support the use of the program in multiple languages. For example, if you are writing your tests in Java, you can use the Java bindings. Client libraries can be downloaded from the official Selenium website.
JSON Wire Protocol
JSON (JavaScript Object Notation) is a lightweight data-interchange format that supports data structures like objects and arrays, which makes it easy to transfer data between clients and servers. Earlier Selenium versions exchanged JSON over HTTP using the JSON Wire Protocol; Selenium 4 uses the W3C WebDriver protocol instead, which also sends JSON over HTTP.

Browser Drivers
Selenium uses browser drivers to communicate with the browser. Each driver is specific to one browser and is responsible for translating WebDriver commands into actions in that browser. The same driver serves every language binding, so a Java test and a Python test talk to the same ChromeDriver. The following series of actions occurs when a Selenium automation test is triggered:
- Each Selenium command results in an HTTP request, which is sent to the browser driver.
- The browser driver receives the request through its built-in HTTP server.
- The driver then executes the instruction in the browser.
- The browser driver returns the result, including a status, to the automation script as an HTTP response.
ChromeDriver, GeckoDriver, and Microsoft Edge Driver are examples of browser drivers.
Browsers
Browsers act as the endpoint of our test executions. Here are the supported browsers:
- Chrome
- Firefox
- Safari
- Edge
- Internet Explorer
Features of Selenium WebDriver
Selenium is a widely used open-source testing framework that comes with a lot of features:
- Open Source: The first significant feature of Selenium WebDriver is that it is open source. It offers much of the functionality of commercial tools such as QTP free of charge, you can download it directly from the official site, and community support is available.
- Language Support: WebDriver's support for multiple programming languages is one of its top benefits. Official bindings exist for Java, Python, C#, Ruby, and JavaScript, and community-maintained bindings cover other languages such as PHP. This flexibility gives web developers the freedom to work in whichever language they are most comfortable with.
- Multiple OS Support: Selenium WebDriver runs on multiple operating systems, including Windows, Linux, UNIX, and macOS. You can build one test suite and run it on any supported platform; for example, you can create a test case on Windows and execute it on a Mac.
- Cross-Browser Compatibility Testing: Selenium WebDriver, unlike its predecessors, has expanded its support for cross-browser automation. This tool supports all major browsers, including Chrome, Firefox, Safari, Opera, IE, Edge, Yandex, etc. When you execute cross-browser testing of a website, WebDriver provides you with an automated solution.
- Framework and Tool Integration: WebDriver can be integrated with build tools like Maven or Ant for compiling the source code, with testing frameworks like TestNG to ease automation testing, and with Jenkins for continuous integration or continuous delivery of automated builds and deployments.
- Cross-Device Testing: Automated test cases can also target multiple devices. Because mobile automation tools such as Appium implement the WebDriver protocol, a developer can write test cases that run on iPhones and Android devices, which helps address cross-device issues.
- Community Support: Selenium WebDriver's support is community-based, which enables regular upgrades and updates. These updates are available whenever required, and no special training is needed to access them. This makes Selenium WebDriver both budget-friendly and resourceful.
- Easy to Implement: Selenium WebDriver's user-friendliness is one of the tool's most widely acclaimed features. Being open source, it also lets advanced users write their own extensions to build customized actions.
- Add-ons and Reusability: Test scripts can be reused across browsers, so a tester can run multiple testing scenarios against the same application with the same code. Customizable add-ons further extend what WebDriver can do for automation testing.
- Performance and Speed: WebDriver, the automation tool at the core of Selenium, executes test cases quickly because it drives each browser through the browser's own automation support rather than through the intermediary server that its predecessor, Selenium RC, required.
- Dynamic Web Elements: Selenium can handle dynamic web elements using XPath expressions such as the following:
  - contains(): Find an element by matching part of its text or attribute value.
  - Absolute XPath: Spell out the full path from the root node (for example, /html/body/div/button). This can reach any element, but it breaks whenever the page structure changes, so relative expressions built with contains() or starts-with() are usually more robust for dynamic elements.
  - starts-with(): Find a dynamic web element by matching the beginning of an attribute value, such as an ID or class name.
 
- Classes and Methods: Selenium WebDriver provides classes and methods for handling complex web elements such as radio buttons, dropdowns, and alerts, which solves common problems in automation testing.
- Combination of Tools and DSL: Selenium combines a suite of tools with a domain-specific language (Selenese, used by Selenium IDE) that lets you carry out various types of tests through the browser. Selenium IDE can record the tests you perform in supported browsers such as Chrome and Firefox.
- Easy Identification and Use of Web Elements: Selenium's robust set of locators (ID, name, class name, CSS selector, XPath, link text, and more) makes it easy to identify web elements and use them in your test automation suite.
- Mouse Cursor and Keyboard Simulation: WebDriver lets you mimic real user scenarios by handling keyboard and mouse events. Its Actions class automates simple interactions, such as key presses and mouse clicks, as well as complex ones like multiple item selection, drag and drop, and click and hold.
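The XPath functions from the Dynamic Web Elements feature above are standard XPath 1.0, so their behavior can be tried out without a browser. This Java sketch evaluates them with the JDK's `javax.xml.xpath` API against a small, well-formed sample page; the markup, ids, and the `DynamicLocatorDemo` name are made up for illustration. In a Selenium test, the same expression strings would be passed to `By.xpath(...)`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical demo of XPath locators for dynamic elements, evaluated offline. */
final class DynamicLocatorDemo {
    /** Made-up page whose button ids end in a suffix that would change per page load. */
    static final String SAMPLE_PAGE = "<html><body>"
        + "<button id='submit-8f3a'>Submit order</button>"
        + "<button id='cancel-1c2d'>Cancel</button>"
        + "</body></html>";

    // starts-with(): match the stable prefix of a dynamic id.
    static final String BY_ID_PREFIX = "//button[starts-with(@id, 'submit-')]";
    // contains(): match part of an element's text.
    static final String BY_PARTIAL_TEXT = "//button[contains(text(), 'Cancel')]";
    // Absolute path from the root node: valid, but breaks if the layout changes.
    static final String ABSOLUTE = "/html/body/button[2]";

    /** Counts the nodes in well-formed markup that match an XPath 1.0 expression. */
    static int count(String markup, String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpathExpression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + xpathExpression, e);
        }
    }
}
```

Each of the three sample expressions matches exactly one button on `SAMPLE_PAGE`, but only the first two keep working if another button is inserted before the Cancel button.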
WebDriver Limitations
WebDriver is a feature-rich framework that is popular in the tester community, but it also has some limitations of its own:
- Selenium is designed to automate web applications, not desktop applications. Desktop automation requires a different type of automation tool that's designed for that purpose.
- Selenium requires extensive knowledge and experience to automate tests effectively.
- Because Selenium is open-source software, users must rely on community forums to get technical issues resolved.
- Automation tests for web services like SOAP and REST cannot be performed using Selenium.
- The Selenium WebDriver framework has no built-in reporting capabilities; users must rely on testing frameworks like JUnit and TestNG, or on reporting plug-ins, to generate test reports.
- Image-based testing cannot be automated with Selenium alone. A visual automation tool such as Sikuli, an open-source program, must be integrated into the Selenium framework to test images.
- Selenium is a maintenance-heavy framework, and as the product grows, scaling becomes more difficult.
- Test environments in Selenium take more time to set up than vendor tools like UFT, RFT, and Silk Test.
- Due to a lack of tool integration, Selenium does not provide support for test management.
WebDriver Best Practices
Here are some of the best practices of Selenium WebDriver to make your life easier:
- Avoid Blocking Sleep Calls: The behavior of web applications (or websites) depends on many factors, ranging from network speed to device capabilities, access location, and load on the back-end server. These factors make it hard to predict how long a specific web element takes to load. A fixed sleep either waits longer than necessary or not long enough, and it blocks the executing thread for the whole delay; prefer waits that poll until a condition is met.
- Set Naming Conventions: Using standard naming conventions for test files and test cases speeds up development and Selenium testing. When some tests fail during execution, a quick look at the test names shows which functionality is broken.
- Implement Logging and Reporting: Logging can be a huge savior when locating failing test cases. When a particular test in an extensive test suite fails, it can help you pinpoint the problem. Console logs at appropriate places in the code can help develop a better understanding of the code and zero in on potential problems.
- Using Design Patterns and Principles: When writing Selenium test scripts, keep the scripts' maintainability and scalability in mind. Ideally, a change to a web page's UI should require only a small, localized change to the test code. Without a design pattern, different scripts often locate the same web element independently, so one UI change forces edits in many places. Page Objects, a popular web UI automation pattern, improve test maintenance and reduce code duplication by centralizing each page's locators in one place: its page object class. Hence, every web page being tested has a corresponding page object class.
- Browser Compatibility Matrix for Cross-Browser Testing: Browser Matrix is a vital resource that combines information drawn from product analytics, geolocation, and other detailed insights about audience usage patterns, stats counter, and competitor analysis. Browser Matrix will reduce the development and testing efforts by helping you cover all the relevant browsers (that matter to your product). Here is a sample Browser Compatibility Matrix:
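Returning to the first best practice, avoiding blocking sleep calls: the alternative is a wait that re-checks a condition at short intervals and returns as soon as it holds. Selenium ships this idea as `WebDriverWait`; the standalone Java sketch below shows the underlying mechanism with a hypothetical `PollingWait.until` helper, so a fast page is not penalized with a worst-case delay.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling wait: re-checks a condition until it holds or time runs out. */
final class PollingWait {
    static boolean until(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // done as soon as the condition holds
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out
            }
            // Pause for the poll interval, or less if the deadline is closer.
            long pauseMillis = Math.max(1, Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000));
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
    }
}
```

In a Selenium test, the condition would typically be a lambda that checks for an element, for example one that calls `driver.findElements(...)` and tests whether the result is non-empty.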

Conclusions
Selenium is today's web developers' top choice when choosing an automation testing tool. It has been loved by testers and developers alike worldwide. Through this extensive Selenium WebDriver tutorial, we hope to answer every question you have regarding WebDriver testing.
Happy Testing!
Published at DZone with permission of Harshit Paul, DZone MVB. See the original article here.