Selenium Webdriver Exercise – Find Broken Links On a Website

Broken links are bad for any website, irrespective of its business function. It could be a blog like ours or a product website like Amazon or Flipkart. For a blog, broken links could hurt search rankings, whereas an e-commerce portal risks losing customers if a link points to a product that no longer exists. Hence, it’s essential to build a mechanism that can find the broken links of any website. Also, a tester should include this check as a task in the test plan if it’s not there already, to make sure the website doesn’t host any broken URLs.

You may find many tools, including some open source ones, that can do the job for you. But you would miss the fun of doing it yourself. Also, when you are already using Selenium Webdriver, finding broken links in Java becomes even more interesting. So, to help you out, we are providing working Java code in the section below. In this code, we fetch all the links of a website using Webdriver commands and read their status with the help of the <HttpURLConnection> class.

There are a few HTTP status codes that you should know. With these status codes, you can mark a link as either valid or broken. For example, if a link returns 200, it is a valid link. A 404 status means the server could not find the requested page. Similarly, you can check for other status codes such as 400 – Bad Request, 403 – Forbidden, and 422 – Unprocessable Entity. So, please get any additional information about the HTTP status codes from here.

In the example below, we check for the 404 status code to count the total number of broken links. You are welcome to customize this code to use any other status code.
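
To make that customization concrete, here is a minimal sketch of two ways to decide whether a status code marks a link as broken. The class name `LinkStatus` and both method names are our own illustrative assumptions and are not part of Selenium or the JDK. The first rule mirrors the 404-only check described above, while the second treats every 4xx and 5xx response as broken.

```java
import java.net.HttpURLConnection;

// Hypothetical helper: the class and method names are illustrative assumptions.
class LinkStatus {

    // Mirrors the rule used in this post: only 404 counts as a broken link.
    static boolean isNotFound(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_NOT_FOUND; // constant value is 404
    }

    // A broader rule (an assumption, not the post's rule): any client or
    // server error, i.e. 400 through 599, marks the link as broken.
    static boolean isBroken(int statusCode) {
        return statusCode >= 400 && statusCode <= 599;
    }
}
```

Which rule fits depends on the site. For example, a 403 may simply mean a page needs a login, so you might not want to report it as broken.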

How to Find Broken Links On a Website?

Here are a few things we’d like to tell you about the code given here.

1- Configuring Chrome driver for the Sample Program.

We usually give samples that run on the Firefox browser by default. But the latest Selenium Jar (2.53) didn’t work well with the latest Firefox version (47), probably because of an issue that someone had already reported.

So, we decided to use the latest Chrome driver and the browser to run this sample program.

2- Using HTTP APIs to Determine URL Status.

While the Webdriver commands brought us all the URLs from the website, it was the Java HTTP APIs that helped us find broken links, with the help of the <getResponseCode()> API.

3- Handling Malformed URL Exception.

Another fact came to light after we executed the whole code against a test website: sometimes the URL itself is malformed, so accessing it via the HTTP APIs causes an exception.

Hence, we wrapped the status verification part in a <try-catch> block so that it can absorb the <MalformedURLException> and let the broken link analysis continue.
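
A minimal sketch of such a guarded status check could look like the following. The class name `UrlStatusChecker`, the method name `fetchStatus`, the `-1` return value for links that cannot be checked, and the five-second timeouts are all our own assumptions for illustration. The JDK calls themselves (`URL`, `openConnection()`, `getResponseCode()`) are standard.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper: names, the -1 sentinel, and the timeouts are assumptions.
class UrlStatusChecker {

    // Returns the HTTP status code of the link, or -1 if it could not be checked.
    static int fetchStatus(String link) {
        if (link == null || link.isEmpty()) {
            return -1; // anchors without an href have nothing to check
        }
        try {
            // Throws MalformedURLException for input such as "javascript:void(0)".
            URLConnection raw = new URL(link).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1; // for example mailto: or file: links
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setConnectTimeout(5000); // milliseconds
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (MalformedURLException e) {
            // Absorb the exception so the caller can move on to the next link.
            System.out.println("Malformed URL skipped: " + link);
            return -1;
        } catch (IOException e) {
            // Host unreachable, timeout, and similar network failures.
            System.out.println("Could not connect: " + link);
            return -1;
        }
    }
}
```

Note that `MalformedURLException` is a subclass of `IOException`, so its catch block has to come first. Returning a sentinel instead of rethrowing keeps the loop over all links running even when one link is bad.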

4- The Complete Java Code to Find Broken Links Of a Website.
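
As a rough sketch of the counting loop only, and not the original listing, the code below walks over a list of link targets and counts those that return 404. The class name `BrokenLinkCounter`, the method name `countBroken`, and the `.test` URLs are hypothetical. To keep the sketch runnable without a browser or network, it assumes the hrefs were already collected (with Selenium, typically via `driver.findElements(By.tagName("a"))` and `getAttribute("href")`), and it takes the status lookup as a parameter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch: class and method names are assumptions for illustration.
class BrokenLinkCounter {

    // Counts the links whose status is 404.
    // statusOf maps a URL string to its HTTP status code.
    static int countBroken(List<String> hrefs, ToIntFunction<String> statusOf) {
        int broken = 0;
        for (String href : hrefs) {
            if (href == null || href.isEmpty()) {
                continue; // anchors without an href are skipped, not counted
            }
            int status = statusOf.applyAsInt(href);
            System.out.println(href + " -> " + status);
            if (status == 404) {
                broken++;
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Stand-in data; with Selenium these would be the href values of all <a> elements.
        List<String> hrefs = Arrays.asList("http://a.test/ok", "http://a.test/missing", null);
        // Stub lookup so the sketch runs offline; swap in a real HTTP status check.
        int broken = countBroken(hrefs, href -> href.endsWith("/missing") ? 404 : 200);
        System.out.println("Broken links: " + broken);
    }
}
```

In a real run, the second argument would be any function that returns the response code for a URL, for instance one built on `HttpURLConnection.getResponseCode()`.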

5- Sample Code Execution.

You can run the program directly from the Eclipse IDE, or export it as a runnable Jar and run the Jar from the command line. You can choose either of the two methods.

Next, we are launching a dummy website in the above code. It is http://www.example.code/. You might like to change it before running the code at your end. Also, below is the result of the last execution of the above program.


Final Thought.

We hope the above Java code helps you find broken links in real-time. If you enhance or customize it, please share the result with us so that we can further educate our audience.

Help us spread the knowledge to the last person who needs it.