Selenium Webdriver Exercise – Find Broken Links On a Website

Broken links are bad for any website, irrespective of its business function, whether it is a blog like ours or a product website like Amazon or Flipkart. For a blog, broken links can hurt search rankings, whereas an e-commerce portal risks losing customers if a link points to a product that no longer exists. Hence, it's essential to have a mechanism that can find the broken links on any website. Also, testers should add this check as a task in their test plan if it's not there already, to make sure the website doesn't host any broken URLs.

You may find many tools, including some open-source ones, that can do the job for you. But you would miss the fun of doing it yourself. And when you are using Selenium Webdriver, finding broken links in Java becomes even more interesting. So, to help you out, we are providing working Java code in the section below. The code fetches all the links of a website using Webdriver commands and then reads their status with the help of the <HttpURLConnection> class.

There are a few HTTP status codes that you should know. With these status codes, you can mark a link as either valid or broken. For example, a link that returns 200 (OK) is valid, while a 404 (Not Found) status means the link is not accessible. Similarly, you can check for other status codes such as 400 – Bad Request, 403 – Forbidden, and 422 – Unprocessable Content. For any additional codes, refer to a complete list of HTTP status codes.
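To make the status codes above concrete, here is a small sketch that maps a code to its status class (the first digit decides it) and to the reason phrases of the codes mentioned in this section. The class and method names (StatusCodeInfo, statusClass, reasonPhrase) are our own, hypothetical ones and are not part of the broken link finder itself.

```java
// Hypothetical helper: describe an HTTP status code by its class and,
// for the codes discussed in this post, by its reason phrase.
class StatusCodeInfo {

    // The first digit of a status code defines its class.
    static String statusClass(int code) {
        if (code >= 100 && code < 200) return "Informational";
        if (code >= 200 && code < 300) return "Success";
        if (code >= 300 && code < 400) return "Redirection";
        if (code >= 400 && code < 500) return "Client Error";
        if (code >= 500 && code < 600) return "Server Error";
        return "Unknown";
    }

    // Reason phrases only for the codes mentioned above.
    static String reasonPhrase(int code) {
        switch (code) {
            case 200: return "OK";
            case 400: return "Bad Request";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            case 422: return "Unprocessable Content";
            default:  return "(not covered here)";
        }
    }

    public static void main(String[] args) {
        int[] codes = {200, 400, 403, 404, 422};
        for (int code : codes) {
            System.out.println(code + " " + reasonPhrase(code) + " -> " + statusClass(code));
        }
    }
}
```

A link is usually treated as broken when its code falls in the "Client Error" or "Server Error" class, but which codes you flag is a policy decision.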

In the broken link finder given later in this post, we've handled the 404 error code to count the total number of broken links. You are welcome to customize the code to check for any other status code.
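If you want to flag more than 404, one option is to keep the "broken" codes in a set and test each response against it. The sketch below assumes a hypothetical BrokenLinkRule class; the codes you put in the set are your choice, and the program in this post effectively uses just {404}.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical rule: a status code counts as broken if it is in the configured set.
class BrokenLinkRule {

    private final Set<Integer> brokenCodes;

    BrokenLinkRule(Integer... codes) {
        this.brokenCodes = new HashSet<>(Arrays.asList(codes));
    }

    boolean isBroken(int statusCode) {
        return brokenCodes.contains(statusCode);
    }

    // Counts how many of the given status codes are broken under this rule.
    int countBroken(List<Integer> statusCodes) {
        int broken = 0;
        for (int code : statusCodes) {
            if (isBroken(code)) {
                ++broken;
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        BrokenLinkRule onlyNotFound = new BrokenLinkRule(404);
        BrokenLinkRule clientErrors = new BrokenLinkRule(400, 403, 404, 422);
        List<Integer> sample = Arrays.asList(200, 404, 403, 200, 400);
        System.out.println(onlyNotFound.countBroken(sample)); // prints 1
        System.out.println(clientErrors.countBroken(sample)); // prints 3
    }
}
```

Keeping the rule in one object means the counting loop does not change when you decide to flag a different set of codes.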

How to Find Broken Links On a Website?

Here are a few things we'd like to tell you about the code in this post.

1- Configuring Chrome driver for the Sample Program.

We usually provide samples that run on the Firefox browser by default. But at the time of writing, the latest Selenium JAR (2.53) didn't work well with the latest Firefox version (47). This appeared to be a known issue that had already been reported.

So, we decided to use the latest Chrome driver and the Chrome browser to run this sample program.

 // Setting up Chrome driver path.
 System.setProperty("webdriver.chrome.driver", driverPath + "chromedriver.exe");
 // Launching Chrome browser.
 WebDriver driver = new ChromeDriver();
 // Enter Url.
 driver.get("http://www.example.com");
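The setup above hard-codes the Windows file name chromedriver.exe. If you run the same test on Linux or macOS, a hypothetical helper like the one below could pick the driver file name from the os.name system property. It assumes the binary keeps its default name (chromedriver.exe on Windows, chromedriver elsewhere) and sits in the folder you pass in.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: build the ChromeDriver path for the current OS.
class DriverPaths {

    // Windows reports os.name values such as "Windows 10".
    static String chromeDriverFileName(String osName) {
        return osName.toLowerCase().startsWith("windows") ? "chromedriver.exe" : "chromedriver";
    }

    static Path chromeDriverPath(String folder) {
        return Paths.get(folder, chromeDriverFileName(System.getProperty("os.name")));
    }

    public static void main(String[] args) {
        // Pass the folder that contains the driver, e.g. the post's driverPath.
        String folder = args.length > 0 ? args[0] : ".";
        Path driver = chromeDriverPath(folder);
        System.out.println(driver);
        // With Selenium on the classpath, you would then call:
        // System.setProperty("webdriver.chrome.driver", driver.toString());
    }
}
```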

2- Using HTTP APIs to Determine URL Status.

While the Webdriver commands fetched all the URLs from the website, it was the Java HTTP APIs, specifically the <getResponseCode()> method of <HttpURLConnection>, that helped us find the broken links.

 URL link = new URL(urlString);
 HttpURLConnection hConn = (HttpURLConnection) link.openConnection();
 hConn.setRequestMethod("GET");
 hConn.connect();
 int statusCode = hConn.getResponseCode();

 ...

 // Count only 404 (Not Found) responses as broken links.
 if (statusCode == 404) {
    ++brokenLinks;
 } else {
    ++validLinks;
 }
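To try <getResponseCode()> without touching a real website, the sketch below starts a throwaway local server using the JDK's built-in com.sun.net.httpserver package, answering /ok with 200 and any other path with 404. The class name and the 5-second connect and read timeouts are our assumptions, not part of the original code; the timeouts keep one unresponsive host from stalling a whole scan. Note that <getResponseCode()> returns 404 rather than throwing, so the status can be compared directly.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class LocalStatusDemo {

    // Opens a connection, reads only the status code, and releases the connection.
    static int statusOf(String urlString) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            conn.setRequestMethod("GET");
            // Assumed 5-second limits so an unresponsive host cannot stall the scan.
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Starts a throwaway local server, checks one good and one missing path,
    // and returns their status codes.
    static int[] checkOkAndMissing() {
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                int code = "/ok".equals(exchange.getRequestURI().getPath()) ? 200 : 404;
                exchange.sendResponseHeaders(code, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            try {
                String base = "http://127.0.0.1:" + server.getAddress().getPort();
                return new int[] {statusOf(base + "/ok"), statusOf(base + "/missing")};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] codes = checkOkAndMissing();
        System.out.println("/ok -> " + codes[0]);      // 200
        System.out.println("/missing -> " + codes[1]); // 404
    }
}
```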

3- Handling Malformed URL Exception.

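Another fact we came across after running the whole code against a test website was that sometimes the URL itself is malformed, so accessing it via the HTTP APIs raises an exception. In the trace below, the root cause is a <NullPointerException>, which typically means the URL string was null: <getAttribute("href")> returns null for an anchor that has no href attribute, and the <URL> constructor cannot parse null.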

java.net.MalformedURLException
	at java.net.URL.<init>(URL.java:627)
	at java.net.URL.<init>(URL.java:490)
	at java.net.URL.<init>(URL.java:439)
	at FindBrokenLinks.verifyURLStatus(FindBrokenLinks.java:57)
	at FindBrokenLinks.main(FindBrokenLinks.java:34)
Caused by: java.lang.NullPointerException
	at java.net.URL.<init>(URL.java:532)
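The trace above can be reproduced without a browser. This sketch (the MalformedUrlDemo class and tryParse method are hypothetical) passes null, as <getAttribute("href")> does for an anchor without an href, and an unsupported javascript: link to the <URL> constructor. Both raise <MalformedURLException>, and only the null case carries a <NullPointerException> as its cause. No network access happens here, since the constructor only parses the string.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo: show which href values the URL constructor rejects.
class MalformedUrlDemo {

    // Returns "ok", or "malformed" plus the cause's class name when there is one.
    static String tryParse(String urlString) {
        try {
            new URL(urlString);
            return "ok";
        } catch (MalformedURLException e) {
            Throwable cause = e.getCause();
            return "malformed" + (cause == null ? "" : " (cause: " + cause.getClass().getSimpleName() + ")");
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("http://www.example.com"));
        System.out.println(tryParse(null));
        System.out.println(tryParse("javascript:void(0)"));
    }
}
```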

Hence, we wrapped the status verification code in a <try-catch> block so that it absorbs the <java.net.MalformedURLException> (a subclass of <IOException>) and lets the broken link analysis continue.

try {
	URL link = new URL(urlString);
	HttpURLConnection hConn = (HttpURLConnection) link.openConnection();
	hConn.setRequestMethod("GET");
	hConn.connect();
	status = hConn.getResponseCode();

} catch (IOException e) {
	// MalformedURLException extends IOException, so it is absorbed here too.
	e.printStackTrace();
}
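Absorbing the exception works, but you could also skip such links before opening a connection. Here's a hypothetical pre-filter (LinkFilter.isHttpLink is our own name) that only lets http:// and https:// links through. Note the trade-off: with this filter, anchors without an href or with other schemes such as mailto: are skipped instead of being counted, so your broken and valid totals will differ from the program's.

```java
import java.util.Locale;

// Hypothetical pre-filter: decide whether an href is worth an HTTP request.
class LinkFilter {

    static boolean isHttpLink(String href) {
        // Anchors without an href give null; blank values cannot be checked either.
        if (href == null || href.trim().isEmpty()) {
            return false;
        }
        // Scheme names are case-insensitive, so compare in lower case.
        String lower = href.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.example.com/page", null, "", "mailto:someone@example.com", "javascript:void(0)"
        };
        for (String href : samples) {
            System.out.println(href + " -> " + (isHttpLink(href) ? "check" : "skip"));
        }
    }
}
```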

4- The Complete Java Code to Find Broken Links Of a Website.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class FindBrokenLinks {
	public static int brokenLinks;
	public static int validLinks;
	// Folder that contains chromedriver.exe; adjust it for your machine.
	public static String driverPath = "C:\\workspace\\tools\\selenium\\";

	public static void main(String[] args) {

		// Setting up Chrome driver path.
		System.setProperty("webdriver.chrome.driver", driverPath + "chromedriver.exe");
		// Launching Chrome browser.
		WebDriver driver = new ChromeDriver();
		try {
			// Open the website to scan.
			driver.get("http://www.example.com");
			// Get all the anchor elements on the page.
			List<WebElement> allURLs = driver.findElements(By.tagName("a"));
			System.out.println("size:" + allURLs.size());

			// The below code checks the status of each link and counts the broken ones.
			validLinks = brokenLinks = 0;
			for (WebElement anchor : allURLs) {
				// Read the href once; it is null for an anchor without an href attribute.
				String href = anchor.getAttribute("href");
				System.out.println(href);
				int statusCode = 0;
				try {
					statusCode = verifyURLStatus(href);
				} catch (Exception e) {
					// For example, a ClassCastException for a non-HTTP link such as mailto:.
					e.printStackTrace();
				}
				if (statusCode == 404) {
					++brokenLinks;
				} else {
					++validLinks;
				}
			}

			System.out.println("Total broken links found# " + brokenLinks);
			System.out.println("Total valid links found#" + validLinks);
		} finally {
			// Close the browser and end the driver session, even after an error.
			driver.quit();
		}
	}

	// The below function verifies a link and returns the server status.
	// It absorbs any IOException, including MalformedURLException, and
	// returns 404 in that case, so such a link is counted as broken.
	//
	public static int verifyURLStatus(String urlString) {

		int status = 404;
		HttpURLConnection hConn = null;
		try {
			URL link = new URL(urlString);
			hConn = (HttpURLConnection) link.openConnection();
			hConn.setRequestMethod("GET");
			hConn.connect();
			status = hConn.getResponseCode();
		} catch (IOException e) {
			// Malformed URL or network failure: keep the default 404 status.
			e.printStackTrace();
		} finally {
			// Release the connection if one was opened.
			if (hConn != null) {
				hConn.disconnect();
			}
		}

		return status;
	}

}
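Pages often link to the same URL several times (header, footer, and body), and the program above requests each anchor separately. Here's a sketch of collapsing the href values into distinct URLs first, assuming a hypothetical uniqueHrefs helper that you would feed with the href values read from allURLs. It also drops null hrefs, so with this change the totals count distinct URLs rather than anchors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keep each distinct, non-null href once.
class UniqueLinks {

    static List<String> uniqueHrefs(List<String> hrefs) {
        // LinkedHashSet drops duplicates but keeps first-seen order.
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href != null) {
                seen.add(href);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
            "http://www.example.com/", "http://www.example.com/about",
            null, "http://www.example.com/");
        System.out.println(uniqueHrefs(hrefs));
    }
}
```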

5- Sample Code Execution.

You can run the program directly from the Eclipse IDE, or export it as a runnable JAR and run the JAR from the command line. Either method works.

Next, note that the above code launches a placeholder website, http://www.example.com/. Change it to the website you want to test before running the code at your end. Below is the output from our last run of the program.

Total broken links found# 4
Total valid links found#79
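Since the target site is hard-coded, you need to edit and rebuild the program for every site you scan. When running the exported JAR from the command line, a hypothetical helper like this one (TargetUrl.fromArgs is our own name) could read the URL from the first argument and fall back to http://www.example.com when none is given.

```java
// Hypothetical helper: choose the site to scan from the command-line arguments.
class TargetUrl {

    static final String DEFAULT_URL = "http://www.example.com";

    static String fromArgs(String[] args) {
        // Use the first argument if it is present and not blank.
        if (args.length > 0 && !args[0].trim().isEmpty()) {
            return args[0].trim();
        }
        return DEFAULT_URL;
    }

    public static void main(String[] args) {
        String target = fromArgs(args);
        System.out.println("Scanning " + target);
        // In FindBrokenLinks.main, you would then call driver.get(target).
    }
}
```

You would then pass the site URL as the first argument after the JAR name when launching the program.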

 

Final Thought.

We hope the above Java code helps you find broken links in real time. If you enhance or customize it, please share the results with us so that we can further educate our audience.

Help us spread the knowledge to the last person who needs it.

 

Best,

TechBeamers.