Check your website for broken links with jQuery

Broken links happen, but you can check for them easily without any server-side requirement. Just JavaScript!

Posted August 10th, 2012 in javascript, jquery, bootstrap, seo, status.js

I make mistakes when writing my notes. I know my English has room for improvement, but I also make other mistakes: broken links, duplicate titles, empty meta descriptions...

A wise person would tell me: "hey dumbass, just download Xenu's Link Sleuth and let it handle the dirty job!" You are completely right. But I'm an OS X user, as you can see from my notes, and Xenu is Windows-only, so it is out of the question. And I want something easier, maybe integrated into my website.

So I wrote Status.js!

Updates

2012-08-17: Graph plotting and sitemap.xml generator

A table is useful, but sometimes something more graphic is better. That's why I integrated the JavaScript InfoVis Toolkit (Jit) into Status.js to plot the website as a graph you can interact with.

Since I was coding anyway, I wrote a dead easy sitemap.xml generator that reuses the crawler's work. Nothing fancy, but it can be useful if you have no better way to generate a sitemap.
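To give an idea of how little is needed, here is a minimal sketch of a sitemap builder. It is not the actual Status.js code: the function names `escapeXml` and `buildSitemap` are made up for this example, and it assumes the crawler hands over a plain array of absolute URLs.

```javascript
// Escape the five characters that are reserved in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a minimal sitemap.xml string from an array of absolute URLs.
// Only the mandatory <loc> element is emitted for each URL.
function buildSitemap(urls) {
  const entries = urls.map(function (url) {
    return '  <url><loc>' + escapeXml(url) + '</loc></url>';
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ].concat(entries, ['</urlset>']).join('\n');
}
```

Calling `buildSitemap(['http://example.com/', 'http://example.com/notes'])` returns the whole file as one string, ready to be saved as sitemap.xml.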

Status.js, a jQuery crawler

Let's start with a screenshot of what I'm talking about!

Preview of Status.js

Demo

You can see Status.js at work on this site here.

Sources on GitHub. Don't forget to star it!

How does it work?

Status.js will scan the website it's hosted on from the root (/) for links.
Internal links will be followed (fetched through Ajax) and scanned again. Yes, it's recursive.
External links will be memorized and used for cross-references. You can't check if an external link is broken with Status.js because of the cross-domain limitation of Ajax calls.

A table with some nice data will be populated in real time while Status.js is working.
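The crawl described above can be sketched as follows. This is a simplified illustration, not the Status.js source. It makes three assumptions: the page fetch is hidden behind a hypothetical `getLinks(url)` callback that returns the hrefs found in a page and throws when the page cannot be fetched; the callback is synchronous to keep the sketch short (the real thing goes through Ajax and is asynchronous); and a queue replaces literal recursion. URL handling uses the standard `URL` API instead of jsUri.

```javascript
// Crawl a site starting at rootUrl.
// getLinks(url) is a hypothetical callback: it returns an array of hrefs
// found in the page, or throws if the page cannot be fetched.
function crawl(rootUrl, getLinks) {
  const root = new URL(rootUrl);
  const pages = {};                 // url -> { status, outLinks }
  const queue = [root.href];
  pages[root.href] = { status: 'unfetched', outLinks: [] };

  while (queue.length > 0) {
    const url = queue.shift();
    const page = pages[url];
    let hrefs;
    try {
      hrefs = getLinks(url);
      page.status = 'success';
    } catch (err) {
      page.status = 'error';        // broken link
      continue;
    }
    hrefs.forEach(function (href) {
      const target = new URL(href, url);   // resolve relative links
      const key = target.href;
      if (page.outLinks.indexOf(key) === -1) page.outLinks.push(key);
      if (pages[key]) return;              // already known
      if (target.origin === root.origin) {
        // Internal link: remember it and scan it later.
        pages[key] = { status: 'unfetched', outLinks: [] };
        queue.push(key);
      } else {
        // External link: remember it, but never fetch it.
        pages[key] = { status: 'external', outLinks: [] };
      }
    });
  }
  return pages;
}
```

The `pages` object it returns holds one entry per URL, which is roughly the data a results table needs.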

How is it done?

Status.js is a Backbone application.
For Ajax and DOM manipulation, there is jQuery.
The URL manipulation is powered by jsUri.
The GUI part is Twitter Bootstrap.

For the details, I'll write a note in the future :D

What can I check with Status.js?

Url

The url of the page. To avoid duplication, hashes will be removed.
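As an illustration, here is one way to strip the hash. Status.js relies on jsUri for this; the sketch below swaps in the standard `URL` API, and the name `normalizeUrl` is invented for the example.

```javascript
// Resolve href against base and drop the fragment, so that
// "/notes" and "/notes#comments" end up as the same entry.
function normalizeUrl(href, base) {
  const url = new URL(href, base);
  url.hash = '';   // removes the "#..." part entirely
  return url.href;
}
```

With this, `normalizeUrl('/notes#comments', 'http://example.com/')` and `normalizeUrl('/notes', 'http://example.com/')` produce the same string.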

Title

Available for internal pages only, the title tag is fetched. If not present, you'll get a {No title} placeholder.

Description

Available for internal pages only, the meta name="description" tag is fetched. If not present, you'll get a {No description} placeholder.
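A rough sketch of how the title and description could be read from a fetched page. The function name `readPageMeta` is hypothetical, and it assumes the HTML has already been parsed into a Document-like object. It also treats an empty tag the same as a missing one, which is my assumption and not necessarily what Status.js does.

```javascript
// Read the title and meta description from a parsed document,
// falling back to placeholders when they are missing or empty.
function readPageMeta(doc) {
  const titleEl = doc.querySelector('title');
  const descEl = doc.querySelector('meta[name="description"]');
  const title = titleEl ? titleEl.textContent.trim() : '';
  const description = descEl
    ? (descEl.getAttribute('content') || '').trim()
    : '';
  return {
    title: title || '{No title}',
    description: description || '{No description}'
  };
}
```

In a browser, the document could come from `new DOMParser().parseFromString(html, 'text/html')`, where `html` is the response text of the Ajax call.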

Status

Because of the Javascript-in-a-browser limitations, we can handle only these statuses:

Success
Available for internal pages only. It means the page was fetched correctly.
External
Indicates an external link.
Redirect
Indicates that there is another page for the same url but with a trailing slash. It's a hack to work around the browser, which does not expose any 30x HTTP code to the Ajax call.
Error
_Broken link!_
Unfetched
Page memorized but waiting to be crawled.
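To make the Redirect hack more concrete, here is one possible reading of it, written as a sketch. This is my guess at the idea, not the Status.js implementation: the name `hasTrailingSlashTwin` is invented, and it assumes URLs are plain strings without query strings.

```javascript
// Return true when url has no trailing slash but the same url
// with a trailing slash is also among the known urls.
function hasTrailingSlashTwin(url, knownUrls) {
  if (url.charAt(url.length - 1) === '/') return false;
  return knownUrls.indexOf(url + '/') !== -1;
}
```

A page for which this returns true could then be marked as Redirect instead of Success.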

Out links

It's the number of internal and external links present in the page. Click on the number to get the full list.

In links

It's the number of pages that link to the url. Click on the number to get the full list.
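In links are simply out links seen from the other side, so they can be derived once the out links are known. A minimal sketch, assuming the crawler keeps an object that maps each url to the array of urls it links to (the name `computeInLinks` and that data shape are assumptions for this example):

```javascript
// Invert an out-links map (url -> array of linked urls)
// into an in-links map (url -> array of urls linking to it).
function computeInLinks(outLinks) {
  const inLinks = {};
  Object.keys(outLinks).forEach(function (source) {
    // Make sure every crawled page has an entry, even with no in links.
    if (!inLinks[source]) inLinks[source] = [];
    outLinks[source].forEach(function (target) {
      if (!inLinks[target]) inLinks[target] = [];
      // Count each linking page only once.
      if (inLinks[target].indexOf(source) === -1) {
        inLinks[target].push(source);
      }
    });
  });
  return inLinks;
}
```

The length of each array is the number shown in the column, and the array itself is the full list.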

TODO list

This is a list of some of the things I'll have to work on. Please feel free to contribute with suggestions!

  • Warnings about duplicate/too long/missing titles/descriptions.
  • Verify the presence of Google Analytics.
  • Check for broken images.
  • Warnings about missing/bad alt attributes for images.
  • Pagination
  • Performance tests
  • Let's be honest... do tests!
  • Code cleaning, comments, etc.

Source code

The project is available on GitHub. It's in a very early stage of development, so no documentation is provided.

If you want to try it, clone the repository, put the status folder somewhere on the website you want to check, open the status.html page with your browser and cross your fingers.

Broken link

Here is a broken link to test the tool. Don't click me!

Have fun!