Fixing missing links from web sites

1           Xenu

The need to find broken links easily is an old one. After googling for the best product available, free or non-free, I ended up with a very old piece of free software that did what I needed well enough. The latest version is from September 4th, 2010, and the original version was developed in 1997.

The software was written in Germany, and perhaps the license details show this best: they are three words altogether, “it is free”.

1.1           Limitations

You run Xenu on a Windows computer. I used Windows 10 for my testing.

When I set my internal URL to fujitsu.com/fi instead of fujitsu.com/fi/, the tool had collected over 150,000 links from http://www.fujitsu.com before I noticed that the filtering was not working correctly, and it was still running in the background. At least this shows that the software copes with very large sites.
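To illustrate why the trailing ‘/’ can matter, here is a minimal Python sketch. It assumes the internal URL is applied as a plain substring match, which is my guess and not something I have confirmed about how Xenu works. The function `is_internal` and the last two sample URLs are hypothetical.

```python
def is_internal(url: str, pattern: str) -> bool:
    """Hypothetical filter: a URL is 'internal' if it contains the pattern.

    This is an assumed model of the filtering, not Xenu's documented logic.
    """
    return pattern in url


urls = [
    "http://www.fujitsu.com/fi/",
    "http://www.fujitsu.com/fi/about/",
    "http://www.fujitsu.com/files/report.pdf",  # hypothetical path
    "http://www.fujitsu.com/finance/",          # hypothetical path
]

# Without the trailing slash, every sample URL matches the pattern.
loose = [u for u in urls if is_internal(u, "fujitsu.com/fi")]

# With the trailing slash, only URLs under /fi/ match.
strict = [u for u in urls if is_internal(u, "fujitsu.com/fi/")]

print(len(loose), len(strict))
```

Under this assumption the shorter pattern also accepts any path that merely begins with "fi", so the crawl can spread well beyond the intended section of the site.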

2           Installation and settings

  1. Download from http://home.snafu.de/tilman/xenulink.html#Download
  2. Extract the zip file.
  3. Run setup.exe.
  4. Start the application.
  5. Provide the search URL, such as http://www.fujitsu.com/fi
  6. Provide the internal URLs, such as fujitsu.com/fi/ (note that the trailing ‘/’ character is critical here).
  7. Select View –> Show broken links only.
  8. Follow the URL count in the lower right corner.
  9. Export the results with File –> Export Page Map to Tab Separated File.
  10. Open Excel and import the tab-separated file into it. With version 2013 this is done from Data –> From Text. It is best to create this as a permanent connection that is refreshed manually. If the file is very large, it is also best to load it as a PivotTable. If you get an error that the file has too many lines, read it anyway, and later filter out the “ok” messages.
  11. You may wish to add a VLOOKUP function in Excel to present the results more clearly.
  12. Sort, interpret and fix the results using the table below.
| Message | Recommended action | Explanation |
|---|---|---|
| auth required | Ignore | I ran my tests without authenticating, which makes the test much faster, so in my case these web sites were not tested. |
| Certificate authority unfamiliar | Preferably inform | The web page is functioning, but the certificate is not trusted. In the case I found, the certificate had been issued for the IP address and it failed for the server name. |
| Error 999 | Ignore | LinkedIn returns error 999 for an unknown reason. Normal browsing works fine. |
| Forbidden request | Ignore | I ran my tests without authenticating, which makes the test much faster, so in my case these web sites were not tested. |
| LinkToPageStatus | Ignore | Excel issue: this is the column heading. |
| mail host ok | Ignore | This is a “mailto:…” link and it appears to be fine. |
| No connection | Highly probable fix | There is no connection to the server. The web site is down, but you need to check these one by one to see whether this is permanent or temporary. |
| no info to return | Search from target | HTTP error 404. Typically the target has been moved elsewhere, so the fastest fix may be simply to search for the new location. |
| no object data | Ignore | These seem to work fine. |
| No such host | Certain fix | The server has been renamed or shut down. As the name service cannot find it, this is permanent. |
| not found | Search from target | HTTP error 404. Typically the target has been moved elsewhere, so the fastest fix may be simply to search for the new location. |
| ok | Ignore | No issues. |
| Request URI too long | Inform | A defect in the web site. |
| Server error | Probable fix | The web site is down. You may need to ask the target owner whether that is permanent or not. |
| skip type | Ignore | This tool doesn’t test JavaScript. |
| SSL certificate common name incorrect | Ignore | The reason for this message is unknown, but it appears that you can ignore these. |
| Temporarily overloaded | Ignore | These pages were working fine. |
| the resource is no longer available | Search from target | The page has been moved or removed, and the site redirects you to its front page instead. |
| timeout | Probable fix | You may need to ask the target owner whether it is still working. |
| Other | Test | Right-click the item and test it. |
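If you prefer a script to Excel, the filtering and lookup steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions: the column name LinkToPageStatus is taken from the heading mentioned in the table, the column name LinkToPage and the helper `triage` are my own guesses, and the sample rows are made up, so adjust the names to whatever your export actually contains.

```python
import csv

# Recommended actions copied from the table above, keyed by lower-cased message.
ACTIONS = {
    "auth required": "Ignore",
    "certificate authority unfamiliar": "Preferably inform",
    "error 999": "Ignore",
    "forbidden request": "Ignore",
    "mail host ok": "Ignore",
    "no connection": "Highly probable fix",
    "no info to return": "Search from target",
    "no object data": "Ignore",
    "no such host": "Certain fix",
    "not found": "Search from target",
    "ok": "Ignore",
    "request uri too long": "Inform",
    "server error": "Probable fix",
    "skip type": "Ignore",
    "ssl certificate common name incorrect": "Ignore",
    "temporarily overloaded": "Ignore",
    "the resource is no longer available": "Search from target",
    "timeout": "Probable fix",
}


def triage(lines, status_column="LinkToPageStatus", url_column="LinkToPage"):
    """Return (url, message, action) for every row whose message is not "ok".

    `lines` is any iterable of tab-separated text lines with a header row,
    for example an open file. Both column names are assumptions.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    results = []
    for row in reader:
        message = (row.get(status_column) or "").strip()
        if message.lower() == "ok":
            continue  # same effect as filtering out "ok" in Excel
        # Messages missing from the table fall back to "Test" (the "Other" row).
        action = ACTIONS.get(message.lower(), "Test")
        results.append((row.get(url_column, ""), message, action))
    return results


# Made-up sample data in the assumed export layout.
sample = [
    "OriginPage\tLinkToPage\tLinkToPageStatus\n",
    "http://www.fujitsu.com/fi/\thttp://www.fujitsu.com/fi/about/\tok\n",
    "http://www.fujitsu.com/fi/\thttp://example.invalid/\tno such host\n",
    "http://www.fujitsu.com/fi/\thttp://example.com/old\tnot found\n",
]

for url, message, action in triage(sample):
    print(f"{action}: {url} ({message})")
```

To run it against a real export, pass an open file instead of the sample list, for example `triage(open("pagemap.txt", newline="", encoding="utf-8", errors="replace"))`, where the file name is a placeholder.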

 
