The need to find broken links easily is an old one. After googling and searching for the best product available, free or non-free, I ended up with a very old piece of free software that did what I needed well enough. The latest version is from September 4th, 2010, and the original version was developed in 1997.
The software was written in Germany, and perhaps the license details show this best: they are altogether three words, “it is free”.
You run Xenu on a Windows computer. I used Windows 10 for my testing.
When I set my internal URL as fujitsu.com/fi instead of fujitsu.com/fi/, Xenu had collected over 150,000 links from http://www.fujitsu.com before I noticed that the filtering was not working as intended, and it was still running in the background. At least this shows that the software copes with very large sites.
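Why the trailing slash matters can be sketched in a few lines of Python. This assumes that the internal URL setting is applied as a plain substring match. That is a guess based on the behaviour described above, not something confirmed from Xenu's documentation. The function name `is_internal` and the sample URLs are made up for illustration.

```python
def is_internal(url: str, pattern: str) -> bool:
    """Treat a URL as internal if it contains the pattern (assumed matching rule)."""
    return pattern in url


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.fujitsu.com/fi/",
    "http://www.fujitsu.com/fi/products/",
    "http://www.fujitsu.com/files/report.pdf",
    "http://www.fujitsu.com/finance/",
]

for pattern in ("fujitsu.com/fi", "fujitsu.com/fi/"):
    matched = [u for u in urls if is_internal(u, pattern)]
    print(pattern, "->", len(matched), "of", len(urls), "URLs treated as internal")
```

Under this assumption the pattern without the slash also matches paths such as /files/ and /finance/, so the crawl spreads over the whole site, while the pattern with the slash stays inside /fi/.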
2 Installation and settings
- Download from http://home.snafu.de/tilman/xenulink.html#Download
- Extract the zip file
- Start the setup.exe
- Start the application
- Provide the search URL such as http://www.fujitsu.com/fi
- Provide internal URLs such as fujitsu.com/fi/ (note that the trailing ‘/’ character is critical here)
- Select View –> Show broken links only
- Follow the URL count in the lower right corner.
- Export results with File –> Export Page Map with Tab Separated File
- Open Excel and import the exported tab-separated file: in Excel 2013 this is done from Data –> From Text. It is best to create this as a permanent connection that is refreshed manually. If the file is very large, it is also best to load it into a PivotTable. If you get an error that the file has too many lines, load it anyway, and later filter out the rows with the “ok” message.
- You may wish to add a VLOOKUP function in Excel to present the results more clearly.
- Sort, interpret and fix the results
| Status | Action | Explanation |
|---|---|---|
| auth required | Ignore | I ran my tests without authenticating, which makes the test much faster, so in my case these web sites were not tested. |
| Certificate authority unfamiliar | Preferably Inform | The web page works, but its certificate is not trusted. In the case I found, the certificate had been issued for the IP address, so it failed for the server name. |
| Error 999 | Ignore | LinkedIn returns error 999 for an unknown reason. Normal browsing works fine. |
| Forbidden request | Ignore | I ran my tests without authenticating, which makes the test much faster, so in my case these web sites were not tested. |
| LinkToPageStatus | Ignore | Excel issue: this is the column heading. |
| mail host ok | Ignore | This is a “mailto:…” link and it appears to be fine. |
| No connection | Highly probable fix | There is no connection to the server. The web site is down, but you need to check these one by one to see whether this is permanent or temporary. |
| no info to return | Search from target | HTTP error 404. Typically the target has been moved elsewhere, so the fastest fix may be simply to search for the new location. |
| no object data | Ignore | These seem to work fine. |
| No such host | Certain fix | The server has been renamed or shut down. As the name service cannot find it, this is permanent. |
| not found | Search from target | HTTP error 404. Typically the target has been moved elsewhere, so the fastest fix may be simply to search for the new location. |
| Request URI too long | Inform | A defect in the web site. |
| Server error | Probable fix | The web site is down. You may need to ask the owner of the target whether this is permanent. |
| SSL certificate common name incorrect | Ignore | The reason for this message is unknown, but it appears that you can ignore these. |
| Temporarily overloaded | Ignore | These pages were working fine. |
| the resource is no longer available | Search from target | The page has been moved or removed, and the site redirects you to its front page instead. |
| timeout | Probable fix | You may need to ask the owner of the target whether it is still working. |
| Other | Test | Right-click the item and test it. |
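
If the export is too large for Excel, the filtering and the lookup can also be done with a short script. The following is a minimal sketch in Python, not a tested replacement for the Excel steps. It assumes the exported file is tab-separated with a header row that contains a `LinkToPageStatus` column (the heading mentioned in the table above). The other column names in the sample data, and the function names `classify` and `broken_links`, are hypothetical. The mapping is transcribed from the table above, and any status not listed falls under “Other”.

```python
import csv
import io

# Status-to-action mapping transcribed from the table above.
# The "LinkToPageStatus" row is omitted because DictReader consumes the header.
_ACTIONS = {
    "auth required": "Ignore",
    "Certificate authority unfamiliar": "Preferably Inform",
    "Error 999": "Ignore",
    "Forbidden request": "Ignore",
    "mail host ok": "Ignore",
    "No connection": "Highly probable fix",
    "no info to return": "Search from target",
    "no object data": "Ignore",
    "No such host": "Certain fix",
    "not found": "Search from target",
    "Request URI too long": "Inform",
    "Server error": "Probable fix",
    "SSL certificate common name incorrect": "Ignore",
    "Temporarily overloaded": "Ignore",
    "the resource is no longer available": "Search from target",
    "timeout": "Probable fix",
}
# Compare case-insensitively, since the export may not match the table's casing.
ACTIONS = {status.lower(): action for status, action in _ACTIONS.items()}


def classify(status: str) -> str:
    """Return the suggested action; unlisted statuses count as 'Other' -> 'Test'."""
    return ACTIONS.get(status.strip().lower(), "Test")


def broken_links(lines, status_column="LinkToPageStatus"):
    """Yield (row, action) for every row whose status is not 'ok'."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        status = (row.get(status_column) or "").strip()
        if status.lower() == "ok":
            continue
        yield row, classify(status)


# Made-up sample data; OriginPage and LinkToPage are assumed column names.
sample = io.StringIO(
    "OriginPage\tLinkToPage\tLinkToPageStatus\n"
    "http://example.com/a\thttp://example.com/b\tok\n"
    "http://example.com/a\thttp://example.com/gone\tnot found\n"
    "http://example.com/a\thttp://nosuch.example/\tNo such host\n"
)
for row, action in broken_links(sample):
    print(action, row["LinkToPage"], sep="\t")
```

To run it against a real export, pass an open file instead of the sample, for example `open("xenu_export.txt", newline="")`, where the file name is again only a placeholder. Because rows are read one at a time, the size of the export is not limited by the worksheet row limit.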