Fixing broken links on web sites

1           Xenu

The need to find broken links easily is an old one. After googling and searching for the best product available, free or non-free, I eventually found a very old piece of free software that does what I need well enough. The latest version is from September 4th, 2010, and the original version was developed in 1997.

The software was written in Germany, and perhaps the license details show this best: they are altogether three words, “it is free”.

1.1           Limitations

You run Xenu on a Windows computer; I used Windows 10 for my testing.

When I set my internal URL to fujitsu.com/fi instead of fujitsu.com/fi/, Xenu had collected over 150,000 links from http://www.fujitsu.com before I noticed that the filtering wasn’t working correctly and the crawl was still running in the background. On the plus side, this shows that the software works even for very large sites.
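
Why the trailing ‘/’ matters is not documented anywhere I could find; a plausible explanation is that the internal-URL entries are matched as plain substrings, so an entry without the slash also matches unrelated paths. The Python sketch below only illustrates that assumed matching behaviour (the fujitsu.com/firmware path is hypothetical, and this is not Xenu’s actual code):

    def is_internal(url: str, pattern: str) -> bool:
        """Assumed substring matching, mimicking the internal-URL filter."""
        return pattern in url

    # With the trailing slash the match is anchored to the /fi/ section:
    print(is_internal("http://www.fujitsu.com/fi/products", "fujitsu.com/fi/"))  # True
    print(is_internal("http://www.fujitsu.com/firmware", "fujitsu.com/fi/"))     # False
    # Without it, unrelated paths also count as internal and the crawl balloons:
    print(is_internal("http://www.fujitsu.com/firmware", "fujitsu.com/fi"))      # True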

2           Installation and settings

  1. Download from http://home.snafu.de/tilman/xenulink.html#Download
  2. Extract the zip file
  3. Start the setup.exe
  4. Start the application
  5. Provide the search URL such as http://www.fujitsu.com/fi
  6. Provide internal URLs such as fujitsu.com/fi/ (note that the trailing ‘/’ character is critical here, as explained in section 1.1)
  7. Select View –> Show broken links only
  8. Follow the URL count in the lower right corner.
  9. Export the results with File –> Export Page Map to Tab Separated File
  10. Open Excel and import the tab-separated file: with version 2013 this is done from Data –> From Text. It is best to create this as a permanent connection that is refreshed manually. If the file is very large, it is also best to load it as a PivotTable. If you get an error that the file has too many lines, read it anyway and later filter out the “ok” messages.
  11. You may wish to add a VLOOKUP function in Excel to present the results more clearly, mapping each message to the recommended action in the table below (a scripted alternative is sketched after the table).
  12. Sort, interpret and fix the results
Message | Recommended action | Explanation
auth required | Ignore | I ran my tests without authenticating, which makes the test much faster, so in my case these web sites were not tested.
Certificate authority unfamiliar | Preferably inform | The web page is functioning, but the certificate is not trusted. In the case I found, the certificate had been issued for the IP address and it failed for the server name.
Error 999 | Ignore | LinkedIn returns error 999 for an unknown reason. Normal browsing works fine.
Forbidden request | Ignore | I ran my tests without authenticating, which makes the test much faster, so in my case these web sites were not tested.
LinkToPageStatus | Ignore | Excel issue: this is the heading row of the export.
mail host ok | Ignore | This is a “mailto:…” link and it appears to be fine.
No connection | Highly probable fix | There is no connection to the server. The web site is down, but you need to check one by one whether this is permanent or temporary.
no info to return | Search from target | HTTP error 404. Typically the target has been moved elsewhere, so the fastest fix may be to search for the new location.
no object data | Ignore | These appear to work fine.
No such host | Certain fix | The server has been renamed or terminated. As the name service can’t find it, this is permanent.
not found | Search from target | HTTP error 404. Typically the target has been moved elsewhere, so the fastest fix may be to search for the new location.
ok | Ignore | No issues.
Request URI too long | Inform | A defect in the web site.
Server error | Probable fix | The web site is down. You may need to ask the target owner whether that is permanent.
skip type | Ignore | This tool doesn’t test JavaScript.
SSL certificate common name incorrect | Ignore | The reason for this message is unknown, but it appears you can ignore these.
Temporarily overloaded | Ignore | These pages were working fine.
the resource is no longer available | Search from target | The page has been moved or removed and the site redirects you to its front page instead.
timeout | Probable fix | You may need to ask the target owner whether it is still working.
Other | Test | Right-click the item and test it.
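
If you prefer scripting to Excel, the VLOOKUP of step 11 can also be done with a short Python script that reads the tab-separated export and counts the recommended actions from the table above. This is only a sketch under assumptions: the file name xenu_export.txt is a placeholder, and the status column is assumed to be named LinkToPageStatus (the heading mentioned in the table); check your own export’s header row and adjust accordingly.

    import csv
    from collections import Counter

    # Message -> recommended action, copied from the table above.
    ACTIONS = {
        "auth required": "Ignore",
        "Certificate authority unfamiliar": "Preferably inform",
        "Error 999": "Ignore",
        "Forbidden request": "Ignore",
        "mail host ok": "Ignore",
        "No connection": "Highly probable fix",
        "no info to return": "Search from target",
        "no object data": "Ignore",
        "No such host": "Certain fix",
        "not found": "Search from target",
        "ok": "Ignore",
        "Request URI too long": "Inform",
        "Server error": "Probable fix",
        "skip type": "Ignore",
        "SSL certificate common name incorrect": "Ignore",
        "Temporarily overloaded": "Ignore",
        "the resource is no longer available": "Search from target",
        "timeout": "Probable fix",
    }

    def summarize(path: str) -> Counter:
        """Count recommended actions over a Xenu tab-separated export."""
        counts = Counter()
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                # "LinkToPageStatus" is an assumed column name; verify it
                # against the header row of your own export file.
                status = (row.get("LinkToPageStatus") or "").strip()
                counts[ACTIONS.get(status, "Test")] += 1  # unknown: test by hand
        return counts

    if __name__ == "__main__":
        for action, count in summarize("xenu_export.txt").most_common():
            print(f"{action}: {count}")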

 


New web site for creating safe AGI and technology singularity

There is an increasing number of people who are concerned about the survival of the human race once computers/AI/AGI become smarter than we are, and a number of them wish to contribute financially as well.
There are a few reputable organizations that you can contribute to, but I think there is still room for a small “meta organization”, which would:
– Be a neutral, unbiased actor in the field. Some of these organizations must be functioning better than others, and it would be good for a donor to get a comparison showing which one to donate to.
– Ideally, one of these organizations could be used to stay up to date, but a “meta mail” system would also be nice, where you could choose whether to receive all of the mail from these organizations combined, or just four times a year, or something between these two extremes.
– The same goes for collecting documentation, building understanding, etc., and especially for “recommendations” or “ratings” of how good the articles are.

– Utilize crowd-sourcing for the maintenance. This has been done in Wikipedia and in the LessWrong wiki, but as far as this topic is concerned, both of them have failed badly.

– Best of all would be a web site where we could work on this topic together to define where we are going, what can be done, etc. in a structured manner, building up documents from small subtopics, so that, for example, brain-to-computer interfaces would have one document at this level and several documents at a more detailed level.

Below is a picture that describes some of the probable causal effects that *any* AI/AGI request would create when taken to the extreme. Unfortunately, computers are very single-minded if that is not prevented, and thus by default they would attempt to do everything in an overly extreme fashion.

[Image: Consequences of AI goals]