Inert Gas Design - No Broken Links

Links to, and within, a website should never break. If a link becomes invalid then something was not planned out properly during the initial design of the site. I'll give you a second to mull that statement over, before I get into the most common reasons for a link to break, and how to avoid them.

First, the most common mistake, and also one of the hardest to prevent: the site redesign. As I'll get into in later articles, if the content is separated from the presentation, site redesigns can be made quickly and painlessly. But that doesn't touch the file and directory layout that actually makes up the links of the site. The best way to avoid breaking links in this case is to do a little planning before you start creating any files. Design your site before you design your site.

Remember those outlines from English class? They don't just help organize thoughts when writing a paper; they can also be used to organize the directory structure for a website. Think about it: the outline is a hierarchy, and so is a filesystem. There is a perfect mapping between the two.

Sit down for a minute and think about what the website is going to be presenting, and see how it could be sorted into sections, sub-sections, sub-sub-sections, and so on. Take this site as an example. We have four main sections: Articles, Experiments, Reference, and Links. My outline would look like this to start:

Articles
Experiments
Reference
Links

I don't need to know exactly what will go under each of these categories to fill it out, but I have a pretty good idea which ones will have extra information below them, and which ones will stand alone. So I expand my outline a level deeper.

Articles
- No Broken Links
- ...
- ...
Experiments
- em for Image Widths
- XHTML 2.0
- ...
Reference
- XHTML
  - 1.1
  - 2.0
- CSS
  - 2
  - 3
- ...
Links

That will about do it for this site. Everything I have planned will fit under those four headings, and I sub-divided the references for XHTML and CSS one level deeper to cover different revisions of the specifications. This is the important part though. If this site were a newspaper with multiple articles written every day, I should have divided that up much deeper. For that case, I would have gone this route:

Articles
- Year
  - Month
    - Day
      - Story 1
      - Story 2
      - Story 3
      - ...

It isn't hard to add more entries at one level, but if the deepest level turns out not to be fine enough, and you made the heading too specific, you may have trouble dividing it later. Although, if I keep going with this article, I might find the need to cut it into chapters, which would be an obvious and clean subdivision.
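The Year/Month/Day outline above maps onto a path quite mechanically. Here is a minimal sketch in Python, assuming zero-padded month and day numbers and a trailing slash so each story is itself a directory. The function name story_path and the slug "story_1" are made up for illustration; nothing here is how any particular newspaper site does it.

```python
from datetime import date


def story_path(published: date, slug: str) -> str:
    """Build a directory-style path: articles/<year>/<month>/<day>/<story>/."""
    # Zero padding (03 rather than 3) is an assumption; it keeps listings sorted.
    return f"articles/{published:%Y/%m/%d}/{slug}/"


if __name__ == "__main__":
    print(story_path(date(2004, 3, 5), "story_1"))
```

Because the story is the last directory rather than a file, it can still be split into chapters later without changing its address.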

So now you have your outline. How does that translate into the website layout? Simple: you just create a directory for each entry in the outline, sub-directories for the sub-entries, and so on. Going back to my example, I made these directories:

articles
articles/no_broken_links
experiments
reference
reference/xhtml
reference/xhtml/1.1
reference/xhtml/2.0
reference/css
reference/css/2
reference/css/3
links
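If you would rather not create those directories by hand, the outline can be written down as nested data and walked. This is only a sketch: the OUTLINE dictionary is my transcription of the list above, create_tree is a hypothetical helper name, and the demo writes into a temporary directory rather than a real web root.

```python
from pathlib import Path
import tempfile

# The outline as nested dicts; an empty dict means "no sub-sections yet".
OUTLINE = {
    "articles": {"no_broken_links": {}},
    "experiments": {},
    "reference": {
        "xhtml": {"1.1": {}, "2.0": {}},
        "css": {"2": {}, "3": {}},
    },
    "links": {},
}


def create_tree(root: Path, outline: dict) -> list[Path]:
    """Create one directory per outline entry and return the directories made."""
    created = []
    for name, children in outline.items():
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
        # Recurse so sub-entries become sub-directories.
        created.extend(create_tree(directory, children))
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for directory in create_tree(Path(tmp), OUTLINE):
            print(directory.relative_to(tmp).as_posix())
```

Adding a section later is then a one-line change to the outline, and running the helper again leaves the existing directories alone.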

I can hear you asking why I created a directory for even the lowest level, when a .html file would work. You are right, I could have made a file called no_broken_links.html in my articles directory. But that is where the second cause of broken links comes from. When you start to expand a site, and you find that one section has grown too big and needs to be divided, creating sub-sections is easy if you've been referring to that section as a directory all along.

Let's take my links section. It is small; there are only a few sites I link to now, and they easily fit on one page. But what happens if I go link crazy, and link not only to useful sites but also to sites that are just good examples of site design, and perhaps I want to give some counterexamples with sites that should not be emulated?

If I had been linking to /links.html all over my site, and all the search engines had indexed my site that way, and some people had bookmarked that particular page, it would be a bad idea to change that. It just looks bad to say, "this page no longer exists, you'll be directed to the new location in 5 seconds", or whatever. Sure, you could replace the contents of links.html with an index page that shows the new sub-sections. But then it doesn't match the rest of your site that had sub-sections already. Really, the best thing to do from the start is to just make a directory called links, and place an index.html file inside it. Now, you never reference that index.html file directly; just make links as <a href="links">. So if the need comes to split it up into more categories of links, you just create the subdirectories for each, and place in them their own index.html files. Then you can update your links/index.html to point to the newly created sub-sections. No broken links, just an expansion of content.
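One way to hold yourself to that rule is to scan your pages for internal links that name a file instead of a directory. The following is a rough sketch using only the Python standard library. The list of extensions and the names HrefCollector and file_style_links are my own assumptions, and it treats any link with a scheme or host as external.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

# Extensions treated as "a file name leaking into a link" (an assumed list).
FILE_EXTENSIONS = {".html", ".shtml", ".php", ".pl", ".cgi", ".asp"}


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def file_style_links(html: str) -> list[str]:
    """Return internal hrefs whose last path segment ends in a known extension."""
    collector = HrefCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        parts = urlparse(href)
        if parts.scheme or parts.netloc:
            continue  # external link, not ours to restructure
        extension = posixpath.splitext(parts.path)[1].lower()
        if extension in FILE_EXTENSIONS:
            flagged.append(href)
    return flagged


if __name__ == "__main__":
    page = '<a href="links">good</a> <a href="/links.html">risky</a>'
    print(file_style_links(page))
```

Checking against an explicit extension list, rather than looking for any dot, keeps directory names such as reference/xhtml/1.1 from being flagged.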

The directory with the index.html file also helps out with the third and final most common breakage. This occurs when changing from static content to dynamically generated content.

First, it is no business of your visitors how you create your pages, and while security through obscurity is nothing to rely upon, it does add a little extra layer of protection. What I'm saying is, what does it matter if your page is a .pl, or .cgi, or .php, or .asp? By showing that, you are giving away a little more about the man behind the curtain. If instead you rely upon the webserver to serve the default file from a directory, you can hide the actual file name.

This problem exposes itself again as your site starts to grow. You start small, editing each complete HTML document by hand. No work is being done by the server; the content is totally static. If you weren't following my previous tip, you'd have links like broken_link.html all over your site. As time goes by, you become lazy (laziness is a good attribute; it means you are finding a way to make the computer do more work for you), and get tired of going to each page to make the same changes. You realize that a lot of the code on every page of your site is the same. So you go about setting up server-side includes. The server administrator is good enough to make index.shtml have higher preference than index.html as the default file to load for a directory, but that doesn't do you much good, because your pages are named broken_link.html. Now you have to rename them to broken_link.shtml to make the server parse them for include information. So every page you make into an SSI page will break someone's bookmarks, or will no longer be found from a search engine. Again, you could leave a redirect page with the old name, but as I said, that just looks unprofessional.

If you had been just linking to the directory instead of directly to a file, the server would see your new index.shtml file and just serve that one instead of the old index.html, and no one would miss a beat.
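To see why the directory link survives the switch, here is a toy model of how a server might pick the default file. It is a simplification under stated assumptions: the preference list holds only the two names mentioned above, the site is modeled as a set of file paths, and resolve is a hypothetical function, not any real server's API (real servers make this order configurable, for example through Apache's DirectoryIndex directive).

```python
from typing import Optional

# Assumed preference order: index.shtml outranks index.html, as described above.
DEFAULT_FILES = ["index.shtml", "index.html"]


def resolve(url_path: str, files: set[str], defaults=DEFAULT_FILES) -> Optional[str]:
    """Return the file this toy server would serve, or None for a 404."""
    path = url_path.strip("/")
    if path in files:
        return path  # the link names an existing file directly
    for name in defaults:
        # Treat the path as a directory and try each default file in order.
        candidate = f"{path}/{name}" if path else name
        if candidate in files:
            return candidate
    return None


if __name__ == "__main__":
    static_site = {"links/index.html", "broken_link.html"}
    ssi_site = {"links/index.shtml", "broken_link.shtml"}
    for site in (static_site, ssi_site):
        print(resolve("/links/", site), resolve("/broken_link.html", site))
```

The link to /links/ resolves on both versions of the site, while /broken_link.html only resolves on the static one. That second case is the broken bookmark.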

One more variation of this is the cgi-bin directory. This is guaranteed to break your site during a static to dynamic conversion if you have to place all your executable scripts in a special directory. This is no longer required by any modern webserver. It may have been a little more secure than allowing executables to reside in any directory, but that problem is also behind us if the server is patched up to the current revision. So if whoever is hosting your site tells you that you need a special CGI directory, see if that can be worked around with some other options, or look elsewhere for hosting; they are living in the dark ages.

In closing, if you are expanding a site and find 404s in the logs from people trying to use bookmarks that were valid the day before, or from people coming in from search engines, for months to come, looking for content that is no longer there, you have no one to blame but yourself (or maybe the person from whom you inherited the mess). A little extra planning in the initial design stage can pay off well into the distant future for your site.