Published on: Sunday 10th October 1999 By: Michael Bednarek
The main purpose of HTML is to enable web authors to specify structural information about their pages - for example tables, paragraphs, images and so on. However, it also provides a way of adding information about the page and its content. Such information is known as metadata, and is added through the use of the <META> tag.
The tag can also be used to create the equivalent of HTTP (HyperText Transfer Protocol) headers, which can provide instructions to the browser.
This article will describe the most common uses of the META tag. It is not intended to be a definitive list; new metadata is being created and used all the time, for various purposes. If you are interested in metadata, you may also like to read Janus Boye's article "RDF - What's in it for us?".
All META tags should be placed within the HEAD section of your web pages. When using META to provide information about your page, the general syntax is as follows:
<META NAME="dataname" CONTENT="datavalue">
In the above line, dataname is a specific identifier for the information you are providing. The browser (or other program) looks at this name and then decides how to treat the data. The information itself is represented by datavalue. In this section we'll be looking at the most common types of metadata and how they are typically used.
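The name-based lookup described above can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not how any particular browser or search engine actually works; the class `MetaCollector`, the function `read_meta` and the sample page are hypothetical. It keeps NAME tags and HTTP-EQUIV tags in separate dictionaries so that a program can decide how to treat each kind.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT and HTTP-EQUIV/CONTENT pairs from <META> tags."""

    def __init__(self):
        super().__init__()
        self.names = {}       # NAME="..."       -> CONTENT
        self.http_equiv = {}  # HTTP-EQUIV="..." -> CONTENT

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # arrives here as tag "meta" with an attribute called "name".
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if content is None:
            return
        if "name" in attributes:
            # Lower-case the identifier so "Author" and "author" are the same key.
            self.names[attributes["name"].lower()] = content
        elif "http-equiv" in attributes:
            self.http_equiv[attributes["http-equiv"].lower()] = content


def read_meta(html_text):
    """Return (names, http_equiv) dictionaries for every META tag in the text."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.names, collector.http_equiv


page = """<HTML><HEAD>
<META NAME="description" CONTENT="An article about metadata and the META tag.">
<META NAME="author" CONTENT="Michael Bednarek">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
</HEAD><BODY></BODY></HTML>"""

names, headers = read_meta(page)
print(names["author"])    # Michael Bednarek
print(headers["pragma"])  # no-cache
```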
The Description item is simply a basic description of the page's content, in a few sentences. Search engines often use it to display a brief summary of the page in their search results. An example might be:
<META NAME="description" CONTENT="An article about metadata and the META tag.">
The Author item is another self-explanatory one: it usually gives the name of the person who wrote the page's content (or, in some cases, the page's designer, if this is a different person).
<META NAME="author" CONTENT="Michael Bednarek">
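Because CONTENT is a double-quoted attribute value, a description that itself contains a double quote (or an ampersand) needs escaping, or the attribute would end early. A minimal sketch of a tag builder using Python's `html.escape`; the function name `meta_name_tag` is hypothetical.

```python
from html import escape


def meta_name_tag(name, content):
    """Build a <META NAME=... CONTENT=...> line, escaping &, <, > and quotes."""
    return '<META NAME="%s" CONTENT="%s">' % (
        escape(name, quote=True),
        escape(content, quote=True),
    )


print(meta_name_tag("description", "An article about metadata and the META tag."))
print(meta_name_tag("author", "Michael Bednarek"))
# The embedded quotes become &quot; so the attribute stays intact:
print(meta_name_tag("description", 'A guide to the "META" tag'))
```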
The Keywords item is the metadata that everyone is talking about, although in reality its effectiveness is overhyped. It allows you to list a number of words and phrases that relate to the subject of the web page. For example, some keywords for this article might be: metadata, META tag, search engines, HTTP, HTML and so on.
When you search for something in a search engine, you generally type in a few words (or perhaps a phrase) related to what you are looking for. The engine then matches up the keywords you have entered with the pages stored in its database. This is why people have been going nuts over the META keywords tag; they think that if you don't have it on each page, and don't have an extensive list of words, then you won't get very good search engine results.
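As a toy model of that matching step, here is an inverted index that maps each keyword to the pages listing it, and answers a query with the pages that list every query term. Real search engines are far more elaborate; the page names and the functions `build_index` and `search` are hypothetical. Note how the last query finds nothing, because only the whole phrase "search engines" was listed.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased keyword to the set of pages that list it."""
    index = defaultdict(set)
    for url, keywords in pages.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


def search(index, query_terms):
    """Return the pages whose keywords contain every query term."""
    results = None
    for term in query_terms:
        # .get() does not add missing keys to the defaultdict.
        matches = index.get(term.lower(), set())
        results = set(matches) if results is None else results & matches
    return results if results is not None else set()


# Hypothetical pages and their META keywords.
pages = {
    "meta.htm": ["metadata", "META tag", "search engines"],
    "tables.htm": ["HTML", "tables"],
    "forms.htm": ["HTML", "forms"],
}
index = build_index(pages)
print(search(index, ["html"]))            # both tables.htm and forms.htm
print(search(index, ["HTML", "tables"]))  # {'tables.htm'}
print(search(index, ["search"]))          # set(): only the whole phrase was listed
```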
In fact, only three of the major search engines (AltaVista, InfoSeek and HotBot) give any weight to META tags; the others base their results on a page's actual content. However, it is still worth adding META keywords to your pages if you want good results in those engines. Words and phrases are treated separately: for example, you would need to include the phrase "web authoring" as well as the two individual words "web" and "authoring" for best results. Here's an example tag:
<META NAME="keywords" CONTENT="metadata, META tag, meta, keywords, search engines">
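Following the advice to list a phrase together with its individual words, a small helper can build the CONTENT value automatically. This is a sketch; the name `keywords_content` is hypothetical, and dropping repeated terms case-insensitively while keeping the original order is a design choice of this example.

```python
def keywords_content(phrases):
    """Build a keywords CONTENT value listing each phrase and its individual words."""
    seen = set()
    terms = []
    for phrase in phrases:
        # The phrase itself first, then each word it contains.
        for term in [phrase] + phrase.split():
            key = term.lower()
            if key not in seen:  # skip duplicates such as a one-word "phrase"
                seen.add(key)
                terms.append(term)
    return ", ".join(terms)


content = keywords_content(["web authoring", "metadata", "META tag"])
print('<META NAME="keywords" CONTENT="%s">' % content)
# <META NAME="keywords" CONTENT="web authoring, web, authoring, metadata, META tag, META, tag">
```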
Finally, there is the Robots item. This is also related to search engines, in a way. A robot is a program which will visit a web page, index it somewhere, and then visit all the hyperlinks in that page, indexing them all. It may continue in this fashion indefinitely, or it may stop after it has reached a certain level. Search engines often send a robot round to your site, in order to add all its pages to their database.
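The visit, index and follow loop, including the option to stop at a certain level, can be modelled as a breadth-first traversal with a depth limit. This sketch walks an in-memory dictionary of links instead of fetching real pages; the function `crawl` and the page names are hypothetical, and real robots choose their own visiting order.

```python
from collections import deque


def crawl(links, start, max_depth):
    """Breadth-first crawl of a link graph, not following links past max_depth.

    links maps each page to the pages it links to; returns pages in visit order.
    """
    visited = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)  # "index" the page
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in visited:  # never visit the same page twice
                visited.add(target)
                queue.append((target, depth + 1))
    return order


site = {
    "index.html": ["about.html", "articles.html"],
    "articles.html": ["meta.html", "index.html"],
    "meta.html": ["archive.html"],
}
print(crawl(site, "index.html", max_depth=1))
# ['index.html', 'about.html', 'articles.html']
print(crawl(site, "index.html", max_depth=3))
# ['index.html', 'about.html', 'articles.html', 'meta.html', 'archive.html']
```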
Sometimes you may not want certain pages on your site to appear in a search engine. These might be pages containing sensitive information, or those which should not be viewed outside of a frameset. You can use the META tag to provide instructions to robots visiting a page - you can tell them not to index the page, or not to follow any of the links on it, or both.
Here are examples of some of the combinations you can use:
<META NAME="robots" CONTENT="NOINDEX,NOFOLLOW">
<META NAME="robots" CONTENT="NOINDEX,FOLLOW">
<META NAME="robots" CONTENT="INDEX,NOFOLLOW">
<META NAME="robots" CONTENT="INDEX,FOLLOW">
The last line in this list is in fact the default setting, so you wouldn't ever need to use it in practice.
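A robot reading this tag has to turn the CONTENT value into two yes/no decisions. A minimal sketch, assuming the robot recognises only the four keywords shown above, matches them case-insensitively and ignores anything else; `robots_directives` is a hypothetical name. Because INDEX,FOLLOW is the default, only the NO forms change the result.

```python
def robots_directives(content=None):
    """Return (may_index, may_follow) for a robots META CONTENT value.

    content=None means the page has no robots tag, which behaves like INDEX,FOLLOW.
    """
    tokens = set()
    if content is not None:
        # "NOINDEX, follow" -> {"NOINDEX", "FOLLOW"}
        tokens = {token.strip().upper() for token in content.split(",")}
    # INDEX and FOLLOW are the defaults, so only NOINDEX/NOFOLLOW matter.
    return "NOINDEX" not in tokens, "NOFOLLOW" not in tokens


for value in ["NOINDEX,NOFOLLOW", "NOINDEX,FOLLOW", "INDEX,NOFOLLOW", "INDEX,FOLLOW", None]:
    print(value, robots_directives(value))
```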
As I mentioned in the introduction to this article, the META tag can also be used to generate the equivalent of HTTP headers. In practice, this means you can control the behaviour of the user's browser.
There may be occasions when you would want to prevent a page from being cached locally on the user's computer, and thus force the browser to load a fresh copy each time. One example of this might be a webcam which is automatically updated every few seconds - if the user visited at a later date, the browser might show them the cached (and therefore outdated) version.
There are in fact three META tag variants that you should use to cause this behaviour, because different browsers recognise different ones. The first, Expires, is actually meant to specify an expiry date for the web page. However, if you set its value to 0, the browser treats the page as already expired and therefore requests a new copy every time. The other two, Pragma and Cache-Control, are specifically designed to prevent (or control) caching, and should take the value "no-cache". So, to prevent your page from being cached in most browsers, you should use the following lines:
<META HTTP-EQUIV="Expires" CONTENT="0">
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Cache-Control" CONTENT="no-cache">
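Each HTTP-EQUIV tag stands in for a header line of the form `Name: value` that a server could send itself. The sketch below renders the three anti-caching settings in both forms; the helper names are hypothetical. The comments record why all three are useful: Pragma dates from HTTP/1.0, Cache-Control from HTTP/1.1, and HTTP/1.1 tells clients to treat an invalid Expires value such as 0 as already expired.

```python
# The three anti-caching settings, as (header name, value) pairs.
NO_CACHE = [
    ("Expires", "0"),               # invalid date, treated as already expired
    ("Pragma", "no-cache"),         # HTTP/1.0 directive
    ("Cache-Control", "no-cache"),  # HTTP/1.1 directive
]


def as_meta_tags(pairs):
    """Render each pair as a <META HTTP-EQUIV=... CONTENT=...> line."""
    return ['<META HTTP-EQUIV="%s" CONTENT="%s">' % (name, value) for name, value in pairs]


def as_header_lines(pairs):
    """Render each pair as the equivalent HTTP response header line."""
    return ["%s: %s" % (name, value) for name, value in pairs]


print("\n".join(as_meta_tags(NO_CACHE)))
print("\n".join(as_header_lines(NO_CACHE)))
```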
Incidentally, if you want your page to expire at a specific later date, you can give the date in the following format (the time must be in GMT):
<META HTTP-EQUIV="Expires" CONTENT="Tue, 03 Aug 1999 09:30:00 GMT">
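Writing this date by hand is error-prone, because the day name must match the date (3 August 1999 was a Tuesday) and the time must be converted to GMT. A minimal sketch using Python's `email.utils.format_datetime`, which produces this "Day, DD Mon YYYY HH:MM:SS GMT" layout when given a UTC time and `usegmt=True`; the function name `expires_meta` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_meta(when):
    """Build an Expires META tag for a timezone-aware datetime, converted to GMT."""
    gmt = when.astimezone(timezone.utc)
    return '<META HTTP-EQUIV="Expires" CONTENT="%s">' % format_datetime(gmt, usegmt=True)


print(expires_meta(datetime(1999, 8, 3, 9, 30, tzinfo=timezone.utc)))
# <META HTTP-EQUIV="Expires" CONTENT="Tue, 03 Aug 1999 09:30:00 GMT">

# The same moment given in a zone two hours ahead of GMT yields the same tag.
print(expires_meta(datetime(1999, 8, 3, 11, 30, tzinfo=timezone(timedelta(hours=2)))))
```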
With web hosting services becoming cheaper and cheaper, many people decide to change the location of their web sites from free ISP space to professional server space. This relocation obviously causes confusion, because visitors to the site will still try to use the old URL. What many people do is keep one page at the old address which, when visited, will automatically send the user on to the site's new home. This can be easily achieved using the META tag, which will redirect the user either instantly or after a given time delay.
For example, the line below will redirect the visitor to irt.org after ten seconds, giving them enough time to read about what's going on:
<META HTTP-EQUIV="Refresh" CONTENT="10;URL=http://www.irt.org">
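The CONTENT value packs two things together: a delay in seconds, then a semicolon and `URL=` followed by the destination. A minimal sketch for building and reading such values, assuming only that simple form (browsers may be more lenient about spacing and quoting); the function names are hypothetical. A delay of 0 gives the instant redirect.

```python
def refresh_meta(url, delay=0):
    """Build a Refresh META tag; delay=0 redirects straight away."""
    return '<META HTTP-EQUIV="Refresh" CONTENT="%d;URL=%s">' % (delay, url)


def parse_refresh(content):
    """Split a Refresh CONTENT value such as "10;URL=http://www.irt.org".

    Returns (delay_in_seconds, url); url is None when there is no URL= part.
    """
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    # Accept "URL=" in any letter case, e.g. "url=".
    url = rest[4:].strip() if rest[:4].upper() == "URL=" else None
    return delay, url


print(refresh_meta("http://www.irt.org", delay=10))
# <META HTTP-EQUIV="Refresh" CONTENT="10;URL=http://www.irt.org">
print(parse_refresh("10;URL=http://www.irt.org"))  # (10, 'http://www.irt.org')
print(parse_refresh("0; url=http://www.irt.org"))  # (0, 'http://www.irt.org')
```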
META tags can be very useful additions to your web pages, especially if you would like more control over how search engines treat your site. Although the keywords and description tags are the best known, the other forms of metadata can also prove helpful.