Chapter 6: Beginning JavaScript
Controlling Data Entry Using Form Fields
Creating 'Encoded' Name & Value Pairs
Addressing Form Field Validation with Regular Expressions and JavaScript 1.2
Published on: Sunday 16th November 1997 By: Jason Nugent
As most of us are aware, using a form on a website is an effective way to gather information from a visitor. Information can be requested, mailing lists can be subscribed to, and comments and feedback can be submitted. For those of you who have not yet implemented a form on your website and are wondering about the HTML syntax in doing so, I will first delve into such details. If you are already familiar with forms, you may skip this section and move onto the next.
A form on a website is embodied inside the <FORM> tag, so we will look at this one first and in some detail. Like most HTML tags, the <FORM> tag takes a number of attributes related to the form itself. It possesses the following syntax:
<FORM attribute1=".." attribute2=".."> ... </FORM>
where the attributes can be one (or more) of the following:
Inside the <form> tag, you place all the elements that you want to use in the form itself. The most common element is the <input> tag, which takes attributes that define how it appears on the page. It is an empty tag, which means that it does not have a corresponding </input> tag to close it. These <input> tags can be of type text, checkbox, image, password, radio, submit, and reset. There is also a <textarea> tag, which creates a large area for entering multi-line information, and a <select> tag, which creates a drop-down list of items.
Defining each of these items is not the purpose of this article, but if you would like to know more, please visit the World Wide Web Consortium at http://www.w3.org for more information. For now, suffice it to say that qualifying each component of your form with a name=".." attribute is required if you wish to work with either a CGI script or a JavaScript function.
The simple form used in this example contains two text fields, one for a name, and the other for an email address. One button has been added, by which the form is submitted to the server.
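<form name="form_name" onSubmit="return isReady(this)" action="">
<table cellpadding=0 cellspacing=5 border=0>
<tr>
  <td align="left">Your Name:</td>
  <td align="left"><input type="text" name="username"></td>
</tr>
<tr>
  <td align="left">Your Email Address:</td>
  <td align="left"><input type="text" name="address"></td>
</tr>
<tr>
  <td align="left" colspan=2><input type="submit" value="Submit"></td>
</tr>
</table>
</form>
Note that the form has been given a name, and that the form object itself is passed to the isReady() function upon submission through the this keyword. In JavaScript, this refers to the current object, which inside the onSubmit handler is the form. The field names username and address are the ones that the isReady() function shown later expects.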
As we have seen, coupling interactivity via forms and programs or scripts on a server through the Common Gateway Interface (CGI) is an effective way to obtain information from individuals visiting your website. However, there are risks associated with running a CGI script from the web. Poorly written scripts that accept malformed information from an unknowing or malicious user could be made to do things that could bring your server to its knees.
For example, imagine operating a website that contains a field that allows a user to enter the name of a directory on the server. Certainly not the smartest idea, but they are out there. If someone were to put the following in as the directory they wanted listed, bad things could happen:
web_directory ; /bin/rm *
Quite possibly, the command to list the directory would be carried out normally, and then the second command (/bin/rm *) could be carried out and erase a directory.
There are several ways to prevent this sort of thing from happening, and some are better than others, depending on the situation. First and foremost, the script itself could be written to verify that the form submitted to it does not contain any malicious code. Upon detecting such an attempt, the script could refuse to process the entry and store the submitter's IP address in a file for future reference. Or, more simply, the script could display an alternate page telling the visitor that their input was not accepted.
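As a rough sketch of the first approach, a script could accept a directory name only when every character appears on a list of allowed characters, rather than trying to list every dangerous one. The sketch below is written in JavaScript to match the rest of this article, although a real CGI script would more likely be written in Perl or C. The names ALLOWED and isSafeDirectoryName() are hypothetical and chosen only for illustration.

```javascript
// Hypothetical whitelist: the only characters we are willing to accept.
var ALLOWED = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-";

// Returns true only if name is a non-empty string made up
// entirely of characters found in ALLOWED.
function isSafeDirectoryName(name) {
    if (typeof name != "string" || name.length == 0) return false;
    for (var i = 0; i < name.length; i++) {
        if (ALLOWED.indexOf(name.charAt(i)) == -1) return false;
    }
    return true;
}

var harmless = isSafeDirectoryName("web_directory");              // true
var attack   = isSafeDirectoryName("web_directory ; /bin/rm *");  // false
```

Because the space, semicolon, slash and asterisk are not on the list, the malicious entry shown earlier is rejected before it ever reaches a shell.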
While this is a very good method to use when validating form field input, it does have its disadvantages. One of the biggest is the overhead involved with parsing input on the server. A busy server that parses all of its requests could be slowed considerably, resulting in a website that appears sluggish. Here is where JavaScript comes to the rescue!
By passing the contents of the form to a JavaScript function before submission, the contents can be validated before being sent to the server, which reduces server overhead.
Beware, however, that a poorly written script can still accept requests that do not come from the form. It is possible that a malicious user from a completely different domain could run your script directly and feed it bad information. One of the easiest ways to discourage this is to make your CGI script examine the HTTP_REFERER and REMOTE_HOST environment variables that the server sets for every request. These variables contain the URL of the requesting document and the host name of the machine making the request, respectively, and could be checked to ensure that the request was submitted from a page on an allowed domain (in particular, your own). If the request is not allowed, the remote host name could be logged in a file and the request refused. Keep in mind that the referring URL is supplied by the client and can be forged or left out entirely, so this check deters casual misuse but is no substitute for validating the input itself.
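The referer check might look something like the sketch below. It is written in JavaScript for consistency with the rest of the article, and it assumes the CGI environment has been made available as a plain object. The function name isAllowedReferer() and the example.com URLs are hypothetical.

```javascript
// env stands in for the CGI environment (an assumption for this sketch).
// allowedPrefix should end with a slash so that a host such as
// www.example.com.elsewhere.net cannot slip through.
function isAllowedReferer(env, allowedPrefix) {
    var referer = env.HTTP_REFERER;
    if (!referer) return false;                  // header missing or empty
    return referer.indexOf(allowedPrefix) == 0;  // must start with our own site
}

var ours   = isAllowedReferer({ HTTP_REFERER: "http://www.example.com/form.html" },
                              "http://www.example.com/");   // true
var theirs = isAllowedReferer({ HTTP_REFERER: "http://www.example.com.elsewhere.net/" },
                              "http://www.example.com/");   // false
var absent = isAllowedReferer({}, "http://www.example.com/"); // false
```

Since the referring URL comes from the client, a determined attacker can fake it, so a check like this should sit alongside input validation rather than replace it.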
It is also important to ensure that some form of error checking still takes place, even if the request is a legitimate one. A visitor using a browser that does not support JavaScript could still conceivably submit malformed code.
JavaScript 1.0 offered a way to check and see if a field contained certain characters using the indexOf() method. If a character was found, the position of the character was returned as a number. For example:
var a = "This is my field's contents";
var b = a.indexOf("my"); // b now contains 8.
As you can see, b now contains the position (starting from 0) that the pattern "my" was located at. If the pattern searched for was not found, the indexOf() method returns -1.
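A short sketch of both outcomes, the found case and the not-found case, may help. The variable names and the example.com address are only illustrative.

```javascript
var field = "This is my field's contents";

var found   = field.indexOf("my");    // 8  (positions are counted from 0)
var missing = field.indexOf("your");  // -1 (the pattern does not occur)

// A common idiom: compare against -1 to get a yes/no answer.
var hasAtSign = "jason@example.com".indexOf("@") != -1;  // true
```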
But what if you wanted to check for several characters all at once? What if you wanted to make sure that an email address only contained numbers, letters, an "at sign", and a period? By using indexOf(), you would be required to write several lines of code, each using indexOf() to look for ALL the characters you didn't want to find. If an illegal character is found, an alert box could be displayed asking the user to re-enter their information. The following functions use JavaScript 1.0 functionality to examine either a text field containing regular text or a text field containing an email address. By passing the form to the isReady() function using the onSubmit event handler, the information is validated before being sent to the server. If the function returns true (i.e. everything checks out), the form is submitted to the URL named in its ACTION attribute; if it returns false, the submission is cancelled.
Note that these functions can be used independently of a form. These methods can be used anywhere, as long as an appropriate string value is passed as an argument.
<script language="JavaScript"><!--
// Returns false if the string is empty or contains a character
// that should never appear in an email address.
function isEmail(string) {
    if (!string) return false;
    var iChars = "*|,\":<>[]{}`\';()&$#%";
    for (var i = 0; i < string.length; i++) {
        if (iChars.indexOf(string.charAt(i)) != -1)
            return false;
    }
    return true;
}

// Same check for ordinary text; the "at sign" is also disallowed here.
function isProper(string) {
    if (!string) return false;
    var iChars = "*|,\":<>[]{}`\';()@&$#%";
    for (var i = 0; i < string.length; i++) {
        if (iChars.indexOf(string.charAt(i)) != -1)
            return false;
    }
    return true;
}

// Called from onSubmit; returning false cancels the submission.
function isReady(form) {
    if (isEmail(form.address.value) == false) {
        alert("Please enter a valid email address.");
        form.address.focus();
        return false;
    }
    if (isProper(form.username.value) == false) {
        alert("Please enter a valid username.");
        form.username.focus();
        return false;
    }
    return true;
}
//--></script>
Although this method works fine if you want to ensure that certain characters are not present in the field, it falls short when trying to ensure that certain patterns ARE present. What if you only wanted to allow email addresses from a certain domain, while not allowing others? What if only word-word@word-word.word email addresses were allowed? These things would be incredibly difficult, if not impossible, to do with indexOf() and JavaScript 1.0.
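To see the limitation concretely, here is the JavaScript 1.0 isEmail() function from above, repeated so the sketch stands on its own, applied to a few sample values. The sample strings are made up for illustration.

```javascript
// The JavaScript 1.0 version: rejects only strings containing a listed character.
function isEmail(string) {
    if (!string) return false;
    var iChars = "*|,\":<>[]{}`\';()&$#%";
    for (var i = 0; i < string.length; i++) {
        if (iChars.indexOf(string.charAt(i)) != -1) return false;
    }
    return true;
}

var good     = isEmail("jason@example.com");   // true, as expected
var brackets = isEmail("a<b>@example.com");    // false, "<" is on the list
var noAtSign = isEmail("not an address");      // true, although it is clearly not an address
```

The last result is the problem: nothing in the function requires an "at sign" or a domain, because it can only say what must be absent, not what must be present.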
JavaScript 1.2 shows the way through the power of regular expressions. These expressions, which offer the same functionality as regular expressions taken from Perl, a very popular scripting language, add the ability to parse form field input in ways that were simply not possible before. The examples below, which only work in Netscape Navigator 4.0x and Internet Explorer 4, illuminate the power associated with these new additions.
First off, what is a regular expression? Put simply, a regular expression is a string of special values that programmers can use to explicitly match a specific string of text.
Before we get into using regular expressions to parse text, it is important that you understand a bit about how regular expressions work and what special characters do what. There is just too much to get into here, but here are a few that come up often:
. matches any single character (except a newline).
? matches zero or one of the preceding character.
+ matches one or more of the preceding character.
* matches zero or more of the preceding character.
^ matches the absolute beginning of the string.
$ matches the absolute end of the string.
\w matches a "word" character (alphanumerics and the "_" character).
\w+ matches a whole word.
\W matches any character that is not a "word" character; use \s to match whitespace.
x|y matches one or the other of x or y.
[0-9] matches ONE digit, ranging from 0 to 9.
[A-Za-z] matches any single letter, uppercase or lowercase.
Parentheses can be used to group characters together.
(this)+ matches at least one occurrence of "this".
If you wish to search for one of the special characters itself, you must first escape it with a backslash (\).
\. matches a period.
\? matches a question mark.
\[ matches a left square bracket.
\| matches a "pipe" character.
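The special characters are easier to absorb with a few concrete cases. The sketch below uses the test() method of a regular expression, which simply returns true or false; the sample strings are made up for illustration.

```javascript
// ? : zero or one of the preceding character
var q1 = /^colou?r$/.test("color");     // true
var q2 = /^colou?r$/.test("colour");    // true
var q3 = /^colou?r$/.test("colouur");   // false

// + needs at least one occurrence, * accepts none at all
var plusEmpty = /^a+$/.test("");        // false
var starEmpty = /^a*$/.test("");        // true

// A character class matches exactly ONE character
var oneDigit  = /^[0-9]$/.test("7");    // true
var twoDigits = /^[0-9]$/.test("42");   // false

// Parentheses group characters so that + applies to the whole group
var grouped = /^(this)+$/.test("thisthis");  // true

// x|y matches either alternative, anywhere in the string unless anchored
var either = /cat|dog/.test("hotdog");  // true

// An unescaped period matches any character; an escaped one only a period
var anyChar  = /^a.b$/.test("axb");     // true
var realDot  = /^a\.b$/.test("a.b");    // true
var notADot  = /^a\.b$/.test("axb");    // false
```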
In addition to these, modifiers can be added after the regular expression to control how it searches through the string. Some of the more useful ones include these:
/somematch/g - global (matches all instances).
/somematch/i - ignore case.
/somematch/gi - you can combine them, too.
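The effect of each modifier is easiest to see side by side. This sketch uses the replace() method, which is covered later in this article, on a made-up sample string.

```javascript
var text = "Perl, perl, perl";

// No modifier: only the first exact match is replaced.
var plain  = text.replace(/perl/, "JavaScript");    // "Perl, JavaScript, perl"

// g: every exact match is replaced.
var global = text.replace(/perl/g, "JavaScript");   // "Perl, JavaScript, JavaScript"

// i: case is ignored, but still only the first match is replaced.
var nocase = text.replace(/perl/i, "JavaScript");   // "JavaScript, perl, perl"

// gi: every match, regardless of case.
var both   = text.replace(/perl/gi, "JavaScript");  // "JavaScript, JavaScript, JavaScript"
```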
JavaScript 1.2 contains a number of new constructors and methods that allow a programmer to parse a string of text using regular expressions. The first thing you must do before you can begin parsing a string is to determine exactly what your regular expression will be. There are two ways to do this. The first is to specify it by hand using normal syntax, and the second is to use the new RegExp() constructor. The following two statements are equivalent:
pattern = /:+/;              // matches one or more colons
pattern = new RegExp(":+");  // same thing.
There is one very important thing to notice here. With the first method, you must remember to delimit your expression using slashes. A slash specifies the beginning or the end of a regular expression. With the second method, the pattern is an ordinary string, so no slashes are used. You may also place the regular expression directly into a method call without first defining it using the RegExp() constructor, which is what I do in the examples below.
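One detail worth knowing when using the RegExp() constructor: because the pattern is passed as a string, every backslash has to be doubled, otherwise the string itself swallows it before the regular expression is built. A small sketch, using a made-up file name as sample data:

```javascript
// Literal form: backslashes are written once.
var literal = /^\w+\.\w+$/;

// Constructor form: the same pattern, with each backslash doubled.
var built = new RegExp("^\\w+\\.\\w+$");

// Forgetting to double them silently produces a different pattern (^w+.w+$).
var wrong = new RegExp("^\w+\.\w+$");

var a = literal.test("index.html");  // true
var b = built.test("index.html");    // true
var c = wrong.test("index.html");    // false

var samePattern = (literal.source == built.source);  // true
```

The constructor form earns its keep when part of the pattern is only known at run time, for example when it is assembled from a variable.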
The replace() method allows a programmer to replace a found match with another string. It takes two arguments, one being the regular expression you want searched for, and the other being the replacement text you want substituted. For example:
var t = "javascript is great";
var s = t.replace(/javascript/, "JavaScript"); // fixes the capitalization.
The variable s now contains "JavaScript is great". The next method is the search() method. This method searches the source string and returns the location of the first match if the pattern is found, otherwise -1. It effectively duplicates the functionality of JavaScript 1.0's indexOf() method. Example:
var s = "Let's use Regular Expressions";
var found = s.search(/use/); // found now contains 6.
If the search string is not located, the function returns -1. This method is the one that will enable us to parse a field's contents to make sure that people aren't submitting information that could damage our server. Before we do that, however, let's take a look at the next method provided for regular expressions, the split() method. The split() method is actually present in older versions of JavaScript but has been updated for JavaScript 1.2 to accommodate regular expressions. It searches through a string and "breaks apart" the string and stores each part in an array. The example below uses a pattern that looks for a colon and stores each part in the array a.
var s = "Jason:Nugent:this:is:great:don't:you:think";
var a = s.split(/:/);
In this case, a becomes the array containing ["Jason", "Nugent", "this", "is", "great", "don't", "you", "think"]. In common CGI applications, this same technique is used to separate a comma delimited text file that perhaps serves as a database containing user information.
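Splitting on a regular expression becomes genuinely useful when the delimiter is not perfectly consistent. As a sketch, the pattern below treats a colon, semicolon or comma as a separator and also swallows any spaces around it. The sample record is made up for illustration.

```javascript
var record = "Jason : Nugent, this ;is";

// " *" matches zero or more spaces; [:;,] matches any ONE of the three separators.
var fields = record.split(/ *[:;,] */);
// fields is now ["Jason", "Nugent", "this", "is"]
```

With a plain string delimiter, each of those variations would have needed separate handling.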
The match() method searches a string in a different way. It returns an array consisting of all the matches found in the string that match the regular expression. If no matches are found, it returns null.
var s = "Thank you, there, for thinking about me.";
var a = s.match(/th\w+/gi); // matches a word beginning with th, globally, and ignore case.
a is an array that now contains ["Thank", "there", "thinking"].
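Because match() returns null rather than an empty array when nothing is found, it is worth guarding against that before touching the result. The helper name countMatches() below is hypothetical, a minimal sketch of one way to do it.

```javascript
// Returns how many times pattern matches in string (0 if none).
// The pattern should carry the g modifier for this to count every match.
function countMatches(string, pattern) {
    var found = string.match(pattern);
    return found ? found.length : 0;  // found is null when there is no match
}

var three = countMatches("Thank you, there, for thinking about me.", /th\w+/gi);  // 3
var none  = countMatches("No such words", /th\w+/gi);                             // 0

// Without the g modifier, match() stops at the first match.
var first = "Thank you, there".match(/th\w+/i);
var firstWord = first[0];  // "Thank"
```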
Now, finally, we get to do some useful things with regular expressions. The following functions will check a form consisting of a username and an email address. The first reports a problem if the username is not made up entirely of letters, numbers, underscores, and at most a single space between two words. The second reports a problem if the email address contains anything other than alphanumerics, an "at" sign, periods, or hyphens, or if those characters are not arranged in the shape of an email address.
Since regular expressions are only a part of JavaScript 1.2, we must determine the browser being used and plan accordingly. Since all other browsers ignore JavaScript 1.2, we can simply use the language="JavaScript1.2" qualifier to refine our parsing functions. Older browsers will simply skip over this code.
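<SCRIPT language="JavaScript1.2"><!--
function isEmail(string) {
    if (string.search(/^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/) != -1)
        return true;
    else
        return false;
}

function isProper(string) {
    if (string.search(/^\w+( \w+)?$/) != -1)
        return true;
    else
        return false;
}
//--></SCRIPT>
Ok. Let's stop and examine the regular expressions used in the functions above. First, let's look at the isProper() function since it is simpler. The Regular Expression used is /^\w+( \w+)?$/. Reading from left to right: ^ anchors the match to the beginning of the string, \w+ requires one or more word characters, ( \w+)? optionally allows a single space followed by a second word, and $ anchors the match to the end of the string. A value such as "Jason" or "Jason Nugent" is accepted, while anything containing punctuation, a doubled space, or a third word is rejected.
Ok. Shall we move on to the isEmail() function? The Regular Expression is /^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/. Again reading from left to right: ^\w+ requires the address to begin with one or more word characters; ((-\w+)|(\.\w+))* allows any number of further groups, each made of a hyphen or a period followed by more word characters; \@ is the "at" sign; [A-Za-z0-9]+ is the first part of the domain name; ((\.|-)[A-Za-z0-9]+)* allows further parts separated by periods or hyphens; and \.[A-Za-z0-9]+$ requires the address to end with a period followed by at least one letter or number.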
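A quick way to convince yourself of what the isProper() pattern accepts is to run it against a handful of values. The function is repeated here so the sketch stands on its own; the sample values are made up.

```javascript
function isProper(string) {
    return string.search(/^\w+( \w+)?$/) != -1;
}

var oneWord    = isProper("Jason");           // true
var twoWords   = isProper("Jason Nugent");    // true
var threeWords = isProper("Jason K Nugent");  // false, the optional group may appear only once
var twoSpaces  = isProper("Jason  Nugent");   // false, only a single space is allowed
var shellish   = isProper("Jason; rm");       // false, ";" is not a word character
var empty      = isProper("");                // false, \w+ needs at least one character
```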
This pattern allows for email addresses like the following. With this particular regular expression, the bare minimum that a person could enter as an email address is x@x.x, where x is any alphanumeric character:
someone@somewhere.com
someone.somebody@somewhere.com
someone.sombody@somewhere.where.com
some-one@somewhere.com
some-one.somewhere@wherever.com
some-one.somewhere@where-ever.com
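The addresses listed above can be checked against the pattern directly. The sketch below repeats the isEmail() function so that it stands on its own, and adds a few made-up values that should be refused. The helper name allPass() is hypothetical.

```javascript
function isEmail(string) {
    return string.search(/^\w+((-\w+)|(\.\w+))*\@[A-Za-z0-9]+((\.|-)[A-Za-z0-9]+)*\.[A-Za-z0-9]+$/) != -1;
}

var accepted = [
    "someone@somewhere.com",
    "someone.somebody@somewhere.com",
    "someone.sombody@somewhere.where.com",
    "some-one@somewhere.com",
    "some-one.somewhere@wherever.com",
    "some-one.somewhere@where-ever.com",
    "x@x.x"                              // the bare minimum
];

var refused = [
    "someone@somewhere",                 // no period after the domain name
    "some one@somewhere.com",            // contains a space
    "someone@@somewhere.com",            // two "at" signs
    ".someone@somewhere.com",            // must start with a word character
    "someone@somewhere.com; /bin/rm *"   // trailing junk is caught by the $ anchor
];

// Returns true if isEmail() gives the expected answer for every entry.
function allPass(list, expected) {
    for (var i = 0; i < list.length; i++) {
        if (isEmail(list[i]) != expected) return false;
    }
    return true;
}

var acceptedOk = allPass(accepted, true);   // true
var refusedOk  = allPass(refused, false);   // true
```

Keep in mind that the pattern is deliberately strict: some addresses that are legal on the Internet, such as those containing a plus sign, would be refused by it.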
Why not try the example out? It works in Netscape Navigator 2, 3 and 4, as well as Internet Explorer 3 and 4.
You can view the source code of the working example.
If you are interested in learning more about JavaScript 1.2, feel free to examine these sources of information on the web:
What's new in JavaScript 1.2: http://developer.netscape.com/library/documentation/communicator/jsguide/js1_2.htm
JavaScript 1.2 Reference: http://developer.netscape.com/library/documentation/communicator/jsref/index.htm
For a good introduction to regular expressions, please check out: ftp://ftp.ou.edu/mirrors/CPAN/doc/manual/html/pod/perlre.html
In addition, you might want to check out Tom Christiansen's page on Regular Expressions in Perl 5, which can be found at: http://www.perl.com/CPAN-local/doc/FMTEYEWTK/regexps.html The FMTEYEWTK stands for "Far More Than Everything You Ever Wanted To Know".