
UNIX: Data Tools

Data and Text Manipulation in the UNIX Environment
Regular expressions
Using grep to Search Text Files
Line Editing Files with sed and tr
Parsing Files With awk
Comparing Files with diff
Cutting and Pasting
Using join to Merge Files
Controlling Flow with Loops
A Final Note
 Data and Text Manipulation in the UNIX Environment
This document is intended for relatively advanced UNIX users interested in manipulating text and data sets. The document assumes that users are working in the ksh, though the topics covered apply to other UNIX environments as well.
 Regular Expressions
Each of the tools examined in this document utilizes regular expressions to perform its task. Without them, data and text manipulation would be far more difficult. Constructing and using regular expressions can be a daunting task. This section should make them better understood and more manageable.
What is a regular expression?
A regular expression is a text pattern that combines normal text characters and special characters, called metacharacters, to create a single entity that can have various interpretations. What does this mean? Well, it means that we can create a single text pattern that can have many fixed strings that match it. Okay, now what does this mean? It means that we can use a regular expression to identify all of the occurrences of the text pattern, based on some common feature (though the occurrences can be quite different). Let's look at an example.
Examples of regular expressions
Let's assume that we have a text file and we want to search through it for all occurrences of the words "pack," "puck," and "pick." We can do this with the grep utility (to be fully explained later) by searching for each of the fixed strings as follows:
grep 'pack' filename
grep 'puck' filename
grep 'pick' filename
By searching for each of these fixed strings, we will find all occurrences of the three words in the file specified as filename. This task, however, can be performed more efficiently with a regular expression, as demonstrated below:
grep 'p[aui]ck' filename
The open and close brackets are special characters (again, metacharacters) interpreted by UNIX programs to mean that any one of the characters listed in the brackets will fulfill the search requirements. Thus, here we complete our task in one command using a regular expression, where before it took us three commands to do the same thing using fixed strings.
Filename expansion versus pattern matching
Metacharacters are interpreted differently by the UNIX shell (in this case, the ksh) than they are by pattern-matching utilities like grep, awk, and sed. As a result, the order in which metacharacters are interpreted is an important consideration. When commands are issued at the UNIX prompt, metacharacters are first seen by the shell and then by the program (grep, etc.). Here's an example:
grep [a-z]* somefile
The shell will read the above regular expression for filename expansion. It will look for any file in the current directory whose name starts with a lowercase letter from a to z, followed by any string of characters (the shell interprets the * symbol as any string of characters, including the empty string). So, assuming these files exist, this command could be expanded to:
grep alpha.txt beta.txt lambda.txt omega.c somefile
The grep utility would then try to find the pattern alpha.txt in the files beta.txt, lambda.txt, omega.c and somefile. This is not what was intended. To keep the shell from erroneously interpreting metacharacters for filename expansion, enclose your regular expressions in quotes (double quotes suffice in most cases, but single quotes are best). The command should look like this:
grep '[a-z]*' somefile
Some common metacharacters and the programs that use them
Some metacharacters are valid in one program but unsupported in another. To make these metacharacters more readily accessible, a brief table of the more common metacharacters appears below, along with their descriptions and the programs that support them.
Operator  Usage         Meaning                            grep  sed  awk
   .      .             Matches any single character        y     y    y
                        other than a newline
   *      char*         Matches any number >= 0 of the      y     y    y
                        preceding character
   ^      ^string       string must occur at beginning      y     y    y
                        of a line
   $      string$       string must occur at end of a       y     y    y
                        line
   []     [abc]         Matches any single character        y     y    y
                        from the list
  \{\}    char\{n,m\}   Matches from n to m occurrences     n     y    n
                        of the preceding char,
                        n, m >= 0 and <= 256
   \<     \<string      string must occur at beginning      n     n    n
                        of a word
   \>     string\>      string must occur at end of a       n     n    n
                        word
   +      string+       Matches one or more of the          n     n    y
                        preceding string
   ?      string?       Matches zero or one of the          n     n    y
                        preceding string
   |      string1       Matches either of the separated     n     n    y
          |string2      character strings

To learn more about using metacharacters for data management, read the manual pages for these tools or consult a general UNIX reference.
Note: Remember, it is not necessary to include a metacharacter in a regular expression. If a fixed string serves the purpose, use it! Don't make things harder than they need to be.
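As a quick illustration, here are a few commands that combine several of the metacharacters described above (the file names are hypothetical):

grep '^From' mailbox
grep 'done$' notes.txt
grep 'b[aeiou]g' somefile

The first command prints every line of mailbox that begins with "From", the second prints every line of notes.txt that ends with "done", and the third prints every line of somefile containing "bag," "beg," "big," "bog," or "bug."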
 Using grep to Search Text Files
The grep utility in UNIX is a powerful search tool that can examine the contents of files (or standard input) for a regular expression and print the line(s) in which the expression occurs. The basic syntax for grep is:
grep [-flag(s)] 'regexp' [filename1] [filename2...]
If no files are specified on the command line, grep will search standard input for the expression. Since many of the metacharacters used for pattern-matching (*, ?, etc.) are recognized as special characters by the shell, it is a good practice to enclose the regular expression in single quotes. The quotes force the shell to interpret the regular expression literally.
Let's look at a couple of examples of the grep utility. Say you have a file, smilies.txt, and you want to search the file for occurrences of the word "basic." You would construct the command like this:
grep 'basic' smilies.txt
From time to time, you may want to search for a regular expression in UNIX standard input rather than a file. For instance, you may want to search all of the active processes on a machine for the number of pine email sessions currently running. To do so, you can use the ps command to list the current processes and pipe the output to the grep utility like so:
ps -ef | grep 'pine'
Note: Both of the above uses of grep will print the entire line containing the regular expression, not just the expression itself.
Flags for grep
There are a number of useful flags that can be used with grep to modify the output that it returns. Here are a few of the more commonly used flags:
  Flag     Effect                                                            
   -c      print only the count of matching lines                            
   -h      print matched lines but not filenames                             
   -i      ignore uppercase and lowercase distinctions                       
   -l      print only the names of files containing matching lines when      
           searching multiple files                                          
   -n      precede each line with its line number within file                
   -v      print all lines not containing pattern                            
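
To see how these flags behave, consider a few hypothetical variations on the smilies.txt search from above:

grep -in 'basic' smilies.txt
grep -c 'basic' smilies.txt
grep -l 'basic' *.txt

The first command prints each matching line preceded by its line number, ignoring case (so "basic," "Basic," and "BASIC" all match). The second prints only the number of matching lines. The third searches every .txt file in the current directory and prints only the names of the files that contain a match.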

 Line Editing Files with sed and tr
The sed program in UNIX is a command-line text editor. It differs from other UNIX text editors like pico or vi because it is a non-interactive editor. This essentially means that the contents of a file can be edited without making any changes to the original file. This is possible because sed is a stream-oriented editor. Sed operates by executing a script on a stream of data (typically the contents of a file) as it passes through the program. By default, the program's output goes directly to the screen but, should you want to save your changes, the output can also be directed to a file. The sed program is typically used in one of two ways.
The first example below shows sed executing a command on the input specified by the file name and returning the edited output to the screen.
sed [-flags] 'command' file(s)
The second example shows sed executing a series of commands, as specified in the scriptfile, and redirecting the output to a new file.
sed [-flags] -f scriptfile file(s) > newfile
Let's look at some of the sed commands.
sed Commands
There are numerous editing commands available to the sed program. The following table provides a brief description of some of the more common ones. A more complete description of the sed editing commands is available from the sed manual pages.
 Command   Usage                                Action
    =      [/regexp/]=                          prints the line number of
                                                each line containing regexp
    a      [address]a\ text                     appends text following
                                                address.  The address value
                                                can be a line number, the $
                                                symbol (for the last line),
                                                or a regular expression
                                                enclosed in slashes
                                                (/pattern/)
    i      [address]i\ text                     inserts text before address
    c      [address][,address2]c\ text          replaces the addressed
                                                block with text
    d      [address][,address2]d                deletes the addressed lines
    s      [address][,address2]s/regexp         substitutes regexp2 for
           /regexp2/[flags]                     regexp on the addressed
                                                lines

Some sed Examples
One of the most useful applications of the sed command is its ability to find one pattern and replace it with another. Let's say we have a file with multiple occurrences of the string "smilie" and we want to replace them with the string "smiley." We can use the following command to accomplish this task:
sed 's/smilie/smiley/g' smilies.txt > smileys.txt
Let's examine the section in single quotes first. The initial "s" tells sed to perform the substitute command. The first expression between the slashes is the pattern to be replaced. The second expression between the slashes is the replacement text. The remaining information on the line tells sed which file to edit (smilies.txt) and redirects (>) the output to a file called smileys.txt.
Controlling what gets substituted
By default, the s editing command replaces only the first occurrence of the specified regular expression on each line. Additional flags can be appended to tell sed what to substitute. In the above example, the trailing g inside the single quotes tells sed to perform the substitution for every occurrence of the pattern on each line. A number n (any value from 1 through 512) can also be used in place of the g to have sed substitute only the nth occurrence of the pattern on each line.
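For example, the following command (a hypothetical variation on the substitution above) replaces only the second occurrence of "smilie" on each line, leaving the first occurrence on each line untouched:

sed 's/smilie/smiley/2' smilies.txt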
Using a scriptfile with sed
Normally the sed command expects to see a single edit command specified on the command line (as above). If you wish to perform a series of edits on a particular stream of data, you can automate the series of edits by creating a script. A sed script is a text file with a sed editing command listed on each line of the file. If you specify the -f flag, followed by the name of the scriptfile, sed will edit the file one line at a time, executing each command in the scriptfile where necessary. Suppose you want to delete the first fifteen lines of a file and print the phrase "first paragraph deleted" on the sixteenth line. To do so you could create a scriptfile (we'll call our file script) with the following information stored in it:
1,15d
16i\
first paragraph deleted

Then issue the following command:
sed -f script smilies.txt > newsmilies.txt
This command will take the input file, smilies.txt, and perform the set of commands listed in the scriptfile called script. The output is then saved in a file named newsmilies.txt.
Stream-editing with tr
There is another stream-editing command besides sed called tr. This command copies standard input to standard output, translating or deleting selected characters along the way.
tr Syntax and Flags
The basic syntax for the tr command is as follows:
tr [-flag(s)] [string1 [string2]]
It is important to remember, however, that tr reads from stdin and writes to stdout. Thus, if you want to use it to manipulate files you will need to use pipes or redirection. There are not many flags for tr; the two important ones are -d, which deletes the characters in string1 from the output, and -s, which "squeezes" each run of repeated characters down to a single character. Consider some examples.
Examples of tr Usage
One common use of tr is to change case, i.e. make all uppercase characters into lowercase or vice versa. One way to change uppercase to lowercase in a file would be as follows:
cat filename | tr '[A-Z]' '[a-z]'
To save the output you would of course need to redirect from stdout. Another useful example is simply deleting all the occurrences of a given character from a file. Suppose you want to delete all the quote characters from a passage in a file called "excerpt" and save the output to "excerpt.revised". You would use the following:
tr -d \" < excerpt > excerpt.revised
Using tr with od
One utility that works well with tr is od (octal dump). With od, you can display the octal value of any character in a given file, including nonprinting characters. This information can then be used with tr. For example, the following file (we'll call it double) is formatted incorrectly:
This is the first line

This is the second line

With od, we can see that the problem is that there are consecutive line feeds (\n \n). The following od command gives us the octal values for these lines:
od -bc double
0000000  124 150 151 163 040 151 163 040 164 150 145 040 146 151 162 163
           T   h   i   s        i   s      t   h   e       f   i   r  s

0000020  164 040 154 151 156 145 012 012 124 150 151 163 040 151 163 040
           t       l   i   n   e  \n  \n   T   h   i   s       i   s

0000040  164 150 145 040 163 145 143 157 156 144 040 154 151 156 145 012
           t   h   e       s   e   c   o   n   d       l   i   n   e \n

0000060  012 124 150 151 163 040 151 163 040 164 150 145 040 164 150 151
          \n
The flags used in the previous command are two of the more common flags used with the od command. The -b flag tells the od command to display the input bytes in octal format and the -c flag tells the command to display the bytes in ASCII. The two character formats can then be compared to determine the ASCII character and its corresponding octal value. With this information, the following tr command can be used to reformat the text:
tr -s "\012" < double
This command will "squeeze" any repeated appearances of the line break (octal value 012) down to a single one. When -s is used with only one string, tr performs no translation; it simply collapses runs of the listed character. The output looks like this:
This is the first line
This is the second line
 Parsing Files With awk
Awk is a pattern-matching program designed for parsing and manipulating files, especially when the files are databases. There are multiple versions of awk available: the original awk (awk), a new version of awk (nawk) with some added functionality, and the GNU version (gawk), which is essentially the same as nawk. Awk allows you to produce formatted reports from databases, use variables to modify the database, perform arithmetic and string operations, and more.
Awk reads input files, one line at a time, by dividing lines into a series of separate fields. By default, awk defines a field as a sequence of characters that does not contain a space or a tab. Different field separators can be specified by including the -Fc flag where the character c denotes the field separator. Thus, a line with fields separated by colons (:) can be parsed in awk using the -F: flag. Once fields have been defined for a line of input, awk identifies the fields by assigning variable names to them. The first field in a line of input is called $1, the second $2, and so on. The entire line is named $0. The number of fields can vary line by line depending on the data.
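For example, the system file /etc/passwd uses colons to separate its fields, with the username in the first field. A simple awk command to print every username might look like this:

awk -F: '{print $1}' /etc/passwd

Here the -F: flag tells awk to split each line on colons, and the print statement displays the first field ($1) of each line.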
Syntax for awk
The awk utility is similar to sed in that it can be invoked in one of two ways. The basic syntax for all versions of awk is as follows:
awk [-flag(s)] 'script' var=value file(s)
awk [-flag(s)] -f scriptfile var=value file(s)
Awk, just as with sed, can be executed by specifying a command directly on the command line or by specifying (with the -f flag) a set of commands in a particular script file.
Some awk Examples
It isn't possible to give detailed descriptions of the awk variables and commands here because both sets are too large to cover adequately in this document. Instead, a few examples of data manipulation with awk will be provided to demonstrate its functionality. Look at the manual pages for additional information on the different awk utilities.
Let's say you want to find out which of the files in the current directory is larger than 50000 bytes and, once this has been determined, you want the owner name, file name, and file size (with the word "bytes" appended) printed to the screen.
ls -al | awk '$5 > 50000 {print $3, $9, $5, "bytes"}'
Here we're listing all of the files in the current directory and piping the output to awk. Awk first looks at the fifth field in each line of input to determine if the value is larger than 50000. If so, awk prints the value in the third field, the value in the ninth field, the value in the fifth field, and the word "bytes" to the screen.
Another example involves validating the values or format of a data set. For instance, let's say that you have a data set (let's call it "data") that should have six fields of data per line and the value in the third field should never be equal to or below zero. You can create an error reporting script (let's call it error.script) that has these commands:
NF != 6 {print NR, "Number of fields is not equal to 6"}
$3 <= 0 {print NR, "Invalid value in field 3"}
The first of these lines in the script counts the number of fields per line, returning the line number (NR is the awk system variable for record number) and an error message if the value is not equal to 6. The second line looks at the value in the third field and returns the line number and an error message if the value is equal to or less than 0. If there are no errors, awk returns no output. The command should read the awk procedures from the error.script file and apply them to the data file. It should look like this:
awk -f error.script data
Awk can also be used to compute values and more precisely control the appearance of the output. Let's say that we have a class roll with the student name appearing in the first field ($1), and the scores from two tests in the next two fields ($2 and $3). Awk can add the scores in each column and compute the class average. In this example, the printf command will be used to format awk's output, displaying each average with two digits after the decimal point. The command should look like this:
awk -f average school
where "school" is the name of the data set and "average" is the name of the script file with the following commands:
{ test1=test1 + $2 }
{ test2=test2 + $3 }
END { printf("the class average for test1 is %3.2f\n", test1/NR)
printf("the class average for test2 is %3.2f\n", test2/NR) }
The first two bracketed lines tell awk to add up the values in the second and third fields and store them in variables named "test1" and "test2" respectively. The END pattern tells awk to run the procedures that follow only after the last line of input has been read. The two print lines display text with a placeholder (%3.2f) telling awk how to format the number that will appear there. Both of the placeholders are followed by the newline character (\n) to display the print statements on separate lines. Finally, we see the two variables divided by the NR system variable, which holds the total number of lines read.
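To make this concrete, suppose the school file contained these three hypothetical lines:

Alice 80 90
Bob 90 70
Carol 100 80

The test1 scores sum to 270 and the test2 scores sum to 240, and NR is 3 once the last line has been read, so running awk -f average school would print:

the class average for test1 is 90.00
the class average for test2 is 80.00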
Again, these are just a few of the many operations that can be performed by awk. To learn more, read the manual pages for awk or consult a general UNIX utilities resource for additional uses.
 Comparing Files with diff
The diff utility compares two files and reports differences that exist between the two files. Diff prints out differing pairs of lines in the two files and provides codes to identify the changes necessary to make the lines identical.
Basic Syntax for diff
The diff utility adheres to the following syntax:
diff [-flag(s)] file1 file2
When discrepancies are found, diff prints the lines from each file, flagging the file1 line with the < symbol and the file2 line with the > symbol. A good example of the diff utility in action can be seen in comparing the two files (smilies.txt and smileys.txt) we used with the sed command. These files are identical in all ways except for the use of "smiley" in one file and "smilie" in the other. Comparing these two files with diff should generate numerous occurrences of differing content. Here's an example of the output:
9c9
< :-) Your Basic smilie. This smilie is used to inflect a
---
> :-) Your Basic smiley. This smiley is used to inflect a
The first line from this sample of the output, 9c9, tells us that line 9 of file1 must be changed (c) to match line 9 of file2. Again, the line with the initial < symbol is from file1 and the one with the initial > symbol is from file2. The two lines are separated by a line with three dashes.
Normal flags recognized by diff include the following:
 Flag    Effect                                                           
  -b     ignore sequences of spaces (treat as one space) and spaces at    
         the end of lines                                                 
  -e     produces an ed script to convert file1  to file2                 
  -i     ignore upper/lowercase distinctions                              
  -w     similar to -b; ignores all space and tab characters              

Typically, the diff utility is used to compare files but it can also be used to compare directories or files and directories. To do so, one would simply need to supply the names of directories in place of file names. Two useful flags when comparing directories are -r to run diff recursively, comparing files in any common subdirectories, and -s to report files that are identical in the two directories. Finally, if you give diff the name of a directory and the name of a file as the two arguments to be compared, diff will look in the directory for a file that corresponds with the other argument. Thus, executing the command diff newdir file is the same as diff newdir/file file.
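For example, to compare two hypothetical directories named olddir and newdir, descending into any common subdirectories and reporting identical files as well as differing ones, you could combine these flags:

diff -rs olddir newdir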
 Cutting and Pasting in UNIX
Users with experience on microcomputers are probably familiar with the concepts of cutting and pasting. Under UNIX, however, cut and paste are column-oriented file utilities, and they work differently from the clipboard operations found in the Windows and MacOS operating systems.
The cut Command
The cut command allows a user to extract columns from one or more files. The columns to be extracted can be specified as either character-width columns or as delimited fields. The basic syntax for cut is this:
cut -ccolumnlist | -ffieldlist [flag(s)] file1 [file2...]
The columnlist and fieldlist specifications (only one of the two may be used) are lists of integers specifying which columns or fields to extract. Lists of nonconsecutive numbers must be separated by commas, and sequences can be represented by a hyphen. In addition, cut recognizes two other flags. To specify a field delimiter other than the default tab, use the -d flag followed by the delimiter character. For example, if the fields in the input line were delimited by colons, you would want to use the cut utility with the -d: flag. If there are lines which do not contain the delimiter (whether user-specified or the default), you can suppress their output by using the -s flag.
Let's assume that we have a database file, named "phone_list," with three fields delimited by colons rather than tabs. The three fields contain a name, a city, and a phone number. Each line in the database looks like this:
Louis Prima:San Antonio:514.8814
To cut the name and phone number fields from this file and redirect the output into a new file, try this:
cut -d: -f1,3 phone_list > newlist
A line in the new file would look like this:
Louis Prima:514.8814
Note: If the fields in the previous example were delimited by spaces rather than colons, single quotes around a blank space would have to be used with the -d flag, like this: cut -d' ' -f1,3 phone_list > newlist. Without the single quotes, the cut utility does not recognize the space as a field delimiter.
The Paste Command
The paste command is used to merge files into columns. For example, if you had two files, one a list of names and the other a list of addresses, both in the same order, you could use the paste command to create a third file containing two columns. The default character used to separate the columns is a tab.
paste Syntax
The basic syntax for paste is as follows:
paste [flag(s)] file [file2...]
Output is sent to standard output, so saving will require redirection. Each file named on the command line becomes a column in the output. The hyphen (-) can also be used as a filename to denote the standard input. paste recognizes the following flags:
 Flag    Effect
  -d     using -d'characters' will separate columns with the given
         character instead of a tab.  If more than one character is
         specified, the first character will be used between the first
         and second columns, the second character between the second
         and third columns, and so on.  Use the escape sequences \n
         for a newline and \t for a tab
  -s     pastes the lines of each file serially, merging subsequent
         lines from the same file into a single output line
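
For example, given two hypothetical files, names and addresses, each containing one entry per line in the same order, the first command below merges them into two tab-separated columns, while the second separates the columns with a colon instead:

paste names addresses > merged
paste -d: names addresses > merged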

 Using join to Merge Files
Another useful utility for merging files is join. This command lets a user take two files and merge records. This assumes that both files contain records composed of delimited fields and that both files are sorted by the first field. By default, join examines lines and compares the first field. When a match is found, join prints out the common field and the remainder of each record in the order in which the files were specified on the command line. Suppose you have two files called "numbers" and "cities". The first file contains:
Henry 877.1959
Jay 555.4435
Maria 555.2398
while the second file contains:
Henry Modesto
Jay Chicago
Maria Miami
The output of the command join numbers cities would be:
Henry 877.1959 Modesto
Jay 555.4435 Chicago
Maria 555.2398 Miami
Basic syntax for join
The syntax for join is:
join [flag(s)] file1 file2
The hyphen (-) can be used in place of file1 to read from standard input. The join command recognizes the following flags:
 Flag    Effect
 -a[n]   list any unpaired lines from file n (n should be either 1 or
         2), or from both files if n is not specified
 -es     replaces empty fields in the output with string s (e.g.
         "N/A" or "unknown")
 -j[n]m  use field m for matching lines.  If n (1 or 2) is specified,
         use field m of file n
 -olist  format output lines according to list.  Each item in list is
         of the form n.m, where n is 1 (for file1) or 2 (for file2)
         and m is the number of a field in file n.  The common field
         is suppressed unless specified in list
 -tc     set the default input and output field separator to the
         character c.  The default is whitespace (as in awk)

Examples of join usage
Using the two files we listed earlier, suppose we want to have the phone number appear first, followed by the city and then the name. We could use the -o option to specify the field order in the output as follows:
join -o 1.2 2.2 1.1 numbers cities
which gives us this output:
877.1959 Modesto Henry
555.4435 Chicago Jay
555.2398 Miami Maria
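The -a flag controls what happens to unpaired lines. If the numbers file contained an extra hypothetical entry, say "Zoe 555.1234", with no matching line in cities, the command

join -a1 numbers cities

would print the three paired lines as before, followed by the unpaired line "Zoe 555.1234" on its own.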
 Controlling Flow with Loops
The UNIX computing environment also supports the creation and use of shell scripts to automate repetitive tasks and to simplify the use of frequently used commands. A shell script is simply a text file containing commands that are interpreted by the shell.
Shell Interpreters
The shells in UNIX are similar, but each has its own specific language. This document focuses on the ksh, even though our default shell is tcsh. Most UNIX users agree that Bourne shell derivatives (sh, ksh, bash) are superior for scripting. Fortunately, this is not a problem: you can simply invoke ksh (or any other shell) in the first line of your shell script. To do so, specify the name of the shell and its path, preceded by the #! symbols. For instance, to write a shell script using the ksh interpreter, use the following syntax in the script's first line:
#!/usr/bin/ksh
To find out the location of the shells available on a UNIX system, use the command "cat /etc/shells" (which prints the contents of the file /etc/shells to stdout). Hint: most shell interpreters are located in the /usr/bin/ directory.
Running a shell script
Before you can run the script you created in your favorite UNIX text editor (pico, emacs, vi, etc.) you need to give the file executable permissions. To do so, you can use the chmod command in this fashion:
chmod +x filename
Once you have given the script the appropriate permissions, you need to decide how you want to run the script. You can run the script by simply typing the name of the script at the command line. This, of course, assumes that the script is located in a directory that is part of your command search path (i.e., your PATH variable). If not, the shell will return an error stating that it could not find the specified command (your script). To avoid this error, specify the absolute path to the script at the command line or, if the script is in your current directory, you can type ./scriptname to tell the shell to find your script in the current directory. With this general introduction to the creation of shell scripts, let's look at a couple of examples to better understand their use.
Using the "for" loop
The for loop is one of the simplest loops used in shell scripting. It allows you to repeat a series of commands for each item in a list, such as a set of files or fields. It's distinct, however, from for loops in some other programming languages because you don't specify how many times to iterate through the loop; the number of iterations is determined by the length of the list. Other constructs, such as while or if, are needed when you want to control the number of iterations directly.
Basic syntax
For loops in the ksh follow this basic syntax:
for name [in list]
do
statements to apply to $name
done
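Before moving to a larger example, here is a minimal sketch of this syntax in action, printing the name of every .txt file in the current directory:

for file in *.txt
do
echo $file
done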
An example of a "for" loop
Let's suppose we have a database file that we use to store all of the entries in our addressbook. For the sake of this example, let's say that each line in the database file contains a person's name, phone number, and email address, all separated by colons. The addressbook file looks like this:
Angus Young:933-1865:hank@hostname.usask.ca
Brian May:942-9823:maggie@somewhere.usask.ca
Eric Clapton:967-2257:skip@onemore.usask.ca
Now we want to write a shell script that will mail a message to each email address that we have in that file. The script needs to cycle through that file, strip out the address, and create a mail message for each address it finds. To do so, we can create a script like this:
#!/usr/bin/ksh

for user in $(cut -d: -f3 filename)
do
echo $user
mail $user < message
done
Now let's break this script down line by line to better understand it.
The first line establishes the interpreter to use for the script: the ksh. This declaration should list the absolute path to the shell interpreter, preceded by the "#!" characters.
The second line does a number of things. The first word, "for", initiates the loop. The next word, "user", is the variable name that will be used later in the loop's actions. The remainder of the line, "in $(cut -d: -f3 filename)", declares the list of items to be acted on by the loop. We use the cut utility, described earlier in this document, to retrieve the email addresses from the file and omit the extraneous information.
The third line of this shell script (the word "do") initiates the actions to be performed on our list of addresses.
The fourth line tells the shell to print each of the email addresses to the screen. The shell will do this because the "$user" notation stands for each instance of the variable in the list. The only reason to print the email addresses to the screen is to show you that the shell script is working its way through the loop.
The fifth line shows where the mail messages are actually created. Here, the shell invokes the mail command, creating a mail message for each address stored in $user and redirecting the contents of the file named message to serve as the body of the email.
The final line of the script (the word "done") signals that the actions of the loop are finished. Once the loop has iterated through the list (all of the instances of $user), the script is finished and you will see a UNIX command line prompt again.
Making the script more portable
We can make the above script more portable by replacing the filenames with command-line variables. Currently, this little mailer script that we've written will only use the files that are named in the script. We can add the ability to specify these file names at the command line by including positional parameters in the script. Positional parameters are denoted by the syntax $1, $2, $3, and so on. The shell assigns the first positional parameter to the first argument listed on the command line. The $0 positional parameter refers to the name used to invoke the script (usually this will be the name of the script itself). To add this functionality, two subtle changes need to be made to the script:
#!/usr/bin/ksh

for user in $(cut -d: -f3 $1)
do
echo $user
mail $user < $2
done
We've replaced the name of the addressbook file and the message with the $1 and $2 positional parameters, respectively. To execute this script, we would type the following at the command-line:
scriptname filename1 filename2
where scriptname is the name of the shell script, filename1 is the name of the addressbook file, and filename2 is the name of the message to be redirected into the mail message.
 A Final Note
Many of the commands described in this handout are documented more fully in the online manual pages. Use the man command to get more complete descriptions of them, especially for commands like sed and awk. In addition, there are several excellent books available which cover UNIX command syntax and usage.