Tuesday, August 28, 2012

Handling Files in UNIX

Working with Files

In UNIX there are three basic types of files:
  • Ordinary Files
  • Directories
  • Special Files


An ordinary file is a file on the system that contains data, text, or program instructions. In this article, you look at working with ordinary files.
Directories, covered in Chapter 4, "Working with Directories," store both special and ordinary files.
For users familiar with Windows or Mac OS, UNIX directories are equivalent to folders.
Special files are covered in Chapter 5, "Manipulating File Attributes." Some special files provide access to hardware such as hard drives, CD-ROM drives, modems, and Ethernet adapters. Other special files are similar to aliases or shortcuts and enable you to access a single file using different names.

Listing Files
First, list the files and directories stored in the current directory. Use the following command:

$ ls

Here's a sample directory listing:

bin hosts lib res.03
ch07 hw1 pub test_results
ch07.bak hw2 res.01 users
docs hw3 res.02 work

This output indicates that several items are in the current directory, but this output does not tell us whether these items are files or directories. To find out which of the items are files and which are directories, specify the -F option to ls:

$ ls -F

Now the output for the directory is slightly different:

bin/ hosts lib/ res.03
ch07 hw1 pub/ test_results
ch07.bak hw2 res.01 users
docs/ hw3 res.02 work/

As you can see, some of the items now have a / at the end: each of these items is a directory. The other items, such as hw1, have no character appended to them. This indicates that they are ordinary files. When the -F option is specified to ls, it appends a character indicating the file type of each of the items it lists. The exact character depends on your version of ls. For ordinary files, no character is appended, although many versions append * to executable files. For special files, a character such as @ (symbolic link), | (named pipe), or = (socket) is appended to the filename. For more information on the exact characters your version of ls appends to the end of a filename when the -F option is specified, please check the UNIX manual page for the ls command. You can do this as follows:

$ man ls

So far, you have seen ls list more than one file on a line. Although this is fine for humans reading the output, it is hard to manipulate in a shell script. Shell scripts are geared toward dealing with lines of text, not the individual words on a line. Without using external tools, such as the awk language covered in Chapter 17, "Filtering Text Using awk," it is hard to deal with the words on a line.
In a shell script it is much easier to manipulate the output when each file is listed on a separate line.
Fortunately ls supports the -1 option to do this. For example,

$ ls -1

produces the following listing:

bin
ch07
ch07.bak
docs
hosts
hw1
hw2
hw3
lib
pub
res.01
res.02
res.03
test_results
users
work
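
As a minimal sketch of why one name per line is convenient, the following script reads the output of ls -1 with a while loop and counts the entries. The scratch directory created with mktemp, the list file, and the file names are assumptions made for illustration; they are not the sample directory shown above.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
demo=$(mktemp -d)
list=$(mktemp)
touch "$demo/hw1" "$demo/hw2" "$demo/hw3"

# ls -1 writes one name per line, so read can take one name at a time.
ls -1 "$demo" > "$list"

count=0
names=""
while read -r name; do
    count=$((count + 1))
    names="$names$name "
done < "$list"

echo "found $count entries: $names"

# Clean up the scratch files.
rm -rf "$demo" "$list"
```

The listing is saved to a file first because a loop at the end of a pipeline may run in a separate process in some shells, which would lose the values of count and names.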

Hidden Files
So far you have used ls to list visible files and directories, but ls can also list invisible or hidden files and directories. An invisible file is one whose first character is the dot or period character (.). UNIX programs (including the shell) use most of these files to store configuration information. Some common examples of hidden files include

  • .profile, the Bourne shell (sh) initialization script
  • .kshrc, the Korn shell (ksh) initialization script
  • .cshrc, the C shell (csh) initialization script
  • .rhosts, the remote shell configuration file

All files that do not start with the . character are considered visible.
To list invisible files, specify the -a option to ls:

$ ls -a

The directory listing now looks like this:

. .profile docs lib test_results
.. .rhosts hosts pub users
.emacs bin hw1 res.01 work
.exrc ch07 hw2 res.02
.kshrc ch07.bak hw3 res.03

As you can see, this directory contains many invisible files. Notice that in this output, the file type information is missing. To get the file type information, specify the -F and the -a options as follows:

$ ls -a -F

The output changes to the following:

./ .profile docs/ lib/ test_results
../ .rhosts hosts pub/ users
.emacs bin/ hw1 res.01 work/
.exrc ch07 hw2 res.02
.kshrc ch07.bak hw3 res.03

With the file type information you see that there are two hidden directories (. and ..). These two directories are special entries that are present in all directories. The first one, ., represents the current directory. The second one, .., represents the parent directory. We discuss these concepts in greater detail in the section "The Directory Tree" of Chapter 4.
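
The difference between ls and ls -a can be checked from a script. This is only a sketch: the scratch directory and the two file names are assumptions chosen for the example.

```shell
#!/bin/sh
# Scratch directory with one hidden and one visible file.
demo=$(mktemp -d)
touch "$demo/.profile" "$demo/hosts"

# Without -a, names that start with a dot are skipped.
visible=$(ls "$demo")

# With -a, the listing also contains ., .., and .profile.
all=$(ls -a "$demo")
all_count=$(printf '%s\n' "$all" | grep -c .)

echo "visible entries: $visible"
echo "entries with -a: $all_count"

rm -rf "$demo"
```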

Option Grouping
In the previous example, the command that you used specified the options to ls separately. These options can also be grouped together. For example, the commands

$ ls -aF
$ ls -Fa

are the same as the command

$ ls -a -F

As you can see, the order of the options does not matter to ls. As an example of option grouping, consider the following equivalent commands:

ls -1 -a -F
ls -1aF
ls -a1F
ls -Fa1

Any combination of the options -1, -a, and -F produces identical output:

./
../
.emacs
.exrc
.kshrc
.profile
.rhosts
bin/
ch07
ch07.bak
docs/
hosts
hw1
hw2
hw3
lib/
pub/
res.01
res.02
res.03
test_results
users
work/
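
If you want to confirm the equivalence yourself, a short script can compare the output of the different spellings. The directory contents here are assumptions made up for the sketch.

```shell
#!/bin/sh
# Scratch directory with a subdirectory, a hidden file, and an ordinary file.
demo=$(mktemp -d)
mkdir "$demo/work"
touch "$demo/.exrc" "$demo/hw1"

# The same three options, given separately and grouped in different orders.
a=$(ls -1 -a -F "$demo")
b=$(ls -1aF "$demo")
c=$(ls -Fa1 "$demo")

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    same=yes
else
    same=no
fi
echo "identical output: $same"

rm -rf "$demo"
```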

Viewing the Content of a File
The ability to list files is very important, but shell scripts also need to be able to view the contents of a file.

cat

To view the content of a file, use the cat (short for concatenate) command. Its syntax is as follows:

cat files

Here files are the names of the files that you want to view. For example,

$ cat hosts

prints out the contents of a file called hosts:

127.0.0.1 localhost loopback
10.8.11.2 kanchi.bosland.us kanchi
10.8.11.9 kashi.bosland.us kashi
128.32.43.52 soda.berkeley.edu soda

You can specify more than one file as follows:

$ cat hosts users

If the users file contains a list of users, this produces the following output:

127.0.0.1 localhost loopback
10.8.11.2 kanchi.bosland.us kanchi
10.8.11.9 kashi.bosland.us kashi
128.32.43.52 soda.berkeley.edu soda
ranga
sveerara
vathsa
amma
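
In a script, the combined output can be captured in a variable. The following sketch assumes two small files created just for the example; their contents are shortened versions of the listings above.

```shell
#!/bin/sh
# Create two small files to concatenate.
demo=$(mktemp -d)
printf '127.0.0.1 localhost loopback\n' > "$demo/hosts"
printf 'ranga\namma\n' > "$demo/users"

# cat writes the files one after another, in the order given.
combined=$(cat "$demo/hosts" "$demo/users")

# Count the lines in the combined text; $(( )) drops any padding from wc.
lines=$(( $(printf '%s\n' "$combined" | wc -l) ))
echo "combined output has $lines lines"

rm -rf "$demo"
```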

Numbering Lines
The cat command also understands several options. One of these is the -n option, which numbers the output lines. You can use it as follows:

$ cat -n hosts

This produces the output

1 127.0.0.1 localhost loopback
2 10.8.11.2 kanchi.bosland.us kanchi
3 10.8.11.9 kashi.bosland.us kashi
4 128.32.43.52 soda.berkeley.edu soda
5

The numbered output shows us that the last line in this file is blank. You can ask cat to skip numbering blank lines using the -b option:

$ cat -b hosts

In this case the output looks like the following:

1 127.0.0.1 localhost loopback
2 10.8.11.2 kanchi.bosland.us kanchi
3 10.8.11.9 kashi.bosland.us kashi
4 128.32.43.52 soda.berkeley.edu soda

Although the blank line is still there, it is no longer numbered.
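
The difference between -n and -b can also be measured. This sketch assumes a cat that supports both options and separates the line number from the text with a tab, as the GNU and BSD versions do; the file contents are invented for the example.

```shell
#!/bin/sh
# A file with two lines of text followed by one blank line.
demo=$(mktemp -d)
printf '127.0.0.1 localhost\n10.8.11.2 kanchi\n\n' > "$demo/hosts"

# Numbered lines contain a tab between the number and the text.
tab=$(printf '\t')

# -n numbers every line, including the blank one.
numbered_n=$(cat -n "$demo/hosts" | grep -c "$tab")

# -b numbers only the lines that are not blank.
numbered_b=$(cat -b "$demo/hosts" | grep -c "$tab")

echo "-n numbered $numbered_n lines, -b numbered $numbered_b lines"

rm -rf "$demo"
```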

Counting Words (wc)
Now that you know how to view the contents of a file, look at how to get some information about the contents.
You can use the wc command to get a count of the total number of lines, words, and characters contained in a file. The basic syntax of this command is

wc [options] files

Here options are one or more of the options given in Table 3.1 and files are the files you want examined.
If no options are specified, the output contains a summary of the number of lines, words, and characters. For example, the command

$ wc .rhosts

produces the following output for my .rhosts file:

7 14 179 .rhosts

The first number, in this case 7, is the number of lines in the file. The second number, in this case 14, is the number of words in the file. The third number, in this case 179, is the number of characters in the file.
Finally, the filename is listed. The filename is important if more than one file is specified.
If you specify more than one file, wc gives the individual counts along with a total. For example, the command

$ wc .rhosts .profile

produces the following output:

7 14 179 .rhosts
133 405 2908 .profile
140 419 3087 total

You can also use wc to get the individual counts as shown in the next sections. The options covered in these sections are listed in Table 3.1.

Option      Description
-l          Counts the number of lines
-w          Counts the number of words
-m          Counts the number of characters
-c          Counts the number of bytes

The -m option counts characters and the -c option counts bytes. Both options are available on Solaris, HP-UX, and Linux (GNU wc). The two counts are identical for files that contain only single-byte characters, such as plain ASCII text.

Number of Lines
To count the number of lines, use the -l (l as in lines) option. For example, the command

$ wc -l .profile

produces the output

133 .profile

Number of Words
To count the number of words in a file, use the -w (w as in words) option. For example, the command

$ wc -w .rhosts

produces the output

14 .rhosts

which is what you expected.

Number of Characters
To count the number of characters, use the -m option. The -c option counts bytes instead; the two counts are the same when the file contains only single-byte characters.
For example, the command

$ wc -m .profile

produces the output

2908 .profile

The -c option reports the size in bytes, which is the same number when the file contains only single-byte characters:

$ wc -c .profile

Combining Options
Like the ls command, the options to wc can be grouped together and given in any order.
For example, if you want a count of the number of words and characters in the file test_results, you can use any of the following commands:

$ wc -w -m test_results
$ wc -wm test_results
$ wc -mw test_results

The output from each of these commands is identical:

606 3768 test_results

The output lists the number of words first, then the number of characters, and finally the name of the file. In this case, there are 606 words and 3,768 characters in the file test_results.
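
Because the counts always appear in the same order, a script can split the output of wc into separate variables. The following is a sketch under assumptions: the file is created just for the example, and set -- is used to split the output into positional parameters, which replaces any arguments the script was given.

```shell
#!/bin/sh
# A small file with 5 words and 24 characters (including the two newlines).
demo=$(mktemp -d)
printf 'one two three\nfour five\n' > "$demo/test_results"

# wc prints: word count, character count, file name.
# set -- assigns those three fields to $1, $2, and $3.
set -- $(wc -w -m "$demo/test_results")
words=$1
chars=$2

echo "$words words and $chars characters"

rm -rf "$demo"
```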

Manipulating Files
In the preceding sections, you looked at listing files and viewing their content. In this section you look at the following methods of manipulating files:
  • Copying files
  • Renaming files
  • Removing files


Copying Files (cp)
To make a copy of a file, use the cp command. The basic syntax of the command is

cp source destination

Here source is the name of the file that is copied and destination is the name of the copy. For example, the following command makes a copy of the file test_results and places the copy in a file named test_results.orig:

$ cp test_results test_results.orig

Common Errors
There is no output from the cp command, unless it encounters an error. Two common errors occur when
  • The source is a directory
  • The source does not exist

An example of the first case is the command

$ cp work docs

This causes an error message similar to the following:

cp: work: is a directory

An example of the second case is the command

$ cp test_relsuts test_results.orig

Here I have mistyped the filename test_results as test_relsuts and cp gives the following error:

cp: cannot access test_relsuts: No such file or directory

Interactive Mode
No error message is generated if the destination already exists. In this case, the destination file is automatically overwritten. This can lead to serious problems.
To avoid this behavior you can specify the -i (i as in interactive) option to cp.
If the file test_results.orig exists, the command

$ cp -i test_results test_results.orig

results in a prompt something like the following:

overwrite test_results.orig? (y/n)

If you choose y (yes), the file is overwritten. If you choose n (no), the file test_results.orig isn't changed.
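
The effect of answering n can be tried without typing at the prompt, because cp -i reads the answer from standard input. This is a sketch with made-up file contents; the wording of the prompt and the exit status after a refusal vary between versions of cp, so both are ignored here.

```shell
#!/bin/sh
# A source file and an existing destination with different contents.
demo=$(mktemp -d)
printf 'new data\n' > "$demo/test_results"
printf 'old data\n' > "$demo/test_results.orig"

# Answer "n" on standard input; the prompt is discarded with 2>/dev/null.
echo n | cp -i "$demo/test_results" "$demo/test_results.orig" 2>/dev/null || true

# The destination still holds its original contents.
kept=$(cat "$demo/test_results.orig")
echo "destination contains: $kept"

rm -rf "$demo"
```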

Copying Files to a Different Directory
If the destination is a directory, the copy has the same name as the source but is located in the destination directory. For example, the command

$ cp test_results work/

places a copy of the file test_results in the directory work.

Multiple Inputs
If more than two inputs are given, cp treats the last argument as the destination and the other files as sources. This works only if the sources are files and the destination is a directory, as in the following example:

$ cp res.01 res.02 res.03 work/

If one or more of the sources are directories, cp reports an error for each of them. For example, the command

$ cp res.01 work/ docs/ pub/

produces the following error:

cp: work: is a directory
cp: docs: is a directory

Although cp reports errors, the source file, in this case res.01, is correctly copied to the directory pub.

If the destination is a file, but multiple inputs are given, as in the following example,

$ cp hw1 hw2 hw3

an error message similar to the following

cp: hw3: No such file or directory

is generated. In this case no files are copied.
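
One way to avoid the last error in a script is to check that the destination is a directory before copying several files into it. The following is a sketch only: the [ -d ... ] test and the scratch directory are additions for illustration and are not covered in this chapter.

```shell
#!/bin/sh
# Scratch directory with three source files and a destination directory.
demo=$(mktemp -d)
mkdir "$demo/work"
touch "$demo/res.01" "$demo/res.02" "$demo/res.03"

dest="$demo/work"

# Copy several sources only when the last argument really is a directory.
if [ -d "$dest" ]; then
    cp "$demo/res.01" "$demo/res.02" "$demo/res.03" "$dest/"
fi

# Count the copies; $(( )) drops any padding from wc.
copied=$(( $(ls "$dest" | wc -l) ))
echo "copied $copied files into work"

rm -rf "$demo"
```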

Renaming Files (mv)
To change the name of a file, use the mv command. Its basic syntax is

mv source destination

Here source is the original name of the file and destination is the new name of the file. As an example,

$ mv test_result test_result.orig

changes the name of the file test_result to test_result.orig. Unlike cp, mv does not create a second file; only the name of the file is changed. There is no output from mv if the name change is successful.
If the source does not exist, as in the following example,

$ mv test_reslut test_result.orig

an error similar to the following is reported:

mv: test_reslut: cannot access: No such file or directory
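
The difference between renaming and copying is easy to verify: after a successful mv, the old name is gone. The sketch below assumes a scratch file created for the example and uses the [ -e ... ] test, which is not covered in this chapter, to check for the old name.

```shell
#!/bin/sh
# Create a file to rename.
demo=$(mktemp -d)
printf 'results\n' > "$demo/test_result"

mv "$demo/test_result" "$demo/test_result.orig"

# The old name no longer exists; the contents are available under the new name.
if [ -e "$demo/test_result" ]; then
    old_exists=yes
else
    old_exists=no
fi
content=$(cat "$demo/test_result.orig")

echo "old name still present: $old_exists"
echo "new file contains: $content"

rm -rf "$demo"
```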

Interactive Mode
Like cp, mv does not report an error if the destination already exists: it simply overwrites the file. To avoid this problem you can specify the -i option.
For example, if the file ch07.bak already exists, the following command

$ mv -i ch07 ch07.bak

results in a confirmation prompt:

remove ch07.bak? (n/y)

If you choose n (no), the destination file is not touched. If you choose y (yes), the destination file is removed and the source file is renamed.
The actual prompt varies between the different versions of UNIX.

Removing Files (rm)
To remove files, use the rm command. The syntax is

rm files

Here files is a list of one or more files to remove. For example, the command

$ rm res.01 res.02

removes the files res.01 and res.02.

Common Errors
The two most common errors using rm are
  • One of the specified files does not exist
  • One of the specified files is a directory

As an example of the first case, the command

$ rm res.01 res.02 res.03

produces an error message if the file res.02 does not exist:

rm: res.02 non-existent

The other two files are removed.
An example of the second case is the command

$ rm res.01 res.03 work/

This command produces another error message:

rm: work directory

The two files are removed.
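
A script can detect the first kind of error from the exit status of rm, which is nonzero when any of the files could not be removed. The following sketch uses invented file names; res.02 is deliberately not created.

```shell
#!/bin/sh
# Only res.01 and res.03 exist; res.02 is missing on purpose.
demo=$(mktemp -d)
touch "$demo/res.01" "$demo/res.03"

# rm removes what it can and reports the missing file on standard error.
status=0
rm "$demo/res.01" "$demo/res.02" "$demo/res.03" 2>/dev/null || status=$?

# Count what is left; $(( )) drops any padding from wc.
remaining=$(( $(ls "$demo" | wc -l) ))
echo "exit status: $status, files remaining: $remaining"

rm -rf "$demo"
```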

Interactive Mode
Because there is no way to recover a file that has been deleted using rm, you can specify the -i option. In interactive mode, rm prompts you for every file that is requested for deletion. For example, the command

$ rm -i hw1 hw2 hw3

produces confirmation prompts similar to the following:

hw1: ? (n/y) y
hw2: ? (n/y) n
hw3: ? (n/y) y

In this case I answer y to deleting hw1 and hw3, but I answer n to deleting hw2.
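
The same sequence of answers can be supplied on standard input, which makes it possible to try the example without typing. This is a sketch with scratch files created for the purpose; the prompt text is discarded because its wording differs between versions of rm.

```shell
#!/bin/sh
# Three files to offer for deletion.
demo=$(mktemp -d)
touch "$demo/hw1" "$demo/hw2" "$demo/hw3"

# Answer y, n, y to the three prompts, one answer per line.
printf 'y\nn\ny\n' | rm -i "$demo/hw1" "$demo/hw2" "$demo/hw3" 2>/dev/null

# Only the file that was answered with n is left.
left=$(ls "$demo")
echo "still present: $left"

rm -rf "$demo"
```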

Summary
In this chapter, we covered the following topics:
  • Listing files using ls
  • Viewing the content of a file using cat
  • Counting the words, lines, and characters in a file using wc
  • Copying files using cp
  • Renaming files using mv
  • Removing files using rm

Knowing how to perform each of these tasks is essential to becoming a good shell programmer. In the chapters ahead you use these basics to create scripts for solving real world problems.

Questions
1. What are invisible files? How do you use ls to list them?

2. Will there be any difference in the output of the following commands?
a. $ ls -a1
b. $ ls -1 -a
c. $ ls -1a

3. Which options should be specified to wc in order to count the number of lines and characters in a file?

4. Given that hw1, hw2, ch1, and ch2 are files and book and homework are directories, which of the following commands generates an error message?
a. $ cp hw1 ch2 homework
b. $ cp hw1 homework hw2 book
c. $ rm hw1 homework ch1
d. $ rm hw2 ch2

Terms
ls The command that lists the files in a directory.
cat The command that views the contents of a file.
wc The command that counts the words, lines, and characters in a file.
cp The command that copies files.
mv The command that renames files.
rm The command that removes files.

Ordinary File A file on the system that contains data, text, or program instructions.

Directories A type of file that stores other files. For users familiar with Windows or Mac OS, UNIX directories are equivalent to folders.

Invisible Files or Hidden Files Files whose names start with the . character. By default the ls command does not list these files. You can list them by specifying the -a option to ls.
