UNIX Commands: Working with Text Files

By Gordon Davisson

Copyright (c) 2002, Westwind Computing inc.

more and less
– display the contents of a text file, one screenful at a time (hit the
spacebar to get the next screen). Note that this only works well with plain
text files, not Word files, RTFs, PDFs, or anything else that contains
formatting information. less also allows you to go backwards
(type “b”) in the file. In either one, type “h” for more detailed help.

Examples:

more /etc/inetd.conf
print the inetd.conf file to the
terminal, one screen at a time.
ps -ax | more
use the ps
command to generate a list of processes running on the system, and
pipe them to more to display them
one screen at a time.

grep
– search the contents of a text file, and print lines containing a
given word or pattern.

Examples:

grep telnet /etc/inetd.conf
search the inetd.conf file,
and print all lines that contain “telnet”.
grep diskarbitrationd /var/log/system.log
search the main
system log for entries that mention the disk arbitration daemon.
ps -ax | grep netinfod
use the ps
command to generate a list of processes running on the system, then
pipe the list through grep, which will
print only those lines containing “netinfod”. Note: this will list all running
netinfod processes, and also the process running grep itself, since its
command line contains “netinfod” too.
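One common way to keep grep from listing itself is to wrap one character of the pattern in square brackets. The sketch below runs against made-up ps output (the PIDs and command lines are invented for illustration) rather than a live process list, so the result is predictable.

```shell
# Hypothetical ps output: one netinfod process plus the grep searching for it.
sample='  123 ??   0:00.10 /usr/sbin/netinfod local
  456 p1   0:00.00 grep netinfod'

# A plain pattern matches both lines, including the grep command itself.
plain=$(printf '%s\n' "$sample" | grep -c 'netinfod')

# With the bracketed pattern, the grep command line contains the text
# "[n]etinfod", which the pattern [n]etinfod does not match.
sample_bracketed='  123 ??   0:00.10 /usr/sbin/netinfod local
  456 p1   0:00.00 grep [n]etinfod'
bracketed=$(printf '%s\n' "$sample_bracketed" | grep -c '[n]etinfod')

echo "plain pattern: $plain matching lines"
echo "bracketed pattern: $bracketed matching lines"
```

On a live system the equivalent command would be ps -ax | grep '[n]etinfod'.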

pico
– edit the contents of a text file. As with more and
less, this only really works with plain text files. Also, note that
pico doesn’t have menus or command-key (cloverleaf, Apple-key, whatever you
call it) shortcuts, or pay any attention to what you do with the mouse.
Instead, you use the arrow keys to move around, and control-key shortcuts to
do things you’d normally do with the mouse and menus. But the control-key
shortcuts have no connection to the command-key shortcuts you’re used to;
for example, control-X means quit (and ask if you want to save changes), not
cut. There’s a list of control-key shortcuts at the bottom of the window, and
control-G will give you more extensive help. Note, though, that the F-key
equivalents it lists for some commands don’t work.

vi and emacs
– other text editors provided with the standard OS X installation. They’re
both more powerful than pico, but also a lot harder to figure out
if you aren’t already familiar with them.

tail
– print the last few lines of a text file. This is mainly useful for
examining the last (i.e. most recent) entries in things like log files.

Examples:

tail /var/log/system.log
print the last 10 lines (the default) of
the main system log.
tail -1000 /var/log/system.log | more
print the last 1000
entries from the main system log, using more to
display them one screenful at a time.
tail -f /var/log/system.log
print the last 10 lines of
the main system log, then “follow” changes to the file, printing
new log entries as they’re made. Type control-C to stop.
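To try tail without touching a real log, here is a minimal sketch that builds a small stand-in file first. The file contents are made up, and the sketch assumes the mktemp command is available (it is on Mac OS X and most other unix systems). It also uses the -n form of the line-count option, which is the POSIX spelling of the older tail -1000 style shown above.

```shell
# Build a small stand-in log file (hypothetical entries) in a temporary location.
log=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
    echo "entry $i" >> "$log"
    i=$((i + 1))
done

# With no options, tail prints the last 10 lines.
default_count=$(tail "$log" | wc -l | tr -d ' ')

# -n N asks for the last N lines.
last_three=$(tail -n 3 "$log")

echo "default tail printed $default_count lines"
echo "$last_three"

rm -f "$log"
```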

Text File Format Compatibility:

As simple and straightforward as they may seem, text files still harbor an
opportunity for compatibility problems. Different operating systems have
traditionally used different ways to indicate line endings (aka line breaks).
Mac OS has traditionally used the Carriage Return character (ASCII character
13, aka CR or ^M) to indicate line breaks; unix has traditionally used the
Line Feed character (ASCII 10, aka LF or ^J). Since Mac OS X derives from
both heritages, it winds up using a mix of the two in various contexts. But
most command line utilities only understand (and produce) files with unix-style
breaks.

Just to make things even more fun, there’s actually a third variant: MS-DOS
and its successors use a carriage return followed by a line feed (CRLF) to indicate a line
break. Few Macintosh programs will generate such files, but if you need to deal
with a file that came from a PC, you’ll probably want to convert it to a more
native format on the Mac.
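If you aren’t sure which convention a file uses, counting its CR and LF characters is one quick way to find out. The following is only a sketch: the three sample files and the count_cr and count_lf helper functions are invented for illustration, and mktemp is assumed to be available.

```shell
# Create three tiny sample files (hypothetical names), one per convention.
dir=$(mktemp -d)
printf 'one\rtwo\r'     > "$dir/mac.txt"    # CR only (classic Mac OS)
printf 'one\ntwo\n'     > "$dir/unix.txt"   # LF only (unix)
printf 'one\r\ntwo\r\n' > "$dir/dos.txt"    # CR followed by LF (MS-DOS)

# Count CR and LF bytes by deleting every other character and counting the rest.
count_cr() { tr -cd '\r' < "$1" | wc -c | tr -d ' '; }
count_lf() { tr -cd '\n' < "$1" | wc -c | tr -d ' '; }

summary=$(for f in mac unix dos; do
    echo "$f.txt: CR=$(count_cr "$dir/$f.txt") LF=$(count_lf "$dir/$f.txt")"
done)
echo "$summary"

rm -rf "$dir"
```

A file with only CRs is Mac format, one with only LFs is unix format, and one with equal numbers of both is most likely PC format.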

Fortunately, it’s fairly easy to convert between the formats on the
command line. Here are some examples:

tr '\r' '\n' <macfile.txt >unixfile.txt
convert
the Mac-format file macfile.txt to unix format, and save the result as
unixfile.txt. tr is a program that does character substitution,
and in this case it’s simply being used to replace CR (written \r on the
command line) with LF (written \n) throughout the file.
tr '\r' '\n' <macfile.txt | grep fnord
convert the
Mac-format file macfile.txt to unix format, then use
grep to search the file for the word “fnord”.
(Note: since grep doesn’t understand Mac-style
line breaks, searching the file without converting it first would probably
not give useful results.)
tr '\n' '\r' <unixfile.txt >macfile.txt
convert
the unix-format file unixfile.txt to Mac format, and save the result as
macfile.txt.
perl -p -e 's/\r/\n/g' macfile.txt >unixfile.txt
convert
the Mac-format file macfile.txt to unix format, and save the result as
unixfile.txt. This is functionally identical to the first example, but since
perl is actually a very general programming language, it can also do some
other useful things… BTW, the -e means the program will be the next thing
on the command line ('s/\r/\n/g' – perlese for replace all \r’s with
\n’s), and the -p means do this for each line of the file.
perl -pi -e 's/\r/\n/g' textfile.txt
convert the file
textfile.txt from Mac-style (CR) line breaks to unix-style (LF), and replace
the original file with the converted version (that’s what the -i means).
perl -pi -e 's/\r\n?/\n/g' textfile.txt
convert the file
textfile.txt from Mac-style (CR) or PC-style (CRLF) line breaks to
unix-style (LF), and replace the original file.
perl -pi -e 's/\r\n?/\n/g' *.txt
convert all text files
(or rather, files with .txt extensions) in the current directory to unix-style
breaks. Note that any that were already in unix format will not be changed.
perl -pi -e 's/\n/\r/g' textfile.txt
convert the file
textfile.txt from unix-style (LF) line breaks to Mac-style (CR), and replace
the original file.
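To see why conversion matters, here is a small sketch that checks a Mac-format file before and after running it through tr. The file contents are made up, and mktemp is assumed to be available.

```shell
dir=$(mktemp -d)
# A three-line Mac-format file (CR line breaks); the contents are invented.
printf 'alpha\rfnord here\rgamma\r' > "$dir/macfile.txt"

# Before conversion the file contains no LF characters, so wc counts zero lines.
lines_before=$(wc -l < "$dir/macfile.txt" | tr -d ' ')

# Same conversion as the first tr example above.
tr '\r' '\n' < "$dir/macfile.txt" > "$dir/unixfile.txt"

# After conversion, wc sees three lines and grep prints only the matching one.
lines_after=$(wc -l < "$dir/unixfile.txt" | tr -d ' ')
match=$(grep fnord "$dir/unixfile.txt")

echo "lines before: $lines_before, after: $lines_after"
echo "grep found: $match"

rm -rf "$dir"
```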
