Thursday, April 12, 2007

Using bash and sed to Modify a Text File

This shell script demonstrates how to write to a text file, and then modify the contents.
#!/bin/sh
# modfile.sh
# by ScottM, 04/12/2007
# demonstrates writing text to a file, and then using sed to modify it.

TESTFILE=test.txt
FRUIT=banana

# add some content to the file (note: file will be overwritten)
echo "apple" > "$TESTFILE"

# modify the content (-i edits the file in place; this form is for GNU sed)
sed -i -e "s/apple/& $FRUIT/g" "$TESTFILE"

# sed uses the "s" (substitute) command, which uses a regular expression to search and replace text
# "s/apple/" means search each line for the characters "apple"
# "&" in the replacement stands for the text that was matched
# "/& $FRUIT/g"  -- replace "apple" with "apple banana",
# the g is for global: replace every match on a line, not just the first one

# output:
# $ cat test.txt
# apple banana
# $
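
The effect of the g flag is easier to see on a line that contains the pattern more than once. This is only a small sketch; the input line "apple apple" is made up for the illustration and is not part of the script above.

```shell
#!/bin/sh
# Hypothetical input: one line that contains "apple" twice.
line="apple apple"

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/apple/& banana/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed -e 's/apple/& banana/g')

printf '%s\n' "$first_only"    # apple banana apple
printf '%s\n' "$all_matches"   # apple banana apple banana
```

Either way sed processes every line of the input; g only controls how many matches are replaced within each line.
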
Sed One Liner

This is really only a one-line script, commonly referred to as a one-liner, so we don't really need a shell script, as long as we understand the regular expressions we are trying to use.

From the command line, we can insert a word:

$ sed -e 's/apple/& pear/g' test.txt
apple pear banana

Notice how the ampersand "&" character prints the text that was found.  Note that we left out the -i, so we can test the output before modifying the original.
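
The preview-then-apply habit can be sketched as follows. It assumes a sed that supports -i with an attached backup suffix (GNU and BSD sed both do, but -i is not part of POSIX) and that mktemp is available; the scratch directory and the .bak suffix are choices made for this example only.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
file="$workdir/test.txt"
printf 'apple banana\n' > "$file"

# Step 1: preview. Without -i, sed writes to stdout and leaves the file alone.
preview=$(sed -e 's/apple/& pear/g' "$file")
unchanged=$(cat "$file")

# Step 2: apply. -i.bak edits in place and keeps the original as test.txt.bak.
sed -i.bak -e 's/apple/& pear/g' "$file"
edited=$(cat "$file")
backup=$(cat "$file.bak")

printf 'preview: %s\n' "$preview"
printf 'edited:  %s\n' "$edited"
printf 'backup:  %s\n' "$backup"

rm -r "$workdir"
```

Keeping a backup copy costs little and gives us a way back if the expression turns out to match more than we expected.
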
Now look at the difference between the next two commands.  The word "pear" is either inserted after "apple" or appended to the end of the line:

$ sed -e 's/\(apple\)/& pear/g' test.txt
apple pear banana

$ sed -e 's/\(apple.*\)/& pear/g' test.txt
apple banana pear

In both commands the ampersand prints the entire matched text; the parentheses do not change what "&" prints and only matter for numbered references such as \1. When we include ".*", the match is apple followed by all characters up to the end of the line, and then we add a space and our new text:  " pear".
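
A quick way to check what "&" expands to is to run the same substitution with and without parentheses, and then with parentheses around only part of the match. This is a minimal sketch; the square brackets in the last replacement are there only to make the two values visible.

```shell
#!/bin/sh
line="apple banana"

# No group: & is the whole match, "apple".
plain=$(printf '%s\n' "$line" | sed -e 's/apple/& pear/')

# A group around the whole pattern gives the same result, since & ignores groups.
grouped=$(printf '%s\n' "$line" | sed -e 's/\(apple\)/& pear/')

# A group around part of the pattern: & is still "apple", while \1 is only "app".
partial=$(printf '%s\n' "$line" | sed -e 's/\(app\)le/[&] [\1]/')

printf '%s\n' "$plain"     # apple pear banana
printf '%s\n' "$grouped"   # apple pear banana
printf '%s\n' "$partial"   # [apple] [app] banana
```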

If we want to replace the entire line with the search string plus some added text, we can use "^" to indicate the start of the line and "$" to indicate the end of the line. In this case sed keeps whatever matches the pattern inside the parentheses and discards whatever else is on the line.

$ sed -e 's/^\(apple\).*$/\1 pear/g' test.txt
apple pear

So what happened to banana? The pattern matched the whole line, but only apple was surrounded with parentheses. Then we asked to print \1, which refers to the text matched by the first set of parentheses (in this case the only set). That effectively erased everything else on the line except whatever matches apple.
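
The same command can be tried on two other lines to see both sides of this behaviour. The input lines here are invented for the sketch: one starts with apple and one does not.

```shell
#!/bin/sh
# The line starts with "apple": the whole line is matched and replaced by \1 plus " pear".
keep_group=$(printf 'apple banana cherry\n' | sed -e 's/^\(apple\).*$/\1 pear/')

# The line does not start with "apple": ^ prevents a match, so the line passes through unchanged.
no_match=$(printf 'green apple banana\n' | sed -e 's/^\(apple\).*$/\1 pear/')

printf '%s\n' "$keep_group"   # apple pear
printf '%s\n' "$no_match"     # green apple banana
```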

Notice how you can print multiple search groups:

$ sed -e 's/^\(apple\)\(.*\)$/\1 pear \1\2/g' test.txt
apple pear apple banana
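
Numbered groups can also be reordered. The sketch below first shows what each group captured (note that \2 starts with the space after apple), then moves the space outside the groups so the two words can be swapped cleanly. The square brackets are only markers added for this illustration.

```shell
#!/bin/sh
line="apple banana"

# Show the contents of each group; \2 includes the leading space.
captured=$(printf '%s\n' "$line" | sed -e 's/^\(apple\)\(.*\)$/[\1][\2]/')

# Match the space outside the groups, then print the groups in reverse order.
swapped=$(printf '%s\n' "$line" | sed -e 's/^\(apple\) \(.*\)$/\2 \1/')

printf '%s\n' "$captured"   # [apple][ banana]
printf '%s\n' "$swapped"    # banana apple
```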

Exercises
Exercise 1 Where this might be useful is when replacing a URL in an HTML file. Search for href="something", and replace it with href="something-else".

Exercise 2 Try adding other words that match apple (e.g. apples, apple-pie), and see what happens.
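
As a starting point for the first exercise, here is one possible sketch, not the only answer. The host names old.example.com and new.example.com and the HTML line are placeholders invented for the example. Because URLs are full of slashes, it uses "|" as the delimiter of the s command instead of "/", which sed allows.

```shell
#!/bin/sh
# Hypothetical HTML line with a placeholder URL.
html='<a href="http://old.example.com/page">link</a>'

# Using | as the delimiter avoids escaping every / in the URL.
# The dots in the pattern are escaped so they match literal dots only.
updated=$(printf '%s\n' "$html" |
    sed -e 's|href="http://old\.example\.com/|href="http://new.example.com/|g')

printf '%s\n' "$updated"   # <a href="http://new.example.com/page">link</a>
```

To run it against a real file, replace the printf pipeline with the file name, and preview the output before adding -i.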

For more Regular Expression examples see our regex articles.

1 comment:

scottm said...

Here is a nice discussion of a simple text manipulation problem at unixforum.co.uk.

http://www.unixforum.co.uk/topic/32081-text-file-manipulation-adding-a-digittext-to-the-end-of-a-row/

This is a good example of how sed and awk can be used to edit a data file, so I posted a reply with some one-liners.