Being able to count the number of occurrences of characters or words in text is a handy trick. Fortunately this is very easy to do in awk with the gsub() function.

The syntax for using gsub() looks like this:
gsub(regexp, replacement [, target])

gsub() will search target for substrings matching the provided regular expression and replace these substrings with replacement. Awk will alter the value of target and return the number of substitutions made.
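
To make both effects visible, here is a minimal sketch that prints the return value next to the modified record. The function name count_and_show is a hypothetical helper made up for this illustration; it is not part of awk.

```shell
# Hypothetical helper (illustration only): replace every "o" with "0".
# gsub() edits $0 in place and returns the number of replacements made.
count_and_show() {
  awk '{ n = gsub(/o/, "0"); print n, $0 }'
}

echo "foo boo" | count_and_show
# prints: 4 f00 b00
```

The first value is the number of substitutions and the rest is the record after gsub() has changed it.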

The target parameter in gsub() is generally a specific field or an entire record. If target is omitted, the entire record ($0) is used. For the purpose of counting the occurrences of a substring in text, we don’t need the updated value of the target, which makes the value of the replacement parameter irrelevant.
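
The replacement does not affect the count, but it does change the record. If the original text is still needed after counting, one option is to use "&" as the replacement, which awk expands to the matched text itself. The sketch below assumes that behavior; count_keep is a hypothetical name chosen for this example.

```shell
# Hypothetical helper (illustration only): count "s" without changing the text.
# "&" in the replacement stands for the matched text, so $0 is left as it was.
count_keep() {
  awk '{ n = gsub(/s/, "&"); print n ": " $0 }'
}

echo "this is a simple awk example" | count_keep
# prints: 3: this is a simple awk example
```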

Since gsub() returns the number of substitutions made, we can simply use this number as the count of occurrences of a character or substring in each record.
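
Because the count is produced once per record, a total for multi-line input has to be accumulated by hand. A rough sketch, assuming the goal is to count the letter 'a' per line and overall (count_per_line is a hypothetical helper name):

```shell
# Hypothetical helper (illustration only): per-record counts plus a running total.
# NR is awk's built-in record number; total is summed across records.
count_per_line() {
  awk '{ n = gsub(/a/, ""); total += n; print NR ": " n }
       END { print "total: " total }'
}

printf 'banana\napple\nkiwi\n' | count_per_line
# prints:
# 1: 3
# 2: 1
# 3: 0
# total: 4
```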

Here are a few examples:


#count occurrences of letter 's'
> echo "this is a simple awk example" | awk '{print gsub(/s/, "")}'
3

#count occurrences of 'abc'
> echo "abc xyz abc" | awk '{print gsub(/abc/, "")}'
2

#count occurrences of 'a' in first field
> echo "aaa xyz abc" | awk '{print gsub(/a/, "", $1)}'
3

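It is possible to write your regular expressions as strings between "" as well as between //. The string form is convenient when counting or replacing / characters, since a / needs no escaping there, while a " character is easier to write in the // form.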
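
As an illustration of the two forms, the sketch below counts slashes with a string regular expression and double quotes with a // regular expression. Both function names are hypothetical and exist only for this example.

```shell
# Hypothetical helpers (illustration only).
# String form: "/" needs no escaping inside double quotes.
count_slashes() {
  awk '{ print gsub("/", "") }'
}

# Slash form: a double quote needs no escaping between slashes.
count_quotes() {
  awk '{ print gsub(/"/, "") }'
}

echo "/usr/local/bin" | count_slashes
# prints: 3
echo 'say "hi"' | count_quotes
# prints: 2
```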


Count Number of Occurrences of Characters in Line with AWK