Tag Archives | shell scripts

Regex tutorial for Linux (Sed & AWK) examples

In order to successfully work with the Linux sed editor and the awk command in your shell scripts, you have to understand regular expressions or in short regex. Since there are many engines for regex, we will use the shell regex and see the bash power in working with regex. First, we need to understand what regex is, then we will see how to use it. For some people, when they see the regular expressions for the first time they said what are these ASCII pukes !! Well, A regular expression or regex, in general, is a pattern of text you define that a Linux program like sed or awk uses it to filter text. We saw some of those patterns when introducing basic Linux commands and saw how the ls command uses wildcard characters to filter output.

Continue Reading →

Types of regex

There are many different applications use different types of regex in Linux, like the regex included in programming languages (Java, Perl, Python,,,) and Linux programs like (sed, awk, grep,) and many other applications.

A regex pattern uses a regular expression engine which translates those patterns.

Linux has two regular expression engines:

  • The Basic Regular Expression (BRE) engine.
  • The Extended Regular Expression (ERE) engine.

Most Linux programs work well with BRE engine specifications, but some tools like sed understand some of the BRE engine rules.

The POSIX ERE engine is shipped with some programming languages. It provides more patterns like matching digits, and words. The awk command uses the ERE engine to process its regular expression patterns.

Since there are many regex implementations, it’s difficult to write patterns that work on all engines. Hence, we will focus on the most commonly found regex and demonstrate how to use it in the sed and awk.

Define BRE Patterns

You can define a pattern to match text like this:

echo "Testing regex using sed" | sed -n '/regex/p'

echo "Testing regex using awk" | awk '/regex/{print $0}'

Linux regex tutorial

You may notice that the regex doesn’t care where the pattern occurs or how many times in the data stream.

The first rule to know is that regular expression patterns are case sensitive.

echo "Welcome to LikeGeeks" | awk '/Geeks/{print $0}'

echo "Welcome to Likegeeks" | awk '/Geeks/{print $0}'

regex character case

The first regex succeeds because the word “Geeks” exists in the upper case, while the second line fails because it uses small letters.

You can use spaces or numbers in your pattern like this:

echo "Testing regex 2 again" | awk '/regex 2/{print $0}'

space character

Special Characters

regex patterns use some special characters. And you can’t include them in your patterns and if you do so, you won’t get the expected result.

These special characters are recognized by regex:

.*[]^${}\+?|()

You need to escape these special characters using the backslash character (\).

For example, if you want to match a dollar sign ($), escape it with a backslash character like this:

cat myfile

There is 10$ on my pocket

awk '/\$/{print $0}' myfile

dollar sign

If you need to match the backslash (\) itself, you need to escape it like this:

echo "\ is a special character" | awk '/\\/{print $0}'

special character

Despite the forward slash isn’t a special character, you still get an error if you use it directly.

echo "3 / 2" | awk '///{print $0}'

regex slash

So you need to escape it like this:

echo "3 / 2" | awk '/\//{print $0}'

escape slash

Anchor Characters

To locate the beginning of a line in a text, use the caret character (^).

You can use it like this:

echo "welcome to likegeeks website" | awk '/^likegeeks/{print $0}'

echo "likegeeks website" | awk '/^likegeeks/{print $0}'

anchor begin character

The caret character (^) matches the start of text:

awk '/^this/{print $0}' myfile

caret anchor

What if you use it in the middle of the text?

echo "This ^ caret is printed as it is" | sed -n '/s ^/p'

caret character

It’s printed as it is like a normal character.

When using awk, you have to escape it like this:

echo "This ^ is a test" | awk '/s \^/{print $0}'

escape caret

This is about looking at the beginning of the text, what about looking at the end?

The dollar sign ($) checks for the end a line:

echo "Testing regex again" | awk '/again$/{print $0}'

end anchor

You can use both the caret and dollar sign on the same line like this:

cat myfile
this is a test
This is another test
And this is one more

awk '/^this is a test$/{print $0}' myfile

combine anchors

As you can see, it prints only the line that has the matching pattern only.

You can filter blank lines with the following pattern:

awk '!/^$/{print $0}' myfile

Here we introduce the negation which is done by the exclamation mark !

The pattern searches for empty lines where nothing between the beginning and the end of the line and negates that to print only the lines have text.

The dot Character

The dot character is used to match any character except newline (\n).

Look at the following example to get the idea:

cat myfile
this is a test
This is another test
And this is one more
start with this

awk '/.st/{print $0}' myfile

dot character

You can see from the result that it prints only the first two lines because they contain the st pattern while the third line does not have that pattern and fourth line start with st so that also doesn’t match our pattern.

Character Classes

You can match any character with the dot special character, but what if you match a set of characters only, you can use a character class.

The character class matches a set of characters if any of them found, the pattern matches.

The chracter classis defined using square brackets [] like this:

awk '/[oi]th/{print $0}' myfile

character classes

Here we search for any th characters that have o character or i before it.

This comes handy when you are searching for words that may contain upper or lower case and you are not sure about that.

echo "testing regex" | awk '/[Tt]esting regex/{print $0}'

echo "Testing regex" | awk '/[Tt]esting regex/{print $0}'

upper and lower case

Of course, it is not limited to characters; you can use numbers or whatever you want. You can employ it as you want as long as you got the idea.

Negating Character Classes

What about searching for a character that is not in the character class?

To achieve that, precede the character class range with a caret like this:

awk '/[^oi]th/{print $0}' myfile

negate character classes

So anything is acceptable except o and i.

Using Ranges

To specify a range of characters, you can use the (-) symbol like this:

awk '/[e-p]st/{print $0}' myfile

regex ranges

This matches all characters between e and p then followed by st as shown.

You can also use ranges for numbers:

echo "123" | awk '/[0-9][0-9][0-9]/'

echo "12a" | awk '/[0-9][0-9][0-9]/'

number range

You can use multiple and separated ranges like this:

awk '/[a-fm-z]st/{print $0}' myfile

non-continuous range

The pattern here means from a to f, and m to z must appear before the st text.

echo "abc" | awk '/[[:alpha:]]/{print $0}'

echo "abc" | awk '/[[:digit:]]/{print $0}'

echo "abc123" | awk '/[[:digit:]]/{print $0}'

special character classes

The Asterisk

The asterisk means that the character must exist zero or more times.

echo "test" | awk '/tes*t/{print $0}'

echo "tessst" | awk '/tes*t/{print $0}'

asterisk

This pattern symbol is useful for checking misspelling or language variations.

echo "I like green color" | awk '/colou*r/{print $0}'

echo "I like green colour " | awk '/colou*r/{print $0}'

asterisk example

Here in these examples whether you type it color or colour it will match, because the asterisk means if the “u” character existed many times or zero time that will match.

To match any number of any character, you can use the dot with the asterisk like this:

awk '/this.*test/{print $0}' myfile

asterisk with dot

It doesn’t matter how many words between the words “this” and “test”, any line matches, will be printed.

You can use the asterisk character with the character class.

echo "st" | awk '/s[ae]*t/{print $0}'

echo "sat" | awk '/s[ae]*t/{print $0}'

echo "set" | awk '/s[ae]*t/{print $0}'

asterisk with character classes

All three examples match because the asterisk means if you find zero times or more any “a” character or “e” print it.

Extended Regular Expressions

The following are some of the patterns that belong to Posix ERE:

The question mark

The question mark means the previous character can exist once or none.

echo "tet" | awk '/tes?t/{print $0}'

echo "test" | awk '/tes?t/{print $0}'

echo "tesst" | awk '/tes?t/{print $0}'

question mark

The question mark can be used in combination with a character class:

echo "tst" | awk '/t[ae]?st/{print $0}'

echo "test" | awk '/t[ae]?st/{print $0}'

echo "tast" | awk '/t[ae]?st/{print $0}'

echo "taest" | awk '/t[ae]?st/{print $0}'

echo "teest" | awk '/t[ae]?st/{print $0}'

question mark with character classes

If any of the character class items exists, the pattern matching passes. Otherwise, the pattern will fail.

The Plus Sign

The plus sign means that the character before the plus sign should exist one or more times, but must exist once at least.

echo "test" | awk '/te+st/{print $0}'

echo "teest" | awk '/te+st/{print $0}'

echo "tst" | awk '/te+st/{print $0}'

plus sign

If the “e” character not found, it fails.

You can use it with character classes like this:

echo "tst" | awk '/t[ae]+st/{print $0}'

echo "test" | awk '/t[ae]+st/{print $0}'

echo "teast" | awk '/t[ae]+st/{print $0}'

echo "teeast" | awk '/t[ae]+st/{print $0}'

plus sign with character classes

if any character from the character class exists, it succeeds.

Curly Braces

Curly braces enable you to specify the number of existence for a pattern, it has two formats:

n: The regex appears exactly n times.

n,m: The regex appears at least n times, but no more than m times.

echo "tst" | awk '/te{1}st/{print $0}'

echo "test" | awk '/te{1}st/{print $0}'

curly braces

In old versions of awk, you should use –re-interval option for the awk command to make it read curly braces, but in newer versions you don’t need it.

echo "tst" | awk '/te{1,2}st/{print $0}'

echo "test" | awk '/te{1,2}st/{print $0}'

echo "teest" | awk '/te{1,2}st/{print $0}'

echo "teeest" | awk '/te{1,2}st/{print $0}'

curly braces interval pattern

In this example, if the “e” character exists one or two times, it succeeds; otherwise, it fails.

You can use it with character classes like this:

echo "tst" | awk '/t[ae]{1,2}st/{print $0}'

echo "test" | awk '/t[ae]{1,2}st/{print $0}'

echo "teest" | awk '/t[ae]{1,2}st/{print $0}'

echo "teeast" | awk '/t[ae]{1,2}st/{print $0}'

interval pattern with character classes

If there are one or two instances of the letter “a” or “e” the pattern passes, otherwise, it fails.

Pipe Symbol

The pipe symbol makes a logical OR between 2 patterns. If one of the patterns exists, it succeeds, otherwise, it fails, here is an example:

echo "Testing regex" | awk '/regex|regular expressions/{print $0}'

echo "Testing regular expressions" | awk '/regex|regular expressions/{print $0}'

echo "This is something else" | awk '/regex|regular expressions/{print $0}'

pipe symbol

Don’t type any spaces between the pattern and the pipe symbol.

Grouping Expressions

You can group expressions so the regex engines will consider them one piece.

echo "Like" | awk '/Like(Geeks)?/{print $0}'

echo "LikeGeeks" | awk '/Like(Geeks)?/{print $0}'

grouping expressions

The grouping of the “Geeks” makes the regex engine treats it as one piece, so if “LikeGeeks” or the word “Like” exist, it succeeds.

Practical examples

We saw some simple demonstrations of using regular expression patterns, it’s time to put that in action, just for practicing.

Counting Directory Files

Let’s look at a bash script that counts the executable files in a folder from the PATH environment variable.

echo $PATH

To get a directory listing, you must replace each colon with space.

echo $PATH | sed 's/:/ /g'

Now let’s iterate through each directory using the for loop like this:

mypath=$(echo $PATH | sed 's/:/ /g')

for directory in $mypath; do

done

Great!!

You can get the files on each directory using the ls command and save it in a variable.

You may notice some directories doesn’t exist, no problem with this its OK.

count files

Cool!! This is the power of regex. These few lines of code count all files in all directories. Of course, there is a Linux command to do that very easy, but here we discuss how to employ regex on something you can use. You can come up with some more useful ideas.

Validating E-mail Address

There are a ton of websites that offer ready to use regex patterns for everything including e-mail, phone number, and much more, this is handy but we want to understand how it works.

username@hostname.com

The username can use any alphanumeric characters combined with dot, dash, plus sign, underscore.

The hostname can use any alphanumeric characters combined with a dot and underscore.

For the username, the following pattern fits all usernames:

^([a-zA-Z0-9_\-\.\+]+)@

The plus sign means one character or more must exist followed by the @ sign.

Then the hostname pattern should be like this:

([a-zA-Z0-9_\-\.]+)

There are special rules for the TLDs or Top-level domains, and they must be not less than 2 and five characters maximum. The following is the regex pattern for the top-level domain.

\.([a-zA-Z]{2,5})$

Now we put them all together:

^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Let’s test that regex against an email:

echo "name@host.com" | awk '/^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/{print $0}'

echo "name@host.com.us" | awk '/^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/{print $0}'

validate email

Awesome!! Works great.

This was just the beginning of regex world that never ends. I hope after this post you understand these ASCII pukes 🙂 and use it more professionally.

I hope you like the post.

Thank you.

0

Linux Bash Scripting Part5 – Signals and Jobs

In the previous post, we talked about input, output, and redirection in bash scripts. Today we will learn how to run and control them on a Linux system. Till now, we can run scripts only from the command line interface. This isn’t the only way to run Linux bash scripts. This post describes the different ways to control your Linux bash scripts. In shell scripts, we talked about important things called Input, Output and Redirection. Everything is a file in Linux and that includes input and output. So we need to understand each one in detail.

 

Continue Reading →

Your Linux bash scripts don’t control these signals, you can program your bash script to recognize signals and perform commands based on the signal that was sent.

Stop a Process

To stop a running process, you can press Ctrl+C which generates SIGINT signal to stop the current process running in the shell.

sleep 100

Ctrl+C

Linux bash scripting Signals and Jobs stop process

Pause a Process

The Ctrl+Z keys generate a SIGTSTP signal to stop any processes running in the shell, and that leaves the program in memory.

sleep 100

Ctrl+Z

pause process

The number between brackets which is (1) is the job number.

If try to exit the shell and you have a stopped job assigned to your shell, the bash warns you if you.

The ps command is used to view the stopped jobs.

ps –l

ps -l

In the S column (process state), it shows the traced (T) or stopped (S) states.

If you want to terminate a stopped job you can kill its process by using kill command.

kill processID

Trap Signals

To trap signals, you can use the trap command. If the script gets a signal defined by the trap command, it stops processing and instead the script handles the signal.

You can trap signals using the trap command like this:

#!/bin/bash

trap "echo 'Ctrl-C was trapped'" SIGINT

total=1

while [ $total -le 3 ]; do

echo "#$total"

sleep 2

total=$(($total + 1))

done

Every time you press Ctrl+C, the signal is trapped and the message is printed.

trap signal

If you press Ctrl+C, the echo statement specified in the trap command is printed instead of stopping the script. Cool, right?

Trapping The Script Exit

You can trap the shell script exit using the trap command like this:

#!/bin/bash

# Add the EXIT signal to trap it

trap "echo Goodbye..." EXIT

total=1

while [ $total -le 3 ]; do

echo "#$total"

sleep 2

total=$(($total + 1))

done

trap exit

When the bash script exits, the Goodbye message is printed as expected.

Also, if you exit the script before finishing its work, the EXIT trap will be fired.

Modifying Or Removing a Trap

You can reissue the trap command with new options like this:

#!/bin/bash

trap "echo 'Ctrl-C is trapped.'" SIGINT

total=1

while [ $total -le 3 ]; do

echo "Loop #$total"

sleep 2

total=$(($total + 1))

done

# Trap the SIGINT

trap "echo ' The trap changed'" SIGINT

total=1

while [ $total -le 3 ]; do

echo "Second Loop #$total"

sleep 1

total=$(($total + 1))

done

modify trap

Notice how the script manages the signal after changing the signal trap.

You can also remove a trap by using 2 dashes trap -- SIGNAL

#!/bin/bash

trap "echo 'Ctrl-C is trapped.'" SIGINT

total=1

while [ $total -le 3 ]; do

echo "#$total"

sleep 1

total=$(($total + 1))

done

trap -- SIGINT

echo "I just removed the trap"

total=1

while [ $total -le 3 ]; do

echo "Loop #2 #$total"

sleep 2

total=$(($total + 1))

done

Notice how the script processes the signal before removing the trap and after removing the trap.

./myscript

Crtl+C

remove trap

The first Ctrl+C was trapped and the script continues running while the second one exits the script because the trap was removed.

Running Linux Bash Scripts in Background Mode

If you see the output of the ps command, you will see all the running processes in the background and not tied to the terminal.

We can do the same, just place ampersand symbol (&) after the command.

#!/bin/bash

total=1

while [ $total -le 3 ]; do

sleep 2

total=$(($total + 1))

done

./myscipt &

run in background

Once you’ve done that, the script runs in a separate background process on the system and you can see the process id between the square brackets.

When the script dies,  you will see a message on the terminal.

Notice that while the background process is running, you can use your terminal monitor for STDOUT and STDERR messages so if an error occurs, you will see the error message and normal output.

run script in background

The background process will exit if you exit your terminal session.

So what if you want to continue running even if you close the terminal?

Running Scripts without a Hang-Up

You can run your Linux bash scripts in the background process even if you exit the terminal session using the nohup command.

The nohup command blocks any SIGHUP signals. This blocks the process from exiting when you exit your terminal.

nohup ./myscript &

linux bash nohup command

After running the nohup command, you can’t see any output or error from your script. The output and error messages are sent to a file called nohup.out.

Note: when running multiple commands from the same directory will override the nohup.out file content.

Viewing Jobs

To view the current jobs, you can use the jobs command.

#!/bin/bash

total=1

while [ $total -le 3 ]; do

echo "#$count"

sleep 5

total=$(($total + 1))

done

Then run it.

./myscript

Then press Ctrl+Z to stop the script.

linux bash view jobs

Run the same bash script but in the background using the ampersand symbol and redirect the output to a file just for clarification.

./myscript > outfile &

linux bash list jobs

The jobs command shows the stopped and the running jobs.

jobs –l

-l parameter to view the process ID

 Restarting Stopped Jobs

The bg command is used to restart a job in background mode.

./myscript

Then press Ctrl+Z

Now it is stopped.

bg

linux bash restart job

After using bg command, it is now running in background mode.

If you have multiple stopped jobs, you can do the same by specifying the job number to the bg command.

The fg command is used to restart a job in foreground mode.

fg 1

Scheduling a Job

The Linux system provides 2 ways to run a bash script at a predefined time:

  • at command.
  • cron table.

The at command

This is the format of the command

at [-f filename] time

The at command can accept different time formats:

  • Standard time format like 10:15.
  • An AM/PM indicator like 11:15PM.
  • A specifically named time like now, midnight.

You can include a specific date, using some different date formats:

  • A standard date format, such as MMDDYY or DD.MM.YY.
  • A text date, such as June 10 or Feb 12, with or without the year.
  • Now + 25 minutes.
  • 05:15AM tomorrow.
  • 11:15 + 7 days.

We don’t want to dig deep into the at command, but for now, just make it simple.

at -f ./myscript now

linux bash at command

The -M parameter is used to send the output to email if the system has email, and if not, this will suppress the output of the at command.

To list the pending jobs, use atq command:

linux bash at queue

Remove Pending Jobs

To remove a pending job, use the atrm command:

atrm 18

delete at queue

You must specify the job number to the atrm command.

Scheduling Scripts

What if you need to run a script at the same time every day or every month or so?

You can use the crontab command to schedule jobs.

To list the scheduled jobs, use the -l parameter:

crontab –l

The format for crontab is:

minute,Hour, dayofmonth, month, and dayofweek

So if you want to run a command daily at 10:30, type the following:

30 10 * * * command

The wildcard character (*) used to indicate that the cron will execute the command daily on every month at 10:30.

To run a command at 5:30 PM every Tuesday, you would use the following:

30 17 * * 2 command

The day of the week starts from 0 to 6 where Sunday=0 and Saturday=6.

To run a command at 10:00 on the beginning of every month:

00 10 1 * * command

The day of the month is from 1 to 31.

Let’s keep it simple for now and we will discuss the cron in great detail in future posts.

To edit the cron table, use the -e parameter like this:

crontab –e

Then type your command like the following:

30 10 * * * /home/likegeeks/Desktop/myscript

This will schedule our script to run at 10:30 every day.

Note: sometimes you see error says Resource temporarily unavailable.

All you have to do is this:

rm -f /var/run/crond.pid

You should be a root user to do this.

Just that simple!

You can use one of the pre-configured cron script directories like:

/etc/cron.hourly

/etc/cron.daily

/etc/cron.weekly

/etc/cron.monthly

Just put your bash script file on any of these directories and it will run periodically.

Starting Scripts at Login

In the previous posts, we’ve talked about startup files, I recommend you to review the previous.

$HOME/.bash_profile

$HOME/.bash_login

$HOME/.profile

To run your scripts at login, place your code in $HOME/.bash_profile.

Starting Scripts When Opening the Shell

OK, what about running our bash script when the shell opens? Easy.

Type your script on .bashrc file.

And now if you open the shell window, it will execute that command.

I hope you find the post useful. keep coming back.

Thank you.

0