Bash

grep, sed, awk, and Functions

grep

  • grep is a standard *NIX tool to search files for patterns and print matching lines
  • Numerous options control both how matching is done and what grep outputs
  • Options that affect matching
    • -P: Use a Perl style regular expression
    • -e PATTERN: Specify a pattern after each -e (allows multiple patterns)
    • -f FILE: Read patterns in from FILE, one per line
    • -i: Ignore case
    • -v: Invert matches, only print lines that do not match the pattern
In [ ]:
grep  "todo"  /usr/src/linux-headers-4.13.0-25/include/soc/tegra/bpmp-abi.h
In [ ]:
grep -R "todo"  /usr/src/linux-headers-4.13.0-25/include
In [ ]:
grep -Ri "todo"  /usr/src/linux-headers-4.13.0-25/include
In [ ]:
grep -P "^import" /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
more src/shell/read_ps_example.sh
In [ ]:
grep -vP "^#[^\!]" src/shell/read_ps_example.sh
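The -e, -f, and -v options are not exercised by the cells above, so here is a minimal sketch on a throwaway file. The file names notes.txt and patterns.txt and their contents are made up for illustration.

```shell
# Build a small sample file in a temporary directory (hypothetical data)
tmpdir=$(mktemp -d)
printf '%s\n' 'TODO: fix parser' 'done: lexer' 'fixme: tests' 'todo: docs' > "$tmpdir/notes.txt"

# -e given twice: print lines that match either pattern
either=$(grep -e 'parser' -e 'docs' "$tmpdir/notes.txt")
echo "$either"

# -f: read the same patterns from a file, one per line
printf '%s\n' 'parser' 'docs' > "$tmpdir/patterns.txt"
from_file=$(grep -f "$tmpdir/patterns.txt" "$tmpdir/notes.txt")
echo "$from_file"

# -i and -v combined: drop every line that mentions "todo" in any case
not_todo=$(grep -iv 'todo' "$tmpdir/notes.txt")
echo "$not_todo"

rm -rf "$tmpdir"
```

The first two commands should select the same two lines, since -f only changes where the patterns come from.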

grep Output Options

  • -o: Only output the part of the line that matches
  • --color=COLOR_OPTION: Suppress or force highlighting of matches in the line (COLOR_OPTION is always, never, or auto)
  • -l: Print only the file names where a match has been found
  • -L: Print only the file names where no match has been found
  • -h: Don't print the file name for each match
  • -n: Print the line number the match was found on
  • -c: Print the number of matching lines found in each file
In [ ]:
grep -Ri "todo"  /usr/src/linux-headers-4.13.0-25/include
In [ ]:
grep -Ri --color=always "todo"  /usr/src/linux-headers-4.13.0-25/include | head
In [ ]:
grep -Rin --color=always "todo"  /usr/src/linux-headers-4.13.0-25/include | head
In [ ]:
grep -Ril "todo"  /usr/src/linux-headers-4.13.0-25/include | head
In [ ]:
grep -Pc "^import" /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
grep -Ph "^import" /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py | head
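The cells above do not show -o or -L, and -c is easier to read on a tiny input. The following is a sketch on two made-up files, a.py and b.py, created only for this example.

```shell
# Two small sample files (hypothetical contents)
tmpdir=$(mktemp -d)
printf '%s\n' 'import os' 'import sys' 'x = 1' > "$tmpdir/a.py"
printf '%s\n' 'y = 2' > "$tmpdir/b.py"

# -o: print only the text that matched, not the whole line
only_match=$(grep -o 'import [a-z]*' "$tmpdir/a.py")
echo "$only_match"

# -c: one count of matching lines per file
counts=$(cd "$tmpdir" && grep -c '^import' a.py b.py)
echo "$counts"

# -l lists files with a match, -L lists files without one
with_import=$(cd "$tmpdir" && grep -l '^import' a.py b.py)
no_import=$(cd "$tmpdir" && grep -L '^import' a.py b.py)
echo "with: $with_import, without: $no_import"

rm -rf "$tmpdir"
```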

Printing More Context per Match

  • It is often useful, especially in debugging, to print some lines around each match
  • grep has three flags that control this, each taking a numerical argument
    • -A NUM: Print the NUM lines after each match
    • -B NUM: Print the NUM lines before each match
    • -C NUM: Print the NUM lines before and after each match
In [ ]:
grep -A4 "todo"  /usr/src/linux-headers-4.13.0-25/include/soc/tegra/bpmp-abi.h
In [ ]:
grep -m2 -P --color=never "huge\thuge\tADJ\t" ~/Research/Data/UD_English-EWT/en-ud-dev.conllu
In [ ]:
grep -m2 -P -C5 --color=never "huge\thuge\tADJ\t" ~/Research/Data/UD_English-EWT/en-ud-dev.conllu
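To compare the three context flags side by side, here is a small sketch that runs each one against the same five-line input. The log text is invented for the example.

```shell
# A made-up five-line log held in a variable
log=$(printf '%s\n' 'step 1 ok' 'step 2 ok' 'ERROR: disk full' 'step 3 skipped' 'step 4 skipped')

# -B1: the match plus one line before it
before=$(printf '%s\n' "$log" | grep -B1 'ERROR')

# -A1: the match plus one line after it
after=$(printf '%s\n' "$log" | grep -A1 'ERROR')

# -C1: the match plus one line on each side
around=$(printf '%s\n' "$log" | grep -C1 'ERROR')

printf '%s\n' "--- before ---" "$before" "--- after ---" "$after" "--- around ---" "$around"
```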

Grep Practice

  • Use grep (or grep piped to another grep, etc.) to return a list of all scripting languages used by files in a directory
    • Look for the shebang line and go from there

sed and awk

  • sed and awk are both extremely popular tools for text manipulation from the command line
    • sed is short for Stream EDitor
    • awk is named after the last names of its three creators (Aho, Weinberger, and Kernighan)
  • Both sed and awk are full-fledged scripting languages in their own right

awk Scripts

  • An awk script is a series of rules of the format
    pattern { action }
    
  • The awk script is run either directly as an argument to the awk command, or from a file passed to awk with -f
    awk 'SCRIPT' INPUT_FILE
    awk -f SCRIPT_FILE INPUT_FILE
    

awk Scripts Continued

  • Two special patterns exist in awk
    • BEGIN will match before any input is read
    • END will match after all input is read
  • The most common action is print, often seen as
    { print $0; }
    
In [ ]:
awk '/^import/ { print $0; }' /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
awk '/^import/' /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
awk '{ print $0; }' /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
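BEGIN, END, and the -f form are not shown in the cells above. A minimal sketch, assuming a made-up input file sample.py and a made-up script file imports.awk:

```shell
# Sample input (hypothetical contents)
tmpdir=$(mktemp -d)
printf '%s\n' 'import os' 'x = 1' 'import sys' > "$tmpdir/sample.py"

# Script passed directly as an argument: three rules in one string
inline=$(awk 'BEGIN { print "start" } /^import/ { print $0 } END { print "end" }' "$tmpdir/sample.py")

# The same three rules stored in a file and loaded with -f
cat > "$tmpdir/imports.awk" <<'EOF'
BEGIN     { print "start" }
/^import/ { print $0 }
END       { print "end" }
EOF
from_script=$(awk -f "$tmpdir/imports.awk" "$tmpdir/sample.py")

echo "$inline"
rm -rf "$tmpdir"
```

Both invocations should print the same four lines: the BEGIN output, the two import lines, and the END output.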

awk Variables

  • awk was designed for whitespace-separated data files
    • The delimiter can be changed by using the -F flag
  • The entire line is in the variable $0
    • Each field is placed in $1, $2, and so on
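As a quick illustration of field variables and -F, the sketch below splits one line on whitespace and a pair of colon-separated records on ":". The records are invented for the example.

```shell
# Default splitting on whitespace: $2 is the second word
second_word=$(echo 'the quick brown fox' | awk '{ print $2 }')
echo "$second_word"

# Made-up colon-separated records
records=$(printf '%s\n' 'alice:1001:/home/alice' 'bob:1002:/home/bob')

# -F: makes ":" the field delimiter, so $1 is the name
names=$(printf '%s\n' "$records" | awk -F: '{ print $1 }')
echo "$names"

# Fields can be printed in any order; the comma inserts a space
swapped=$(printf '%s\n' "$records" | awk -F: '{ print $3, $1 }')
echo "$swapped"
```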

Special awk Variables

  • awk has many built in variables that contain useful information
  • Two common ones are:
    • NF - The number of fields in the current record (on the current line)
    • NR - The number of records processed so far
  • All variables can be used in both patterns and actions
In [ ]:
awk 'NF > 12 { print $0; }'  /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
awk 'length($0) > 80 { print $0; }'  /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
awk 'END { print NR; }' /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
awk '/^import/ {print $2;}'  /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py
In [ ]:
awk '/^import/ {print $2;}'  /usr/local/lib/python3.6/dist-packages/jupyter_core/*.py | sort -u

Awk Practice

  • Use awk to return a list of all scripting languages used by files in a directory
    • Look for the shebang line and go from there
    • Field references can be computed from other variables, e.g. $(NF) is the last field on the line
In [ ]:
awk -F/ '/^#!/ {print $(NF);}' src/**/*

sed

  • By default, sed reads in a file, applies a command to each line in the file, and prints the result to STDOUT
    • The edits can be made in-place by using the -i flag. Be careful with this!
    • To suppress the automatic printing, so that only lines explicitly printed by a command appear, use the -n flag
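The effect of -n and -i is easiest to see on a three-line file. This is a sketch on a made-up file words.txt in a temporary directory; it uses the -i.bak form, which keeps a backup copy with the suffix .bak and is accepted by both GNU and BSD sed.

```shell
# Three-line sample file (hypothetical contents)
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$tmpdir/words.txt"

# Without -n every line is printed once, and p prints line 2 a second time
default_out=$(sed '2p' "$tmpdir/words.txt")
echo "$default_out"

# With -n only the explicitly printed line appears
only_second=$(sed -n '2p' "$tmpdir/words.txt")
echo "$only_second"

# -i.bak rewrites the file itself and leaves the original in words.txt.bak
sed -i.bak 's/beta/BETA/' "$tmpdir/words.txt"
edited=$(cat "$tmpdir/words.txt")
backup=$(cat "$tmpdir/words.txt.bak")
echo "$edited"

rm -rf "$tmpdir"
```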

Selecting Lines

  • Similarly to awk, sed allows us to control which lines we apply the commands to
  • The lines can be selected by using
    • The line number
    • A range of line numbers
    • A regex
    • The inverse of any of the above
In [ ]:
paste <(head data/words.txt) <(tail data/words.txt)
In [ ]:
wc -l data/words.txt
In [ ]:
sed -n '21p' data/words.txt
In [ ]:
sed -n '21,25p' data/words.txt
In [ ]:
sed -n '/[A-Z][a-z]/p' data/words.txt
In [ ]:
sed -n '/[[:digit:]][[:digit:]]/p' data/words.txt
In [ ]:
sed -n '1,170!p' data/words.txt
In [ ]:
sed -n '/^[a-zA-Z]/!p' data/words.txt

sed Commands

  • There are many different commands, but the most common ones are
    • d: Delete a line
    • s/REGEX/SUBSTITUTION/: Substitute the first match of REGEX on a line (add the g flag to replace every match)
    • a TEXT: Append TEXT after a line
    • i TEXT: Insert TEXT before a line
    • p: Prints a line
In [ ]:
sed '/^\s*#[^!]/d' src/shell/read_ps_example.sh
In [ ]:
sed '5,$d' src/shell/read_ps_example.sh
In [ ]:
sed '/^\s*#!/a # There is a shebang line in this file' src/shell/read_ps_example.sh
In [ ]:
sed '$a #This is the end of the file' src/shell/read_ps_example.sh
In [ ]:
sed '$i \\n#Only One Line Left' src/shell/read_ps_example.sh
In [ ]:
sed 's/u/i/' src/shell/read_ps_example.sh
In [ ]:
sed 's/u/i/g' src/shell/read_ps_example.sh
In [ ]:
sed '/^#[^!]/s/u/i/g' src/shell/read_ps_example.sh
In [ ]:
ls -lh data/* | sed -r '/[[:digit:]]+[KM]\s/s/.*[^[:alpha:].]([[:alpha:].]+)$/\1/g'

Sed Practice

  • Use sed to replace all print statements of the old Python form with the new one, on uncommented lines only
    • print X should be replaced with print(X)

Functions

  • Functions in bash behave like functions in most other languages, with a few notable differences
    • No keyword is required to define a function (the function keyword exists but is optional)
    • No return type
    • No parameter list in definition
    • Called like a bash command rather than a function

Function Syntax

  • To define a function:
    FUNCTION_NAME()
    {
      #CODE HERE
    }
    
  • To call a function
    FUNCTION_NAME ARG1 ARG2 ARG3
    
In [ ]:
hello(){
    echo "Hello World";
}

hello
In [ ]:
today() {
    date +"%A, %B %-d, %Y"
}

echo -n "Today is "
today
In [ ]:
ls(){
 command ls -lh
}
#ls

Accessing Function Arguments

  • Inside a function definition, the special variables $1, $2, etc. refer to the arguments passed to the function
    • The arguments passed to the script as a whole are therefore not reachable through those variables inside a function; save them in other variables first, or pass them to the function explicitly
    • Just like with a script, the special variable $# holds the number of arguments passed
In [ ]:
hello(){
    echo "Hello $1!"
}
hello World
hello Class
hello
hello The Whole World
In [ ]:
print_all_args(){
    echo $@
    echo $#
}
print_all_args How Many Args Are Printed?
In [ ]:
count_files_of_type() {
    echo $(/bin/ls -1 *.$1 | wc -l)
}
count_files_of_type sh
count_files_of_type ipynb
count_files_of_type pl
count_files_of_type

do_math()
{
    a=$1
    echo $((a + 1))
}
do_math 10
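A sketch of how a function's arguments hide the script's arguments. Since there is no real script here, set -- is used to stand in for arguments given on the command line; the names first_script_arg, script_arg_count, and show_args are made up for the example.

```shell
# Pretend the script was started with two arguments
set -- script_arg1 script_arg2

# Save the script's arguments before any function needs them
first_script_arg=$1
script_arg_count=$#

show_args() {
    # Here $1 and $# describe the function call, not the script
    echo "function sees: $1 ($# argument(s))"
    echo "script had: $first_script_arg ($script_arg_count argument(s))"
}

result=$(show_args func_arg)
echo "$result"
```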

Returning Values

  • bash does support the return keyword, but this can only return integer values from 0 to 255
    • It is an exit code, similar to what a script would produce
    • To access it, look in the $? variable
  • The other option is to capture the output using command substitution
    $(FUNCTION_NAME ARG1 ARG2)
    
In [ ]:
sum(){
    sum=0
    for num in $@; do
        sum=$((sum + num))
    done
    return $sum
}

sum 1 2 3 4 5
echo "The sum is $?"
In [ ]:
sum(){
    sum=0
    for num in $@; do
        sum=$((sum + num))
    done
    echo $sum
}

total=$(sum 1 2 3 4 5)
echo $((total + 1))
echo "The sum is $total"
In [ ]:
count_files_of_type2() {
    /bin/ls -1 *.$1 | wc -l
}
echo "There are $(count_files_of_type2 sh) .sh files"
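Because return produces an exit code, it fits best as a success or failure signal that if can test directly. The sketch below uses a made-up function is_even for that, and a second made-up function too_big to show what happens in bash when the value does not fit in the 0 to 255 range.

```shell
is_even() {
    # Return 0 (success) for even numbers and 1 (failure) otherwise
    if [ $(( $1 % 2 )) -eq 0 ]; then
        return 0
    fi
    return 1
}

# The return value can be tested directly by if
if is_even 4; then
    echo "4 is even"
fi

# Or read from $? immediately after the call
is_even 7
status=$?
echo "is_even 7 returned $status"

# Values above 255 wrap around in bash: 300 arrives as 300 - 256 = 44
too_big() { return 300; }
too_big
wrapped=$?
echo "return 300 arrives as $wrapped"
```

For anything that is not a small status code, such as a sum that may exceed 255, printing the value and capturing it with command substitution is the safer choice.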

Scope in bash Functions

  • By default, bash variables have no local scope; the only exceptions are the special variables $1, $2, etc.
    • Everything is global by default
  • Putting the keyword local in front of the variable name when it is declared or assigned restricts the variable's scope to the function itself
In [ ]:
x=1
mess_up_the_scope(){
    x=2
}
echo "The value of x is $x"
mess_up_the_scope
echo "The value of x is $x"
In [ ]:
x=1
cant_mess_up_the_scope(){
    local x=2
}
echo "The value of x is $x"
cant_mess_up_the_scope
echo "The value of x is $x"
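The same idea applies to counters and loop variables inside a function. In this sketch the names count, word, and count_words are made up; local is used both with and without an initial value.

```shell
count=100   # a global value the caller wants to keep

count_words() {
    local count=0   # separate from the global count
    local word      # loop variable, also kept inside the function
    for word in "$@"; do
        count=$((count + 1))
    done
    echo "$count"
}

count_words a b c
echo "The global count is still $count"
```

Without the two local lines, calling count_words directly would overwrite the global count and leave a stray word variable behind.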

Functions Practice

  • Write a function that takes in two parameters, a directory, and a language, and finds all files in that directory written in that language
    • Use the shebang line to determine the language