awkuse

Embed Size (px)

Citation preview

  • 8/6/2019 awkuse

    1/4

    AWK syntax:awk [-Fs] "program" [file1 file2...] # commands come from DOS cmdlineawk 'program{print "foo"}' file1 # single quotes around double quotes# NB: Don't use single quotes alone if the embedded info will contain the# vertical bar or redirection arrows! Either use double quotes, or (if# using 4DOS) use backticks around the single quotes: `'NF>1'`

    # NB: since awk will accept single quotes around arguments from the# DOS command line, this means that DOS filenames which contain a# single quote cannot be found by awk, even though they are legal names# under MS-DOS. To get awk to find a file named foo'bar, the name must# be entered as foo"'"bar.

    awk [-Fs] -f pgmfile [file1 file2...] # commands come from DOS file

    If file1 is omitted, input comes from stdin (console).Option -Fz sets the field separator FS to letter "z".

    AWK notes:

    "pattern {action}"if {action} is omitted, {print $0} is assumedif "pattern" is omitted, each line is selected for {action}.

    Fields are separated by 1 or more spaces or tabs: "field1 field2"If the commands come from a file, the quotes below can be omitted.

    Basic AWK commands:-------------------"NR == 5" file show rec. no. (line) 5. NB: "==" is equals.{FOO = 5} single = assigns "5" to the variable FOO"$2 == 0 {print $1}" if 2d field is 0, print 1st field"$3 < 10" if 3d field < 10, numeric comparison; print line

    '$3 < "10" ' use single quotes for string comparison!, or-f pgmfile [$3 < "10"] use "-f pgmfile" for string comparison"$3 ~ /regexp/" if /regexp/ matches 3d field, print the line'$3 ~ "regexp" ' regexp can appear in double-quoted string*

    # * If double-quoted, 2 backslashes for every 1 in regexps# * Double-quoted strings require the match (~) character.

    "NF > 4" print all lines with 5 or more fields"$NF > 4" print lines where the last field is 5 or more"{print NF}" tell us how many fields (words) are on each line"{print $NF}" print last field of each line

    "/regexp/" Only print lines containing "regexp"

    "/textfile/" Lines containing "text" or "file" (CASE SENSITIVE!)

    "/foo/{print "za", NR}" FAILS on DOS/4DOS command line!!'/foo/{print "za", NR}' WORKS on DOS/4DOS command line!!

    If lines matches "foo", print word and line no.`"/foo/{print \"za\",NR}"` WORKS on 4DOS cmd line: escape internal quotes with

    slash and backticks; for historical interest only.

    "$3 ~ /B/ {print $2,$3}" If 3d field contains "B", print 2d + 3d fields"$4 !~ /R/" Print lines where 4th field does NOT contain "R"

    '$1=$1' Del extra white space between fields & blank lines'{$1=$1;print}' Del extra white space between fields, keep blanks

    'NF' Del all blank lines

    AND(&&), OR(), NOT(!)

  • 8/6/2019 awkuse

    2/4

    -----------------------"$2 >= 4 $3 5 && /with/" lines containing "with" for lines 6 or beyond"/x/ && NF > 2" lines containing "x" with more than 2 fields

    "$3/$2 != 5" not equal to "value" or "string""$3 !~ /regexp/" regexp does not match in 3d field

    "!($3 == 2 && $1 ~ /foo/)" print lines that do NOT match condition

    "{print NF, $1, $NF}" print no. of fields, 1st field, last field"{print NR, $0}" prefix a line number to each line'{print NR ": " $0}' prefix a line number, colon, space to each line

    "NR == 10, NR == 20" print records (lines) 10 - 20, inclusive"/start/, /stop/" print lines between "start" and "stop"

    "length($0) > 72" print all lines longer than 72 chars"{print $2, $1}" invert first 2 fields, delete all others"{print substr($0,index($0,$3))}" print field #3 to end of the line

    END{...} usage--------------- END reads all input first.

    1) END { print NR } # same output as "wc -l"

    2) {s = s + $1 } # print sum, ave. of all figures in col. 1END {print "sum is", s, "average is", s/NR}

    3) {names=names $1 " " } # converts all fields in col 1 toEND { print names } # concatenated fields in 1 line, e.g.

    +---Beth 4.00 0 #

    input Mary 3.75 0 # infile is converted to:file Kathy 4.00 10 # "Beth Mary Kathy Mark" on output

    +---Mark 5.00 30 #

    4) { field = $NF } # print the last field of the last lineEND { print field }

    PRINT, PRINTF: print expressions, print formattedprint expr1, expr2, ..., exprn # parens() needed if the expression containsprint(expr1, expr2, ..., exprn) # any relational operator: =

    print # an abbreviation for {print $0}print "" # print only a blank lineprintf(expr1,expr2,expr3,\n} # add newline to printf statements

    FORMAT CONVERSION:------------------BEGIN{ RS=""; FS="\n"; # takes records sep. by blank lines, fields

    ORS="\n"; OFS="," } # sep. by newlines, and converts to records{$1=$1; print } # sep. by newlines, fields sep. by commas.

    PARAGRAPHS:-----------'BEGIN{RS="";ORS="\n\n"};/foo/' # print paragraph if 'foo' is there.

    'BEGIN{RS="";ORS="\n\n"};/foo/&&/bar/' # need both;/foobar/' # need either

  • 8/6/2019 awkuse

    3/4

    PASSING VARIABLES:------------------gawk -v var="/regexp/" 'var{print "Here it is"}' # var is a regexpgawk -v var="regexp" '$0~var{print "Here it is"}' # var is a quoted stringgawk -v num=50 '$5 == num' # var is a numeric value

    Built-in variables:

    ARGC number of command-line argumentsARGV array of command-line arguments (ARGV[0...ARVC-1])FILENAME name of current input fileFNR input record number in current fileFS input field separator (default blank)NF number of fields in current input recordNR input record number since beginningOFMT output format for numbers (default "%.6g")OFS output field separator (default blank)ORS output record separator (default newline)RLENGTH length of string matched by regular expression in matchRS input record separator (default newline)

    RSTART beginning position of string matched by matchSUBSEP separator for array subscripts of form [i,j,...] (default ^\)

    Escape sequences:\b backspace (^H)\f formfeed (^L)\n newline (DOS, CR/LF; Unix, LF)\r carriage return\t tab (^I)\ddd octal value `ddd', where `ddd' is 1-3 digits, from 0 to 7\c any other character is a literal, eg, \" for " and \\ for \

    Awk string functions:

    `r' is a regexp, `s' and `t' are strings, `i' and `n' are integers`&' in replacement string in SUB or GSUB is replaced by the matched string

    gsub(r,s,t) globally replace regex r with string s, applied to data t;return no. of substitutions; if t is omitted, $0 is used.

    gensub(r,s,h,t) replace regex r with string s, on match number h, appliedto data t; if h is 'g', do globally; if t is omitted, $0 isused. Return the converted pattern, not the no. of changes.

    index(s,t) return the index of t in s, or 0 if s does not contain tlength(s) return the length of smatch(s,r) return index of where s matches r, or 0 if there is no

    match; set RSTART and RLENGTHsplit(s,a,fs) split s into array a on fs, return no. of fields; if fs is

    omitted, FS is used in its placesprintf(fmt,expr-list) return expr-list formatted according to fmtsub(r,s,t) like gsub but only the first matched substring is replacedsubstr(s,i,n) return the n-character substring of s starting at i; if n

    is omitted, return the suffix of s starting at i

    Arithmetic functions:atan2(y,x) arctangent of y/x in radians in the range of - to cos(x) cosine (angle in radians)exp(n) exponential e (n need not be an integer)int(x) truncate to integerlog(x) natural logarithm

    rand() pseudo-random number r, 0 r 1sin(x) sine (angle in radians)sqrt(x) square root

  • 8/6/2019 awkuse

    4/4

    srand(x) set new seed for random number generator; uses time of dayif no x given

    [end-of-file]