As previously mentioned, you can make matching case insensitive with the i flag:
/\b[Uu][Nn][Ii][Xx]\b/;   # explicitly giving case folding
/\bunix\b/i;              # using the "i" flag to fold case
As mentioned before, usually the "." (dot, period, full stop) matches any character except newline. You make it match newline with the s flag:
/"(.|\n)*"/;   # match any quoted string, even with newlines
/"(.*)"/s;     # same meaning, using the "s" flag
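To see the difference the s flag makes, here is a minimal sketch. The helper name quoted_text and its second parameter are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first and last double
# quote, or undef if nothing matches. A true second argument selects the s flag.
sub quoted_text {
    my ($text, $dot_matches_newline) = @_;
    if ($dot_matches_newline) {
        return $text =~ /"(.*)"/s ? $1 : undef;
    }
    return $text =~ /"(.*)"/ ? $1 : undef;
}

my $text = "say \"two\nlines\" now";
print defined(quoted_text($text, 0)) ? "match\n" : "no match\n";   # no match
print defined(quoted_text($text, 1)) ? "match\n" : "no match\n";   # match
```

Without the s flag the dot stops at the newline inside the quotes, so the closing quote is never reached.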
N.B. - I like to use the flags ///six; as a personal default set of flags with Perl regular expressions.
You can make your matching global with the g flag. For ordinary matches, this means making them stateful: Perl will remember where you left off with each reinvocation of the match unless you change the value of the variable, which will reset the match.
#!/usr/bin/perl -w
# shows the //g as stateful...
while(<>)
{
  while(/[A-Z]{2,}/g)
  {
    print "$&\n" if (defined($&));
  }
}
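The remembered position can be inspected with the built-in pos() function. The following is a small sketch rather than a definitive recipe; the helper name acronym_positions is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect each run of two or more capitals together
# with the offset at which that match ended.
sub acronym_positions {
    my ($text) = @_;
    my @found;
    while ($text =~ /([A-Z]{2,})/g) {
        # pos() gives the offset just past the most recent //g match
        push @found, "$1 ends at " . pos($text);
    }
    return @found;
}

print "$_\n" for acronym_positions("The ACM and the IEEE");
# prints:
# ACM ends at 7
# IEEE ends at 20
```

Each pass through the loop resumes at the offset where the previous match stopped, which is why the second match is IEEE rather than ACM again.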
You can even specify a variable inside of a pattern - but you want to make sure that it gives a legitimate regular expression.
my $var1 = "[A-Z]*";
if( "AB" =~ /$var1/ )
{
  print "$&";
}
else
{
  print "nopers";
}
# yields
AB
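One way to check that a variable holds a legitimate regular expression is to compile it inside eval before using it. This is a sketch under that assumption; the helper name compile_pattern is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a compiled pattern, or undef if $source
# is not a legitimate regular expression.
sub compile_pattern {
    my ($source) = @_;
    my $re = eval { qr/$source/ };   # qr// dies on a malformed pattern
    return $re;
}

for my $candidate ("[A-Z]*", "[A-Z") {
    if (defined compile_pattern($candidate)) {
        print "'$candidate' compiles\n";
    } else {
        print "'$candidate' is not a valid pattern\n";
    }
}
```

If the variable is meant to be matched as literal text rather than as a pattern, the built-in quotemeta function (or \Q...\E inside the pattern) escapes the special characters instead.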
#!/usr/bin/perl -w
# shows s///g... by removing acronyms
use strict;
while(<>)
{
  s/([A-Z]{2,})//g;
  print;
}
s/\bfigure (\d+)/Figure $1/   # capitalize references to figures
s{//(.*)}{/\*$1\*/}           # use old style C comments
s!\bif\(!if (!                # put a blank (the "(" must be escaped in the pattern)
s(!)(.)                       # tone down that message
s[!][.]g                      # replace all occurrences of '!' with '.'
You can use \U and \L to change the text that follows them to upper and lower case:
$text = " the acm and the ieee are the best! ";
$text =~ s/acm|ieee/\U$&/g;
print "$text\n";
# yields
 the ACM and the IEEE are the best!
$text = "CDA 1001 and COP 3101 are good classes, but CIS 4385 is better!";
$text =~ s/\b(COP|CDA|CIS) \d+/\L$&/g;
print "$text\n";
# yields
cda 1001 and cop 3101 are good classes, but cis 4385 is better!
$ARGV[1] =~ tr/A-Z/a-z/;   # canonicalize to lower case
$cnt = tr/*/*/;            # count the stars in $_
$cnt = $sky =~ tr/*/*/;    # count the stars in $sky
$cnt = tr/0-9//;           # count the digits in $_
# get rid of redundant blanks in $_
tr/ //s;
# replace [ and { with ( in $text
$text =~ tr/[{/(/;
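The counting idioms work because tr returns the number of characters it matched. A small sketch follows; the helper names count_digits and squeeze_blanks are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: tr returns how many characters it matched;
# with an empty replacement list the string itself is left unchanged.
sub count_digits {
    my ($s) = @_;
    return ($s =~ tr/0-9//);
}

# Hypothetical helper: the s modifier squeezes runs of the same
# character down to a single occurrence.
sub squeeze_blanks {
    my ($s) = @_;
    $s =~ tr/ //s;
    return $s;
}

print count_digits("CDA 1001 and COP 3101"), "\n";   # 8
print squeeze_blanks("a   b  c"), "\n";              # a b c
```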
The split function breaks up a string according to a specified separator pattern and generates a list of the substrings.
For example:
$line = " This sentence contains five words. ";
@fields = split / /, $line;
$count = 0;
map { print "$count --> $fields[$count]\n"; $count++; } @fields;
# yields
0 -->
1 --> This
2 --> sentence
3 --> contains
4 --> five
5 --> words.
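The empty field 0 comes from the leading blank in $line. As a sketch of one way around it, a separator given as the single-space string ' ' is a special case that splits on runs of whitespace and skips leading whitespace. The helper name field_counts is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count the fields produced by the pattern / /
# and by the special-case separator ' '.
sub field_counts {
    my ($line) = @_;
    my @by_pattern = split / /, $line;   # the leading empty field is kept
    my @by_space   = split ' ', $line;   # leading whitespace is skipped
    return (scalar @by_pattern, scalar @by_space);
}

my ($with_pattern, $with_space) =
    field_counts(" This sentence contains five words. ");
print "split / /: $with_pattern fields\n";   # 6
print "split ' ': $with_space fields\n";     # 5
```

In both cases trailing empty fields are dropped, which is why the final blank in the line does not add a field.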
The join function does the reverse of the split function: it takes a
list and converts it to a string.
Unlike split, however, its first argument is a plain string rather
than a pattern:
@fields = qw/ apples pears cantaloupes cherries /;
$line = join "<-->", @fields;
print "$line\n";
# yields
apples<-->pears<-->cantaloupes<-->cherries
[Also see man perlfaq5 for more detail on this subject.]
Unlike other variables, you don't declare filehandles. The convention is to use all uppercase letters for filehandle names. (Especially important if you deal with anonymous filehandles!) The open operator takes two arguments, a filehandle name and a connection (e.g. filename).
The close operator closes a filehandle. This causes any remaining output data associated with this filehandle to be flushed to the file. Perl automatically closes filehandles at the end of a process, or when you reopen one.
close IN;    # closes the IN filehandle
close OUT;   # closes the OUT filehandle
close LOG;   # closes the LOG filehandle
You can check the status of opening a file by examining the result of the open operation. It returns a true value if it succeeded, and a false one if it failed.
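A common way to act on that result is the "or die" idiom, with the special variable $! supplying the system's error message. The following is a sketch only; the helper name first_line and the path /no/such/file are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of $fn, or die with the
# system error message if the file cannot be opened.
sub first_line {
    my ($fn) = @_;
    open(IN, $fn) or die "Cannot open $fn: $!\n";   # $! holds the OS error
    my $line = <IN>;
    close IN;
    return $line;
}

# Assumption: this path does not exist, so the open fails.
my $result = eval { first_line("/no/such/file") };
print "open failed: $@" if $@;
```

Newer code often prefers the three-argument form with a lexical filehandle, open(my $fh, '<', $fn), which avoids surprises when a filename begins with a character such as ">".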
You can reopen a standard filehandle (STDIN, STDOUT, or STDERR). This allows you to perform input or output in the normal fashion, but to redirect the I/O from or to a file within the Perl program.
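As a sketch of this redirection, the following saves a duplicate of STDOUT, reopens STDOUT onto a file, and then restores it. The helper name print_to_file, the filehandle name SAVEOUT, and the file name run.log are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: send one ordinary print to $fn by temporarily
# reopening STDOUT, then put the original STDOUT back.
sub print_to_file {
    my ($fn, $message) = @_;
    open(SAVEOUT, ">&STDOUT") or die "Cannot duplicate STDOUT: $!\n";
    open(STDOUT, ">$fn")      or die "Cannot redirect STDOUT to $fn: $!\n";
    print $message;           # an ordinary print, now written to $fn
    close STDOUT;             # flush the redirected output
    open(STDOUT, ">&SAVEOUT") or die "Cannot restore STDOUT: $!\n";
    close SAVEOUT;
}

# Assumption: the current directory is writable.
print_to_file("run.log", "this line goes to the file\n");
print "this line goes to the terminal again\n";
```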
Like BASH, file tests exist in Perl (source: man perlfunc):
-r  File is readable by effective uid/gid.
-w  File is writable by effective uid/gid.
-x  File is executable by effective uid/gid.
-o  File is owned by effective uid.
-R  File is readable by real uid/gid.
-W  File is writable by real uid/gid.
-X  File is executable by real uid/gid.
-O  File is owned by real uid.
-e  File exists.
-z  File has zero size (is empty).
-s  File has nonzero size (returns size in bytes).
-f  File is a plain file.
-d  File is a directory.
-l  File is a symbolic link.
-p  File is a (named) pipe (FIFO).
-S  File is a socket.
-b  File is a block special file.
-c  File is a character special file.
-t  Filehandle is opened to a tty.
-u  File has setuid bit set.
-g  File has setgid bit set.
-k  File has sticky bit set.
-T  File is an ASCII text file (heuristic guess).
-B  File is a "binary" file (opposite of -T).
-M  Script start time minus file modification time, in days.
-A  Same for access time.
-C  Same for inode change time (Unix, may differ for other platforms).
You can use file tests like this, for instance, as a pre-test:
while (<>)
{
  chomp;
  next unless -f $_;   # ignore non-files
  #...
}
Or you can use them as a post-test:
if(! open(FH, $fn))
{
  if(! -e "$fn")
  {
    die "File $fn doesn't exist.";
  }
  if(! -r "$fn")
  {
    die "File $fn isn't readable.";
  }
  if(-d "$fn")
  {
    die "$fn is a directory, not a regular file.";
  }
  die "$fn could not be opened.";
}
You can declare subroutines in Perl with sub, and call them with the "&" syntax:
my @list = qw( /etc/hosts /etc/resolv.conf /etc/init.d );
map ( &filecheck , @list) ;
sub filecheck
{
  if(-f "$_")
  {
    print "$_ is a regular file\n";
  }
  else
  {
    print "$_ is not a regular file\n";
  }
}
#!/usr/bin/perl -w
# shows subroutine argument lists
use strict;
my $val = max(10,20,30,40,11,99);
print "max = $val\n";
sub max
{
  print "Using $_[0] as first value...\n";
  my $memory = shift(@_);
  foreach(@_)
  {
    if($_ > $memory)
    {
      $memory = $_;
    }
  }
  return $memory;
}
You can locally define variables for a subroutine with my:
sub func
{
  my $ct = @_;
  ...;
}
The variable $ct is defined only within the subroutine func.
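A short sketch of that scoping: the file-level $ct below is an assumption added for illustration, and it is a different variable from the one inside func.

```perl
use strict;
use warnings;

my $ct = "outer";          # a separate variable at file scope

sub func {
    my $ct = @_;           # array in scalar context: the number of arguments
    return $ct;
}

print func("a", "b", "c"), "\n";   # 3
print "$ct\n";                     # outer (untouched by func)
```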
The built-in functions sort() and map() can accept a subroutine rather than just an anonymous block:
@list = qw/ 1 100 11 10 /;
@default = sort(@list);
@mysort = sort {&mysort} @list;
print "default sort: @default\n";
print "mysort: @mysort\n";
sub mysort
{
  return $a <=> $b;
}
# yields
default sort: 1 10 100 11
mysort: 1 10 11 100
As you can see, sort() sends along two special, predefined variables, $a and $b.
As discussed earlier, <=> returns a result of -1,0,1 if the left hand value is respectively numerically less than, equal to, or greater than the right hand value.
cmp returns the same, but uses lexical rather than numerical ordering.
A very similar operator is grep, which returns only the items that matched an expression. (sort always returns a list exactly as long as its input, and map does too whenever its block returns one value per item.)
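The two comparison operators can be sketched side by side. The subroutine names by_number_desc and by_name are hypothetical; swapping $a and $b reverses the order.

```perl
use strict;
use warnings;

# Hypothetical comparison subroutines for sort.
sub by_number_desc { return $b <=> $a; }   # numeric, largest first
sub by_name        { return $a cmp $b; }   # lexical, by character code

my @numbers = sort by_number_desc qw/1 100 11 10/;
my @words   = sort by_name qw/pear Apple cherry/;

print "@numbers\n";   # 100 11 10 1
print "@words\n";     # Apple cherry pear
```

Note that cmp compares character codes, so an uppercase "Apple" sorts ahead of any lowercase word.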
For example:
@out = grep {$_ % 2} qw/1 2 3 4 5 6 7 8 9 10/;
print "@out\n";
# yields
1 3 5 7 9
Notice that the block should return a false value (such as 0) for non-matching items.
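The block can also be a pattern match, and in scalar context grep returns the number of matching items instead of the items themselves. A minimal sketch, with the hypothetical helper name only_numbers:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the strings made up entirely of digits.
sub only_numbers {
    return grep { /^\d+$/ } @_;
}

my @mixed   = qw/apples 10 pears 25 x7/;
my @numbers = only_numbers(@mixed);       # list context: the matching items
my $count   = grep { /^\d+$/ } @mixed;    # scalar context: how many matched

print "@numbers\n";   # 10 25
print "$count\n";     # 2
```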
chdir $DIRNAME;   # change directory to $DIRNAME
glob $PATTERN;    # return a list of filenames matching $PATTERN
# example:
@list = glob "*.pl";
print "@list \n";
# yields
Script16.pl Script18.pl Script19.pl Script20.pl Script21.pl [...]
unlink $FN1, $FN2, ...;   # remove a hard or soft link to files
rename $FN1, $FN2;        # rename $FN1 to new name $FN2
mkdir $DN1;               # create directory with umask default permissions
rmdir $DN1;               # remove an empty directory (one directory per call)
chmod perms, $FDN1;       # change permissions
You can pull in the contents of a directory with opendir and readdir:
opendir(DH,"/tmp");
@filenams = readdir(DH);
closedir(DH);
print "@filenams\n";
# yields
.s.PGSQL.5432.lock .. mapping-root ssh-WCWcZf4199 xses-langley.joHONt
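Since readdir also returns the special entries "." and "..", a listing is often filtered before use. The following is a sketch under that assumption; the helper name list_dir is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted entries of $dir,
# skipping the special entries "." and "..".
sub list_dir {
    my ($dir) = @_;
    opendir(DH, $dir) or die "Cannot open directory $dir: $!\n";
    my @names = grep { $_ ne "." && $_ ne ".." } readdir(DH);
    closedir(DH);
    my @sorted = sort @names;
    return @sorted;
}

print join("\n", list_dir(".")), "\n";
```

The names returned are relative to the directory, so they need the directory prepended before being passed to a file test or to open.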
@lines = `head -10 /etc/hosts`;   # backticks run the command and capture its output
print "@lines\n";