Learn Perl

Ch2 标量数据

第一个Helloword 程序

1 2	#! /usr/bin/perl print "Hello, world!\n";

perl是一种脚本语言，一般已经被内置在类unix系统中。类似的还有shell，bash,csh,tcl,python等。

第一行的#!读作shell bang。shell bang 是一个用于指定脚本解释器的特殊字符组合。它由 #（井号）和 !（感叹号）组合而成，通常位于脚本的第一行。当一个脚本被执行时，系统会参考 shebang 指定的解释器来执行这个脚本。

例如，如果一个脚本的第一行是 #!/bin/bash，那么这个脚本就会使用 Bash 作为其解释器。如果改为 #!/usr/bin/python3，那么这个脚本就会使用 Python 3 作为其解释器。

linux默认的脚本解释器总是在/usr/bin目录下，如果解释器安装在其他目录下，则需要修改对应的路径。

由于有shell bang的存在，perl程序不需要有特定的扩展名，而只需要是可执行的。（为了区分可以加上.pl）的后缀。

此外也可以直接通过perl命令执行脚本。这2种方式大部分情况下是相同的，然而有时候也会产生不同的效果。

目前本人还没搞清楚这种情况出现的原因。

反引号``可以用于执行外部命令:

#! /usr/bin/perl
 @lines = `ls -a`;
 foreach (@lines) {
  print;
}

标量数据

字符串，数字都被看做标量数据。

perl会根据数据的操作符判断执行代数运算还是字符串操作。

变量采用美元符号声明：

#! /usr/bin/perl
use v5.10;

$alef  = chr( 0x05D0 );
$alpha = chr( hex('03B1') );
$omega = chr( 0x03C9 );

say "alpha:".$alpha;
say "alef:".$alef;
say "omega:".$omega;

$code_point = ord( 'א' );

say $code_point;

chr函数会将对应的ascii字符进行转换。ord则相反。

使用x可以将字符串重复对应的次数。

say和print类似，区别是会自动换行。（需要5.10以上的版本）

使用.可以连接2个字符串。

其他特性

获取标准输入：<STDIN>

chomp的作用是去掉字符串末尾的换行符。

Ch3 列表和数组

列表(list)和数组(array)都用于表示多个变量。二者的区别在于，“列表”是数据的集合，而“数组”是存储列表的变量。即列表更倾向于数据，数组更倾向于变量。列表不一定放在数组里，而数组中一定保护列表。

数组array

数组和标量的声明方式完全相同。

#! /usr/bin/perl

use v5.10;
$word[0]= "s";
$word[1]= "j";
$word[2]= "t";
$word[3]= "uuu";

$word[18]="end";
$end = $#word;

say $end;
say $word;
say  $word[-19].$word[-18].$word[-17].$word[-16];
say @word

‘#’井号代表取数组的最后一个下标值，除了正着数，数组下标也可以用来倒数，如第三行所示。

第二行的输出为空，是因为$word被看作是一个标量，与数组不同，由于没有对这个标量赋值，所以输出为空。如果向输出列表的完整值，则应该使用@符号：

1	say @word;

使用超过原本数组长度的数组进行赋值会自动扩充数组，中间的元素被的值为undef

array

push,pop,shift,unshift

#! /usr/bin/perl
use v5.10;

@a=(1.7..5.7);
say @a;
($a,$b,$c) = qw(apple pear banana);
say $a.$b.$c;

# qw means quoted word
@mylist = qw/
s
j
t
u  a b c
/;

say pop @mylist;
say @mylist;


@list = 1..7;
push @list, 3;
say $list[-1];


say shift @list;
say @list;

unshift @list,0;
say @list[0];
say @list

第一种的..表示递增，但是只能以整数方式进行。

第二行的示例显示了一种将列表的值赋值给变量的方法。

qw的意思为”quoted words”可以用来省略字符串中的引号。使用qw后，空格或换号符都可以之间作为字符串的分隔。

push和pop类似于其他语法中的列表操作。需要注意的是对列表用$还是@效果都是一样的。

shift和unshift对列表头进行操作，相当于pop_front和push_front。

splice

#!/usr/bin/perl
use v5.10;
@school = qw/
S
J
T
U
G
G/;

@remove = splice @school,0,4;
say @remove;
say @school;


@remove = splice @school, 0,0,qw/D D/;
say @remove;
say @school;

splice最多可以接收4个参数，后面2个可以省略。其中第一个参数表示要操作的列表，第二个参数表示分割的起始索引位置，第三个参数表示分割的长度，第四个参数表示分割后插入的元素。插入的位置在被分割的起始位置。

如果第3个参数设置为0，则splice可以被当作插入函数在第二个参数指定的位置进行插入。

字符串数组内插

#!/usr/bin/perl
use v5.10;

@seasons= qw\spring summer autumn and winter.\;

say "I miss you in @seasons";

say "This is a reall hot $seasons[1]";

使用对应的索引或者完整的数组引用可以对字符串实现内插操作。如果希望使用原本的@符号或者[],则需要在前面加上反斜杠\进行转义。

foreach

#!/usr/bin/perl
use v5.10;


$letter = 's';

@alph =qw(b d p l m n z);
foreach $letter (@alph) {
    $letter .="a\n";
}

say "alph is now:",@alph;

say "letter is still: $letter\n";

foreach语句用于遍历列表元素，并且对应用来遍历的临时变量，不用担心其与已有的标量重名。

默认变量 $_

对于不指定变量的操作，统一使用默认变量$_:

#!/usr/bin/perl
use v5.10;

foreach (1..10){
    say "I can count to $_!";
}

$_ = "makabaka";
say;

reverse,sort,each

#!/usr/bin/perl
use v5.10;

@a=1..10;
@b = reverse @a;

say @a;
say @b;

say sort @b;

my @rocks =qw/bedrock slate rubble granite /;
while(my ($index,$value)=each @rocks ){
    say "$index:$value";
}

reverse用于将列表反转。
sort用于排序。排序规则是根据首字母的assicii值，并以此类推(首字母相同则比较下一个字母).

each可以创建索引和元素的键值对。

my 关键字用于声明局部变量，使得这些变量的作用域仅限于其声明的代码块内。在你的代码中，my 用来声明 $index 和 $value，以便在 while 循环的每次迭代中使用它们来存储当前的索引和值。

列表上下文与标量上下文

对于列表变量，由于其有多个属性，如里面的元素，元素的个数等，因此perl设计在引用列表变量时，不同的上下文环境中，列表变量的返回值不同。

#!/usr/bin/perl
use v5.10;


@people = qw/ fred barney betty /;

@sorted = sort @people;
$number = 42+@people;

say $number;
say @sorted;

$backward=reverse @people;
say $backward;

@people第一次调用时为列表上下文，返回列表中的所有元素进行排序。第二次由于与一个标量做加法，返回了列表的元素个数3。第三次调用reverse后复制给一个标量，因此返回了列表中所有元素合并而成的字符串。

列表上下文中使用标量表达式&强制标量上下文

#!/usr/bin/perl
use v5.10;

@fred = 6*7;
@barney = "hello"." "."world";
say @fred;
say @barney;

@wilma = undef;
say scalar @wilma;

@betty = ();
say scalar @betty;

前两行将一个标量复制给列表，实际上得到了只含有一个元素的列表。

saclar是一个伪函数，其作用是告诉解释器使用标量上下文。
此外，清空列表应该使用(),因为undef本身是一个标量，如果将其复制给列表，将得到只含有一个“undef”元素的列表，而非一个空列表；

列表上下文中的<STDIN>

#!/usr/bin/perl
use v5.10;


chomp(@list = <STDIN>);

say "Got list:@list";

将标准输入流赋值给列表变量，每次回车后输入新元素，使用ctrl+D来告诉程序输入结束。

习题

#!/usr/bin/perl
use v5.10;


chomp(@list = <STDIN>);

say reverse @list;

#! /usr/bin/perl
use v5.10;


@name = qw /fred betty barney dino wilma pebbles bamm-bamm /;

chomp(@numbers = <STDIN>);

foreach $number (@numbers){
    if (@name[$number]){
        say @name[$number];
    }
}

#! /usr/bin/perl
use v5.10;

@list = <STDIN>;

@sorted = sort @list;

say @sorted;
chomp(@sorted);
say @sorted;

Ch4 子程序

Perl语言支持用户自定义的子程序：

#! /usr/bin/perl

use v5.10;

sub marine {
 $n += 1; # Global variable $n
 print "Hello, sailor number $n!\n";
}

&marine; # says Hello, sailor number 1!
&marine; # says Hello, sailor number 2!
&marine; # says Hello, sailor number 3!
&marine; # says Hello, sailor number 4!

sub sum_of_fred_and_barney {
 print "Hey, you called the sum_of_fred_and_barney subroutine!\n";
 $fred + $barney; # That's the return value
}

$fred = 3;
$barney = 4;
$wilma = &sum_of_fred_and_barney; # $wilma gets 7
print "\$wilma is $wilma.\n";
$betty = 3 * &sum_of_fred_and_barney; # $betty gets 21
print "\$betty is $betty.\n";

上述程序中，sub用于定义函数名。同时我们可以注意到，如果不加限定，子程序中定义的变量是全局变量。

此外，Perl的所有函数都默认含有返回值。并且返回值是子程序中最后一次进行解析的元素。

需要注意的是，返回值一定是最后一次进行解析的元素，如果最后一行语句是类似Print这样的调用，那么返回值为1.

下面的例子中：

sub larger_of_fred_or_barney {
 if ($fred > $barney) {
 $fred;
 } else {
 $barney;
 }
}

返回的是比较操作的最大值，而不是最后一行。

变量传递

调用函数时，很多场景下需要传递变量。

#! /usr/bin/perl

use v5.10;

sub max {
 # Compare this to &larger_of_fred_or_barney
if ($_[0] > $_[1]) {
    $_[0];}
else {
    $_[1];}
}

say &max(7,-9,8);
say &max;

在子程序中，默认调用$_作为参数数组，这个数字是一个局部变量。此外，Perl并不会检查调用时传递的参数是否符合设计时的预期，即使传递多余的变量或空的变量数组，子程序也能运行。

子程序中的私有变量

如前所述，perl中的任何变量都默认为全局变量，而my关键字可以定义局部变量，也称词法变量(lexical variables)。局部变量被限制在最近封闭块{}或文件中。

sub max {
 my($m, $n); # new, private variables for this block
 ($m, $n) = @_; # give names to the parameters
 if ($m > $n) { $m } else { $n }
}

由于Perl的设计原则是，如无必要，不设限制，因此函数的变量列表是十分自由的，一个更好的Max函数设计如下：

#! /usr/bin/perl

use v5.10;

$maximum = &max(3, 5, 10, 4, 6);
say $maximum;

sub max {
 my($max_so_far) = shift @_; # the first one is the largest yet seen
 foreach (@_) { # look at the remaining arguments
 if ($_ > $max_so_far) { # could this one be bigger yet?
 $max_so_far = $_;
 }
 }
 $max_so_far;
}

同时我们可以注意到，子程序不一定要先定义后调用，而是可以在调用之后定义。

此外需要注意的是，my关键字通常只定义它后面的一个变量，除非加上括号：

1 2	my $fred, $barney; # WRONG! Fails to declare $barney my($fred, $barney); # declares both

如果没有{}这样的封闭块，那么文件本身被看作一个封闭块。不同文件中的同名变量是不同的局部变量。

use strict 编译指令

由于Perl的语法限制过于宽松，有时可能会导致大型程序的错误难以定位。此时可以通过use strict关键字来使语法更加严谨。打开strict后，任何变量在使用前都需要先进行my关键字的定义。

#! /usr/bin/perl
#use strict;
use v5.12;
my $bamm_bamm = 3; # New lexical variable

$bammbamm += 1; # No such variable: Compile time fatal error

使用use v5.12和使用use strict的效果相同。

当程序规模变大时，最后使用strict。一般来讲，任何超过一个屏幕长的程序都应该使用strict.

返回 return

在某些情况下需要提前终止子程序，这是就用到了关键字return：

#! /usr/bin/perl
#use strict;
use v5.12;
my @names = qw/ fred barney betty dino wilma pebbles bamm-bamm /;
my $result = &which_element_is("dino", @names);

sub which_element_is {
    my($what, @array) = @_;
    foreach (0..$#array) { # indices of @array's elements
        if ($what eq $array[$_]) {
            return $_; # return early once found
        }
    }
-1; # element not found (return is optional here)
}

省略&符号

一些情况下在函数调用时可以省略前面的&符号。

然而，如果函数的命名正好与Perl的内置函数同名，那么调用时必须加上&符号，否则无法区分自己定义的函数和内置函数，而Perl会默认调用内置函数。

非标量输出

如果在一些情况下，函数需要返回一个非标量的输出，即，一个列表。
如果在列表上下文中，这种操作是完全没问题的。例如如果想要改进..操作，使其能够产生递减的列表：

#! /usr/bin/perl
use v5.10;
sub list_from_fred_to_barney {
 if ($fred < $barney) {
 # Count upwards from $fred to $barney
 $fred..$barney;
 } else {
 # Count downwards from $fred to $barney
 reverse $barney..$fred;
 }
}
$fred = 11;
$barney = 6;
@c = &list_from_fred_to_barney; # @c gets (11, 10, 9, 8, 7, 6)
say @c;

持续变量

一个使用my关键字定义的变量是局部变量，且在每次调用时都被重置。如果想维护一个持续的局部变量，则可以使用state关键字：

#! /usr/bin/perl

use v5.10;
running_sum( 5, 6 );
running_sum( 1..3 );
running_sum( 4 );
sub running_sum {
 state $sum = 0;
 state @numbers;
 foreach my $number ( @_ ) {
 push @numbers, $number;
 $sum += $number;
 }
 say "The sum of (@numbers) is $sum";
 }

子函数签名subroutine signatures

在perl的5.20以后版本，引入了该特性()。使得函数可以像C语言那样定义，同时为变量设置默认值：

#! /usr/bin/perl

use v5.20;
use strict;
use warnings;
use experimental 'signatures';  # 启用子程序签名功能
sub list_from_fred_to_barney (  $fred = 0,  $barney = 7 ) {
    if ($fred < $barney) { $fred..$barney }
    else { reverse $barney..$fred }
}
my @defaults = list_from_fred_to_barney();
my @default_end = list_from_fred_to_barney( 17 );
say "defaults: @defaults";
say "default_end: @default_end";

习题

#! /usr/bin/perl

sub total{
    my $sum;
    foreach (@_){
        $sum += $_;
    }
    $sum;
}

my @fred = qw{ 1 3 5 7 9 };
my $fred_total = total(@fred);
print "The total of \@fred is $fred_total.\n";
print "Enter some numbers on separate lines: ";
my $user_total = total(<STDIN>);
print "The total of those numbers is $user_total.\n";

#! /usr/bin/perl
use v5.10;

sub total{
    my $sum;
    foreach (@_){
        $sum += $_;
    }
    $sum;
}

say total(1..1000);

#! /usr/bin/perl
use v5.10;


sub average{
    my $sum;
    foreach(@_){
        $sum += $_;
    }
    $sum/@_;
}

sub above_average{
    my @above;
    my $average = average(@_);
    foreach(@_){
        if($_ > $average){
             push @above,$_;
        }
    }
    @above;
}

my @fred = above_average(1..10);
print "\@fred is @fred\n";
print "(Should be 6 7 8 9 10)\n";
my @barney = above_average(100, 1..10);
print "\@barney is @barney\n";
print "(Should be just 100)\n";

#! /usr/bin/perl
use v5.10;

sub greet{
    state $here;
    if ($here){
        say "Hi @_[0]! $here is also here!";
    }
    else{
        say "Hi @_[0]! You are the first one here!";
    }
    $here = @_[0];
}

greet( "Fred" );
greet( "Barney" );
greet( "Wilma" );
greet( "Betty" );

#! /usr/bin/perl
use v5.10;

sub greet{
    state @here;
    if (@here){
        say "Hi @_[0]! I've seen: @here";
    }
    else{
        say "Hi @_[0]! You are the first one here!";
    }
    push @here,   @_[0];
}

greet( "Fred" );
greet( "Barney" );
greet( "Wilma" );
greet( "Betty" );

Ch5 输入/输出

标准输入流

如前所述，标准输入流为<STDIN>。

#! /usr/bin/perl
use v5.10;


while (<STDIN>) {
 print "I saw $_";
}

foreach (<STDIN>) {
 print "I saw $_";
}

上面的2种循环的执行逻辑是不同的，while是标量上下文，而foreach是列表上下文：

钻石符号<>

两个尖括号<>称作钻石符号，同样也用于标准输入。

#! /usr/bin/perl
use v5.10;

while (<>) {
 chomp;
 print "It was $_ that I saw!\n";
}

钻石符号的作用在于可以指定调用参数，从文件中读取标准输入。

一般而言，钻石符号在程序中使用一次就够了，其他多余的调用往往意味着错误。

双钻石符号 <<>>

单钻石符号的问题在于，在输入参数含义特殊字符时可能出问题。

例如会将|操作符当作Pipe处理：

而5.22之后的版本使用双钻石符号修复该问题：

use v5.22;
while (<<>>) {
 chomp;
 print "It was $_ that I saw!\n";
}

调用参数列表

perl中的调用参数存储在一个内置列表@ARGV中。可以对上述列表进行单独操作。

#! /usr/bin/perl
use v5.22;

pop @ARGV;
say "The invocation augments are: @ARGV";

while (<<>>) {
 chomp;
 print "It was $_ that I saw!\n";
}

标准输出

print函数被用作标准输出。

在对列表进行print时，有直接print和插值print2种不同方式：

#! /usr/bin/perl
use v5.10;

$" = ',';
@array =  ( "fred\n","barney\n","betty\n"   );

print @array; # print a list of items
print "@array"; # print a string (containing an interpolated array)

区别在于，直接print会将列表合并成一整个字符串进行输出，而插值会在每个列表元素之间插入一个指定字符后输出。

需要插值的字符串保存在Perl的内置变量$"中，默认为一个空格。

括号

在不影响上下文的情况下perl中的括号都可以省略。

然而，有些情况下，省略括号会得到奇怪的结果：

#! /usr/bin/perl
use v5.10;

say (2+3)*4; # Oops!

say (say (2+3)*4);

(say (2+3)*4);

say( (2+3) * 4 );

上述程序输出如下结果：

第一行输出5，这是因为perl会先计算2+3，然后执行say,得到5，同时返回结果为1，将1乘4得到4，但是并没有对4进行任何后续操作。

第二行输出5 4，验证了这一结果。

同理，第三行输出5，只有添加了正确的括号才得到预期的结果。

格式化输出printf

perl提供了和C语言相同的格式化输出方式printf:

printf "Hello, %s; your password expires in %d days!\n",$user, $days_to_die;
printf "%g %g %g\n", 5/2, 51/17, 51 ** 17; # 2.5 3 1.0683e+29
printf "in %x days!\n", 17; # in 0x11 days!
printf "in %o days!\n", 17; # in 021 days!
printf "%10s\n", "wilma"; # looks like `````wilma
printf "%12f\n", 6 * 7 + 2/3; # looks like ```42.666667
printf "%12.3f\n", 6 * 7 + 2/3; # looks like ``````42.667
printf "%12.0f\n", 6 * 7 + 2/3; # looks like ``````````43

其中%g会自动选择较为合适的输出格式。

列表与printf

对于一个不定长的列表，使用格式化输出可能比较困难，但是借助perl的标量上下文特性可以轻松做到：

#! /usr/bin/perl
use v5.10;
my @items = qw( wilma dino pebbles );
printf "The items are:\n".("%10s\n" x @items), @items;

文件句柄 filehandles

perl 通过文件句柄方式来操作文件：

open CONFIG, 'dino';
open CONFIG, '<dino';
open BEDROCK, '>fred';
open LOG, '>>logfile';

为了不与一些内建函数冲突，最好是使用全大写的方式命名文件句柄。

在perl中，仅有6个内建的大写变量：STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT

打开文件的方式分为：

< 读取
> 写入
>>追加写入

在5.6之后的版本也支持：

1
2
3

open CONFIG, '<', 'dino';
open BEDROCK, '>', $file_name;
open LOG, '>>', &logfile_name();

这么做的的好吃是，对文件的操作方式更加明确，并且可以指定文件的编码类型：

1 2	open BEDROCK, '>:encoding(UTF-8)', $file_name; open LOG, '>>:encoding(UTF-8)', &logfile_name();

perl -MEncode -le "print for Encode->encodings(':all')"可以打印出所有的编码方式：

对于打开遇到错误的文件句柄，会被设置为undef。

二进制格式binmode

假设我们有一个脚本需要将数据写入标准输出和标准错误输出，并且我们希望数据保持原样，不进行任何换行符转换。

use strict;
use warnings;

# 将 STDOUT 和 STDERR 切换到二进制模式
binmode STDOUT;  # don't translate line endings
binmode STDERR;  # don't translate line endings

# 输出一些数据
print STDOUT "Hello, World!\n";
print STDERR "Error: Something went wrong!\n";

在没有 binmode STDOUT; 和 binmode STDERR; 的情况下，在 Windows 系统上运行这个脚本可能会将 \n 转换为 \r\n，导致输出中有额外的回车符。而使用了 binmode 后，这些换行符将保持不变，确保输出的一致性。

假设我们需要从一个 UTF-8 编码的文件中读取数据，并正确处理其中的 Unicode 字符。
首先，我们创建一个包含 Unicode 字符的文件 input.txt：

1 2	你好，世界！ Hello, World!

然后，我们编写一个 Perl 脚本来读取并输出这个文件的内容：

#! /usr/bin/perl
use strict;
use warnings;
use utf8;

# 设置输入和输出的编码
binmode STDOUT, ':encoding(UTF-8)';  # 设置标准输出为 UTF-8 编码
binmode STDERR, ':encoding(UTF-8)';  # 设置标准错误输出为 UTF-8 编码
binmode STDIN, ':encoding(UTF-8)';   # 设置标准输入为 UTF-8 编码

my $fh;

# 打开输入文件
if (! (open my $fh, '<:encoding(UTF-8)', 'input.txt') ){
     die "Cannot open file: $!";
}
# 读取文件内容并处理
while (my $line = <$fh>) {
    if ($line =~ /Hello/) {
        print STDOUT "Found: $line";
    } else {
        print STDERR "Not found: $line";
    }
}

close $fh;

在这个脚本中，我们对标准输入、标准输出和标准错误输出都设置了 UTF-8 编码。这样我们可以确保：
从文件中读取的数据是正确解码的 Unicode 字符。
输出的数据也会被正确编码为 UTF-8，确保在不同平台上输出的一致性。

结束句柄

如上所述，每个open 操作最好带有一个close操作。

die用于处理致命错误

如上述程序所示，可以用die来处理致命错误。系统的报错信息和行号会被自动存储在$!变量中。

此外可以把die换成warn函数来生成警告，区别在于不会结束进程而是打印警告信息。

此外在5.10后的版本还可以通过添加use autodie；来使程序自动退出，从而省去了检查。

文件句柄的使用

文件句柄的使用，可以将其放入尖括号中：

if ( ! open PASSWD, "/etc/passwd") {
 die "How did you get logged in? ($!)";
}
while (<PASSWD>) {
 chomp;
 ...
}

此外，如果是以写入方式打开的句柄，可以通过在printf 后直接写上文件句柄的方式来将其写入到对应文件：

#! /usr/bin/perl
use v5.24;
use autodie;

open LOG, '>>', 'logfile';

print LOG "Captain's log, stardate 3.14159\n"; # output goes to LOG

select LOG;
printf "I hope Mr. Slate doesn't find out about this.\n";
print "Wilma!\n";

如果不指定print或printf的输出句柄，其默认会输出到STDOUT，也就是用户的显示屏。

然而可以通过select的方式指定输出的位置，从而改变输出目的地。

重新打开文件句柄

如果重新打开一个文件句柄，Perl会先关闭前一个，然后打开一个新的，因此，可以重新指定perl的标准文件句柄，从而使一些信息输出到指定的地方。

# Send errors to my private error log
if ( ! open STDERR, ">>/home/barney/.error_log") {
 die "Can't open error log for append: $!";
}

标量变量的句柄

perl支持将文件句柄保存在标量变量中：

#! /usr/bin/perl
use v5.24;
use autodie;


open my $rocks_fh, '>>', 'rocks.txt'
  or die "Could not open rocks.txt: $!";
foreach my $rock ( qw( slate lava granite ) ) {
    say $rocks_fh $rock
}
print $rocks_fh "limestone\n";
close $rocks_fh;

将句柄保存为标量，好处是可以用函数对其进行处理，同时作用域更加明确。

此外注意say和print的位置是没有逗号的，因为执行的是文件写入操作。

如果将文件句柄当作标量打印，得到的结果会很奇怪。

解决方案是将文件句柄用花括号括起来(创建一个独立的域)。

1 2	print { $rocks_fh } "sandstone\n"; print {$rocks_fh} $_;

习题

#! /usr/bin/perl
use v5.22;

my @stack;
while (<>){
    push @stack,$_;
    chomp(@stack)
}

my $temp = pop @stack;
while (defined($temp)){
    say $temp;
    $temp = pop @stack;
}

#! /usr/bin/perl
use v5.22;

chomp(my @list=<STDIN>);

say ((0..9)x6);
printf "%20s\n"x@list,@list;

#! /usr/bin/perl
use v5.22;

say "Please input your length:";
my $length = <STDIN>;
say "Please input your strings:";
chomp(my @list = <STDIN>);

say (0..$length);

foreach (@list){
    say ' 'x($length-length($_)).$_;
}

Ch6 哈希

概念

哈希即一系列key-value的组合，可以通过key来管理、索引数据。

哈希中的数据通常是无序的，可以看作是一个桶中放着带有不同标签的元素。

#! /usr/bin/perl
use v5.22;

my %family_name;
$family_name{'fred'} = 'flintstone';
$family_name{'barney'} = 'rubble';

foreach my $person (qw< barney fred >) {
 print "I've heard of $person $family_name{$person}.\n";
} 

my %some_hash = ('foo', 35, 'bar', 12.4, 2.5, 'hello',
 'wilma', 1.72e30, 'betty', "bye\n");

my @array = %some_hash;
say  $some_hash{2.5};
print @array;

哈希变量用百分号%来定义。用大括号{}来索引。哈希变量可以展开赋值给列表，列表中元素的顺序可能与定义时不同，这是因为perl为了索引将元素进行了重新排序。

可以使用类似

1	%ip_address = reverse %host_name;

的操作，将键值对进行反转，通过原来的值来查找对应的键。这么做的前提是所有键值对是一一对应的。如果有重复的则会进行覆盖。

箭头符号 =>

perl专门为哈希定义了一个符号=>用于定义哈希表。
在使用箭头符号进行定义时，可以省略字符串两边的引号。

my %last_name = (
 fred => 'flintstone',
 dino => undef,
 barney => 'rubble',
 betty => 'rubble',
);

say;
say $last_name{fred};

哈希函数

perl内建了很多函数来对哈希进行处理。

keys 和 values

keys 和values会分别返回哈希表的键和值的列表：

#! /usr/bin/perl
use v5.22;

my %hash = ('a' => 1, 'b' => 2, 'c' => 3);
my @k = keys %hash;
my @v = values %hash;
my $count = keys %hash; # gets 3, meaning three key-value pairs

say @k;
say @v;
say $count;

在标量上下文中，2个函数都会直接返回元素的个数。

each函数

each函数同样用在while循环中，来对哈希表的所有key-value对进行遍历。

#! /usr/bin/perl
use v5.22;


my %hash = ('a' => 1, 'b' => 2, 'c' => 3);

while ( my ($key, $value) = each %hash ) {
 print "$key => $value\n";
}
say '--------------';
foreach my $key (sort keys %hash) {
    my $value = $hash{$key};
    print "$key => $value\n";
    # Or, we could have avoided the extra $value variable:
    # print "$key => $hash{$key}\n";
}

由于哈希元素在内部的排序是无序的(或者说安装Perl内部的逻辑进行的)，可能与我们的预期不同，因此需要可能需要人工进行排序。

exist, delete和内插

#! /usr/bin/perl
use v5.22;

my %books;

$books{'fred'} = 3;
$books{'wilma'} = 1;
$books{"barney"} = 0; # no books currently checked out
$books{"pebbles"} = undef; # no books EVER checked out; a new library card

if (exists $books{"dino"}) {
    print "Hey, there's a library card for dino!\n";
}
else {
    print "dino has no library card!\n";
}

my $person = "barney";
delete $books{$person}; 

foreach my $person (sort keys %books) { # each patron, in order
 if ($books{$person}) {
 print "$person has $books{$person} items\n"; # fred has 3 items
 }
}

exist 用于检查哈希表中元素是否存在，delete用于根据键来删除对应的键值对。

哈希变量同理可以内插进字符串中。

环境变量

perl通过%ENV这个特殊的哈希来访问系统的环境变量，只要在外部环境中设置过的变量，都可以通过该哈希进行访问：

#! /usr/bin/perl
use v5.22;

say "Path is $ENV{PATH}";
say "Display is $ENV{DISPLAY}";

习题

#! /usr/bin/perl
use v5.22;

my %namelist = (
    fred=>"flintstone",
    barney=>"rubble",
    wilma=>"flintstone"
);


say "input the name you want to know:";
while (chomp(my $name = <STDIN>)){
    if (exists $namelist{$name}){
        say "$name 's family name is $namelist{$name}";
    }
    else{
        say "sorry, I don't konw a person named $name";
    }
}

#! /usr/bin/perl
use v5.22;

my %wordlist;

while(<>){
    if (exists $wordlist{$_}){
        $wordlist{$_}+=1;
    }
    else {
        $wordlist{$_}=1;
    }
}

foreach ( sort keys %wordlist){
    say "the word $_ appears $wordlist{$_} times";
}

#! /usr/bin/perl
use v5.22;


say ((" "x7)."KEY".(" "x85)."VALUE");
foreach my $key (sort keys %ENV){
     printf "%10s          %90s\n",$key,$ENV{$key};
}

Ch7 漫游正则表达式

正则表达式的概念

正则表达式用来检查一串字符串匹配或不匹配某种模式。

对默认变量$_进行正则匹配，只需要将正则表达式写入斜线//内：

#! /usr/bin/perl
use v5.22;


$_ = "yabba dabba doo";
if (/ba da/) { # Does match
 print "It matched!\n";
}

而对于一般的变量，则使用=~进行匹配：

1	$string =~ /pattern/;

perl的正则匹配默认是left most的，它会从左往右检查字符串，遇到第一个匹配的模式直接返回结果。

此外，正则表达式是大小写敏感的：

#! /usr/bin/perl
use v5.22;

while( <STDIN> ) {
 chomp;
 if ( /fred/ ) {
 print "\tMatches\n";
 }
 else {
 print "\tDoesn't match\n";
 }
}

通配

.符号可以匹配除了换行符以外的任何单个字符。

如果需要匹配一个真正的点号，则需要用反斜杠进行转义：\.

同理为了匹配反斜杠，需要转义反斜杠本身：\\。

\N也能达到和点号相同的效果。

量词 `*`,`?`, `+`，`{m,n}`

?表示前面的符号出现1次或0次。
*表示前面的符号出现0次或多次。

.*星号与点号搭配，即可表示出现0次或任意次数的任意字符，即可以匹配除了换号外的任意字符串。

#! /usr/bin/perl
use v5.22;

while( <STDIN> ) {
 chomp;
 if ( /.*/ ) {
 print "\tMatches\n";
 }
 else {
 print "\tDoesn't match\n";
 }
}

同时，这种匹配是贪婪的，它会匹配尽量长的部分。

由于星号可以匹配0次或多次，因此.*可以匹配空字符。

与之不同的是+号，可以匹配1次或多次。

想要匹配特定次数，则可以使用数字加大括号的组合，例如：

$_ = "yabbbba dabbba doo.";
if (/ab{3}a/) {
 print "It matched!\n";
}

同理可以通过大括号规定重复次数的区间：

$_ = "yabbbba dabbba doo.";
if (/ab{2,3}a/) {
 print "It matched!\n";
}

上述代码表示b出现2到3次，也可以只规定上界或下界中的一个，另一个自动设置为0和无穷。

$_ = "yabbbba dabbba doo.";
if (/ab{3,}a/) {
 print "It matched!\n";
}

关于量词的总结如下：

模式分组 `()`与反向引用`g{}`

量词只能作用于它前面一项，如果想让量词作用于多个字符，则可以通过小括号()对量词进行模式分组。

通过反斜杠加数字的办法可以和括号配合实现反向引用\1。反向引用是指前面的括号中匹配到的模式。例如：

$_ = "abba";
if (/(.)\1/) { # matches 'bb'
 print "It matched same character next to itself!\n";
}

可以匹配任何一个重复2次的相同字符。

反向引用不必紧跟着括号：

1
2
3

$_ = "yabba dabba doo";
if (/y(....) d\1/) {
 print "It matched the same after y and d!\n";

在这里反斜杠后面的数字代表的是被匹配到的模式序号，而非重复的次数。被匹配到的顺序是按照左括号来计算的：

$_ = "yabba dabba doo";
if (/y((.)(.)\3\2) d\1/) {
 print "It matched!\n";
}

上述表达的问题在于可能导致歧义：\111是代表第111个反向引用，还是第11个反向引用后加上数字1，还是第一个反向引用加上数字11？

perl会认为是第111个反向引用。

perl提供了一种消除歧义的方法是\g{N}来表示反向引用。

同时，引用的编号也可以为负数，相对反向引用，即基于当前位置向前计数得到的捕获模式：

use v5.10;
my $str = "abcabc";
if ($str =~ /(a)(b)(c)\g{-3}\g{-2}\g{-1}/) {
    print "Match found!\n";
}

选择符 `|`

竖线代表或操作，左右2边的字符串将会分别参与匹配。

foreach ( qw(fred betty barney dino) ) {
 if ( /fred|barney/ ) {
 print "$_ matched\n";
 }
}

由于竖线会把前后当作2个独立的字符串进行处理，因此有时需要配合括号使用：

$_ = "Fred Rubble";
if( /Fred|Wilma Flintstone/ ) { # unexpectedly matches
 print "It matched!\n";
}

改为：

$_ = "Fred Rubble";
if( /(Fred|Wilma) Flintstone/ ) { # doesn't match
 print "It matched!\n";
}

字符集`[]`

方括号中的单个字符会被逐一匹配。

$_ = "Bamm-Bamm";
if (/Bamm-?[Bb]amm/) {
 print "The string has Bamm-Bamm\n";
}

此外可以用^符号进行补集操作，可以匹配除了方括号中字符的其他所有字符。

字符集的简写

对于一些高频出现的字符集，正则表达式为其设计了简写模式：

Unicode 特性

对于一些Unicode字符集，perl使用\p{}的方式进行描述，详见
https://perldoc.perl.org/perluniprops

锚点Anchor

\A用于强调必须从字符串开头进行匹配，\z强调必须匹配词尾。
\Z强调除了必须匹配词尾以外，还可以加上一个空行符或不加。

词锚 word anchor

\b用于锚定单词的词尾。例如

/\bfred\b/可以匹配fred本身，而不会匹配Alfred 或fredstone。

\b可以锚定所有\w可以表示的字符的结尾，即数字，字母和下划线。而\B锚定所有\W可以表示的字符的结尾，即上述内容的补集。

习题

#!/usr/bin/perl

use v5.22;


while (<>){
    chomp;
    if (/fred/){
        say "$_ match";
    }
    else {
        say "$_ don't match!";
    }
}

#!/usr/bin/perl

use v5.22;

while (<>){
    chomp;
    if (/[Ff]red/){
        say "$_ match";
    }
    else {
        say "$_ don't match!";
    }
}

#!/usr/bin/perl

use v5.22;

while (<>){
    chomp;
    if (/\./){
        say "$_ match";
    }
    else {
        say "$_ don't match!";
    }
}

#!/usr/bin/perl

use v5.22;

while (<>){
    chomp;
    if (/\A[A-Z]+[a-z]+/){
        say "$_ match";
    }
    else {
        say "$_ don't match!";
    }
}

#!/usr/bin/perl

use v5.22;

while (<>){
    chomp;
    if (/(.+)\g{-1}/){
        say "$_ match";
    }
    else {
        say "$_ don't match!";
    }
}

#!/usr/bin/perl

use v5.22;

while (<>){
    chomp;
    if (/fred/&&/wilma/){
        say "$_ match";
    }
    else {
        say "$_ don't match!";
    }
}

Ch8 正则表达式进行匹配

perl用m来进行匹配操作，之前的双斜杠//写法其实是匹配的简写。m像qw一样，可以搭配任意成对的括号使用。

例如m(fred), m\<fred>, m{fred}, or m[fred]

`/i`进行大小写无关的匹配

当使用i进行修饰时，会忽略字符串的大小写区别。

print "Would you like to play a game? ";
chomp($_ = <STDIN>);
if (/yes/i) { # case-insensitive match
 print "In that case, I recommend that you go bowling.\n";
}

注意修饰符是放在外面的，和被引用的正则表达式不再同一个括号里。因此同样可以写成：

#!/usr/bin/perl

use v5.22;

print "Would you like to play a game? ";
chomp($_ = <STDIN>);
if (m{yes}i) { # case-insensitive match
 print "In that case, I recommend that you go bowling.\n";
}

`/s`进行任意字符的匹配

由于.*可以匹配除了换行符以外的任意字符串，因此为了处理字符串中的换行符，可以用s进行修饰。

$_ = "I saw Barney\ndown at the bowling alley\nwith Fred\nlast night.\n";
if (/Barney.*Fred/s) {
 print "That string mentions Fred after Barney!\n";
}

\s的修饰会让所有.变得能匹配换行符。如果想让表达式中的个别.保持对换行符的不匹配，可以换成[^\n]或\N。

`/x`加入空白符

使用x修饰可以使得表达式中的变量，使得其中的空白字符可以被忽略。这样做可以增加正则表达式的可读性。

1 2	/-?[0-9]+\.?[0-9]/ # what is this doing? / -? [0-9]+ \.? [0-9] /x # a little better

修饰符号的组合

可以将多个修饰符号组合使用，只需要将他们一起放在正则表达式的末尾：

1
2
3

if (/barney.*fred/is) { # both /i and /s
 print "That string mentions Fred after Barney!\n";
}

if (m{
 barney # the little guy
 .* # anything in between
 fred # the loud guy
}six) { # all three of /s and /i and /x
 print "That string mentions Fred after Barney!\n";
}

选择一种字符的解释方式

/a用于表示ascii,/u用于表示unicode，/l表示Local，会选择本地的字符集进行匹配。

`\m`进行行匹配

使用m配合词首锚位^和词尾锚位$,可以实现对各行的匹配而非单整个字符串的匹配。


#!/usr/bin/perl

use v5.22;

$_ = 'This is a wilma line
barney is on another line
but this ends in fred
and a final dino line';

if(/fred$/m){
    say "match success!";
}

同样的，如果不想开启多行匹配，最好还是使用\A和\Z。

其他操作

绑定操作符 `=~`

绑定操作符 =~会高速perl用右边的正则表达式来匹配左边的字符串。例如：

my $some_other = "I dream of betty rubble.";
if ($some_other =~ /\brub/) {
    print "Aye, there's the rub.\n";
}

捕获变量

perl 用$1,$2这样的变量来描述正则表达式捕获的变量。每个括号捕获到的事物会被像这样编号。注意捕获变量和反向引用的区别：

#!/usr/bin/perl

use v5.22;


$_ = "Hello there, neighbor";
if (/\s([a-zA-Z]+),/) { # capture the word between space and comma
 print "the word was $1\n"; # the word was there
}

$_ = "Hello there, neighbor";
if (/(\S+) (\S+), (\S+)/) {
 print "words were $1 $2 $3\n";
}

被捕获的字符串也可以为空字符串，注意空字符串和undef是不同的。

捕获变量的存续期

当一个新的匹配发生时，捕获变量会覆盖之前捕获的值，而未发生捕获时，上一个值将一直存在。

不捕获圆括号 `(?:)`

在左括号前面加上问号和冒号，可以让圆括号只用来分组而不用来捕获。

例如：

1
2
3

if (/(bronto)?saurus (steak|burger)/) {
 print "Fred wants a $2\n";
}

前面的bronto不管有没有，都会占用捕获变量$1的位置，而加上相应的不捕获符号后，可以忽略第一个括号的内容：

1
2
3

if (/(?:bronto)?saurus (steak|burger)/) {
 print "Fred wants a $1\n";
}

命名捕获

如果有需要修改正则表达式的情况，添加一个括号有可能使整个正则表达式乱套，出来提供不捕获标记?:的写法之外，perl还支持对捕获变量之间命名，方便记忆。

使用 (?<LABEL>PATTERN)的写法来为捕获变量命名

之后就可以使用$+{name1}和$+{name2}的形式来引用这些捕获的变量。

use v5.10;
 my $names = 'Fred or Barney';
 if ( $names =~ m/((?<name2>\w+) (and|or) (?<name1>\w+))/ ) {
  say "I saw $+{name1} and $+{name2}";
 }

捕获之后，可是使用\g{label}和\k<label>这样的写法来对其进行反向引用。

use v5.10;
my $names = 'Fred Flintstone and Wilma Flintstone';
if ( $names =~ m/(?<last_name>\w+) and \w+ \k<last_name>/ ) {
 say "I saw $+{last_name}";
}

自动捕获变量

5.10版本以后，perl用${^PREMATCH}, ${^MATCH}, ${^POSTMATCH}这三个变量来表示正则表达式中的自动捕获变量。

#!/usr/bin/perl

 use v5.10;
 if ("Hello there, neighbor" =~ /\s(\w+),/p) {
  print "That actually matched '${^MATCH}'.\n";
 }

 if ("Hello there, neighbor" =~ /\s(\w+),/p) {
  print "That was (${^PREMATCH})(${^MATCH})(${^POSTMATCH}).\n";
 }

优先级

各类匹配变量的优先级如下表：

测试程序

一个简单的测试正则表达式匹配结果的程序：

while (<>) {           # take one input line at a time
 chomp;
 if (/YOUR_PATTERN_GOES_HERE/) {
   print "Matched: |$`<$&>$'|\n";  # the special match vars
 } else {
   print "No match: |$_|\n";
 }
}

习题

#!/usr/bin/perl

use v5.22;


while (<>) {           # take one input line at a time
  chomp;
  if (/match/) {
    print "Matched: |$`<$&>$'|\n";  # the special match vars
  } else {
    print "No match: |$_|\n";
  }
 }

#!/usr/bin/perl

use v5.22;

while (<>) {           # take one input line at a time
  chomp;
  if (/\w*a\s/) {
    print "Matched: |$`<$&>$'|\n";  # the special match vars
  } else {
    print "No match: |$_|\n";
  }
 }

#!/usr/bin/perl

use v5.22;

while (<>) {           # take one input line at a time
  chomp;
  if (/(\w*a)\s/) {
    print "\$1 contains  $1\n";  # the special match vars
  } else {
    print "No match: |$_|\n";
  }
 }

#!/usr/bin/perl

use v5.22;

while (<>) {           # take one input line at a time
  chomp;
  if (/(?<words>\w*a)\s/) {
    print "'words' contains  $+{words}\n";  # the special match vars
  } else {
    print "No match: |$_|\n";
  }
 }

#!/usr/bin/perl

use v5.22;

while (<>) {           # take one input line at a time
  chomp;
  if (/(?<words>\w*a)(?<appends>(\W|\s)\w{0,4})/) {
    print "'words' contains  $+{words}, followed by $+{appends}\n";  # the special match vars
  } else {
    print "No match: |$_|\n";
  }
 }

#!/usr/bin/perl

use v5.22;

while (<>) {           # take one input line at a time
  if (/([ \t]$)/) {
    print "$_ match...\n";  # the special match vars
  } else {
    print "No match: $_\n";
  }
 }

Ch9 用正则表达式处理文件

用`s///`进行替换

如果把m//看作是单纯的查找操作，那么s///则是匹配后替换的操作。前面是要匹配的表达式，后面是需要替换的表达式。

如：

#!/usr/bin/perl

use v5.22;

 $_ = "He's out bowling with Barney tonight.";
 s/Barney/Fred/;  # Replace Barney with Fred
 print "$_\n";

另外，如果匹配失败，则什么都不会发生。

表达式本身具有返回值，如果替换成功会返回true:

$_ = "fred flintstone";
if (s/fred/wilma/) {
 print "Successfully replaced fred with wilma!\n";
}

使用`g`进行全局替换

s模式默认只替换匹配到的第一个模式。想要进行全局替换，需要用到g,即global模式：

1
2
3

$_ = "home, sweet home!";
s/home/cave/g;
print "$_\n";  # "cave, sweet cave!"

g模式会进行全局的，不充分的替换。

一个常用的场景是去除字符串中多余的空格：

1 2	$_ = "Input data\t may have extra whitespace."; s/\s+/ /g; # Now it says "Input data may have extra whitespace."

使用不同的界定符

就像qw可以使用不同的界定符一样，匹配和替换也可以使用不同符号。区别在于，成对的符号必须左右匹配，而相同的界定符只需要出现3个：

s#\Ahttps://#http://#;
...
 s{fred}{barney};
 s[fred](barney);
 s<fred>#barney#;

替换修饰符

在替换时可以添加/i, /x, m/, /s等修饰符，进行匹配修饰。

例如：

1 2	s#wilma#Wilma#gi; # replace every WiLmA or WILMA with Wilma s{__END__.*}{}s; # chop off the end marker and all following lines

绑定操作符

用法与m时一样，可以针对变量使用：

1	$file_name =~ s#\A.*/##s;

如果表示一个变量不能匹配一个正则表达式，使用符号!~。

无损替换

如果想保留替换前的字符串，可以先复制一份：

1
2
3

my $original = 'Fred ate 1 rib';
my $copy = $original;
$copy =~ s/\d+ ribs?/10 ribs/;

等价于：

1	(my $copy = $original) =~ s/\d+ ribs?/10 ribs/;

5.14以后的版本，可以加上一个/r修饰符，这样会将原字符串保持不变，而将替换结果返回：

#!/usr/bin/perl

use v5.22;

my $original = 'Fred ate 1 rib';
my $copy = $original =~ s/\d+ ribs?/10 ribs/r;

say "original:$original";
say "copy:$copy";

大小写转换

\U 可以将其后的所有小写字母替换为大写。

#!/usr/bin/perl

use v5.22;

$_ = "I saw Barney with Fred.";
s/(fred|barney)/\U$1/gi;  # $_ is now "I saw BARNEY with FRED."

say $_;

\L可以将其后的所有字母转换为小写。

1	s/(fred\|barney)/\L$1/gi; # $_ is now "I saw barney with fred."

同时，可以使用\E关闭后续的大小写转换：

1	s/(\w+) with (\w+)/\U$2\E with $1/i; # $_ is now "I saw FRED with barney."

当使用\l和\u时，只会作用于一个字母：

1	s/(fred\|barney)/\u$1/ig; # $_ is now "I saw FRED with Barney."

连续使用\u,\L,可以实现首字母大写，后续字母小写：

1	s/(fred\|barney)/\u\L$1/ig; # $_ is now "I saw Fred with Barney."

这里的大小写转换可以用在任何双引号定义的字符串中：

1	print "Hello, \L\u$name\E, would you like to play a game?\n";

大小写转换被封装为了不同的函数：

my $start   = "Fred";
my $uncapp  = lc( $start );         # fred,用于将字符串中的所有字符转换为小写。它的全称是 "lowercase"。
my $uppered = uc( $uncapp );        # FRED,用于将字符串中的所有字符转换为大写。它的全称是 "uppercase"。
my $lowered = lc( $uppered );       # fred
my $capped  = ucfirst( $lowered );  # Fred,首字母大写
my $folded  = fc( $uncapped );      # fred,fc 函数用于将字符串转换为规范化的小写形式。它的全称是 "foldcase"，主要用于 Unicode 字符串的大小写折叠（大小写无关比较）。

元引用

可以使用\Q使其后面的所有括号都代表其原始字符：
例如如果想去掉字符串中多余的括号：

1
2
3

if ( s/\(\(\(Fred/Fred/ ) {    # Compiles, but messy!
  print "Replaced name\n";
 }

这样写需要多个反引号，显得十分混乱，可以使用\Q进行元引用：

1
2
3

if ( s/\Q(((Fred/Fred/ ) {     # Less messy
 print "Replaced name\n";
}

split操作

split可以根据特定的符号对字符串进行拆分：

 #!/usr/bin/perl

use v5.22;

my @fields = split /:/, "abc:def:g:h"; 

foreach my $temp(@fields){
    say $temp;
}

分隔符中间没有其他字符则被转换为空字符，perl会保留列表开头的空字符，而省略结尾的空字符：

1
2
3

 my @fields = split /:/, ":::a:b:c:::";  # gives ("", "", "", "a", "b", "c")

my @fields = split /:/, ":::a:b:c:::";  # gives ("", "", "", "a", "b", "c")

想要保留末尾的空字符，需要再增加一个参数-1:

1 2	my @fields = split /:/, ":::a:b:c:::", -1; # gives ("", "", "", "a", "b", "c", "", "", "")

一个常见的用法是分割句子中的每个单词：

1 2	my $some_input = "This is a \t test.\n"; my @args = split /\s+/, $some_input; # ("This", "is", "a", "test.")

split会默认使用空格分割$_变量中的字符串：

1	my @fields = split; # like split /\s+/, $_;

join操作

join完成的操作和split相反，将分开的字符串通过符号连接起来：

1	my $result = join $glue, @pieces;

join可看作是胶水操作，第一个参数是具体的连接参数，第二个之后的参数是需要连接的对象。如果第二个参数是一个标量，则只返回第二个变量，而不需要胶水：

1
2
3

my $y = join "foo", "bar";       # gives just "bar", since no foo glue is needed
my @empty;                       # empty array
my $empty = join "baz", @empty;  # no items, so it's an empty string

列表上下文中的匹配

在列表上下文中使用m进行匹配时，如果匹配成功，返回的是所有匹配变量的列表，否则返回空列表。

#!/usr/bin/perl

use v5.22;

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
 my @words = ($text =~ /([a-z]+)/ig);
 print "Result: @words\n";
 # Result: Fred dropped a ton granite block on Mr Slate

如果匹配标量中包含2个及以上的可捕获变量时，可以将其转换为一个哈希表：

#!/usr/bin/perl

use v5.22;

 my $text = "Barney Rubble Fred Flintstone Wilma Flintstone";
 my %last_name = ($text =~ /(\w+)\s+(\w+)/g);

foreach my $keys ( keys %last_name) { # each patron, in order
 if ($last_name{$keys}) {
 print "$keys => $last_name{$keys}\n"; # fred has 3 items
 }
}

第一个变量被当作键，第二个当作值，如果有第三个捕获变量，将会被忽略。

更强大的正则表达式

非贪婪量词

一般情况下，正则表达式遵循贪婪匹配规则，也就是说会尽可能匹配多的字符，为量词加上?,可以改为非贪婪模式，匹配尽可能少的字符：

#!/usr/bin/perl

use v5.22;

 my $text = '<b>Fred</b> and <b>Barney</b>';
 my $match_count = $text =~ s|<b>(.*)</b>|\U$1|g;
 print "$match_count: $text\n";

 my $text = '<b>Fred</b> and <b>Barney</b>';
 my $match_count = $text =~ s|<b>(.*?)</b>|\U$1|g; # nongreedy
 print "$match_count: $text\n";

上述第一种情况，由于是贪婪匹配，只捕获到了最长的一个子字符，并将其中的所有字母设置为大写。

第二种情况下，.*?改变为非贪婪匹配，得到我们想要的结果。

非贪婪描述符对不同量词的作用效果如下表所示：

更好的单词边界

#!/usr/bin/perl

use v5.22;

my $string = "This doesn't capitalize correctly.";
$string =~ s/\b(\w)/\U$1/g;
print "$string\n";

my $string = "this doesn't capitalize correctly.";
$string =~ s/\b{wb}(\w)/\U$1/g;
print "$string\n";

理论上\b(\w)可以匹配一个单词的开头，但是Perl会把’也匹配到，导致奇怪的效果，使用\b{wb}则可以正确匹配。

类似还有 \b{sb}和 \b{lb}等。

处理多行文本

Perl可以一次读入文本的多行，将换行当作\n处理。使用^可以匹配行首：

#!/usr/bin/perl

use v5.22;

$_ = "I'm much better\nthan Barney is\nat bowling,\nWilma.\n";
print "Found 'wilma' at start of line\n" if /^wilma\b/im;

例如下面的文件在每一行加入文件名信息：

#!/usr/bin/perl

use v5.22;

my $filename = pop @ARGV;
open FILE, $filename
  or die "Can't open '$filename': $!";
my $lines = join '', <FILE>;
$lines =~ s/^/$filename: /gm;

say $lines;
close $filename;

一次更新多个文件

使用如下程序一次更新多个程序：

#!/usr/bin/perl -w

use strict;

chomp(my $date = `date`);
$^I = ".bak";

while (<>) {
   s/\AAuthor:.*/Author: Randal L. Schwartz/;
   s/\APhone:.*\n//;
   s/\ADate:.*/Date: $date/;
   print;
}

如果把$^I设置为空字符，则不会留下任何备份。
(注意空字符不是undef)

通过命令行直接编辑

1	$ perl -p -i.bak -w -e 's/Randall/Randal/g' fred*.dat

-p选项代表

1
2
3

while (<>) {
  print;
 }

-i.bak 表示把$^I设置为.bak, -w表示打开warning，-e代表其后面跟着的是可执行代码，即’s/Randall/Randal/g’，后面的fred*.dat是输入变量列表。

课后习题

#!/usr/bin/perl 

use v5.24;

my $what="fred|barney";
while (<>){
    chomp;
    if (m/($what){3}/ ){
    say "$_ matches $what!";
    }
}

#!/usr/bin/perl 

use v5.24;

$^I = ".out";

while (<>){
    s/fred/larry/ig;
    print;
}

#!/usr/bin/perl 

use v5.24;

$^I = ".out";

while (<>){
    s/fred/FRED_TEMP/ig;
    s/wilma/Fred/ig;
    s/FRED_TEMP/Wilma/g;
    print;
}

#!/usr/bin/perl 

use v5.24;

$^I = "";

while (<>){
    print;
    if (m/(?<shellbang>#!.*\n)/){
        print  "## Copyright (C) 2024 by Wenkai";
    }
}

答案是用了一个哈希表检查对应的文件有没有被处理过，感觉不是很靠谱。
还是应该逐行检查有没有版权声明才对。

#!/usr/bin/perl

use v5.22;

my $filename = pop @ARGV;
say "file: $filename";


open FILE, $filename
  or die "Can't open '$filename': $!";
my $lines = join '', <FILE>;
close FILE;

if ($lines !~ /^## Copyright/m){
    say "success";
    $lines =~ s/^(#!.*\n)/$1## Copyright (C) 2024 by Wenkai/;
    open FILE, '>', $filename or die "Cannot write to $filename: $!";
        print  FILE  $lines;
    close FILE;
}

Ch10 其他控制结构

unless

unless = if not,当括号中的条件为假时，执行相应的操作。

unless可以跟着else语句，相当于if反过来。

unless ($mon =~ /\AFeb/) {
 print "This month has at least thirty days.\n";
} else {
 print "Do you see what's going on here?\n";
}

等价于：

if ($mon =~ /\AFeb/) {
 print "Do you see what's going on here?\n";
} else {
 print "This month has at least thirty days.\n";
}

until结构

当until中的条件为真时，结束循环，相当于while not。

1
2
3

until ($j > $i) {
 $j *= 2;
}

表达式修饰符

可以在一个表达式后面跟上一个修饰符，当该修饰符成立时，表达式才被执行：

1	print "$n is a negative number.\n" if $n < 0;

相当于：

1
2
3

if ($n < 0) {
  print "$n is a negative number.\n";
}

类似的操作还有：

&error("Invalid input") unless &valid($input);
$i *= 2 until $i > $j;
print " ", ($n += 2) while $n < 10;
&greet($_) foreach @person;

裸块控制结构

一个没有关键字和条件表达式的结构称为裸块：

{
    body;
  body;
  body;
}

裸块适用于一些一次性代码，其中的变量会被定义为局部变量。

if elsif结构

if ( ! defined $dino) {
 print "The value is undef.\n";
} elsif ($dino =~ /^-?\d+\.?$/) {
 print "The value is an integer.\n";
} elsif ($dino =~ /^-?\d*\.\d+$/) {
 print "The value is a _simple_ floating-point number.\n";
} elsif ($dino eq '') {
 print "The value is the empty string.\n";
} else {
 print "The value is the string '$dino'.\n";
}

这个不多解释了。

自增与自减

与C语言相同的++和--操作

for 循环

与C语言相同的for循环格式。

1
2
3

for ($i = –150; $i <= 1000; $i += 3) {
  print "$i\n";
 }

for 与foreach的关系

在perl中，这俩其实是等价的。

1
2
3

for (1..10) {  # really a foreach loop from 1 to 10
  print "I can count to $_!\n";
}

循环控制

last操作

相当于C中的break

next操作

相当于C中的continue

redo操作

重复执行当前循环中的操作。

#!/usr/bin/perl
use v5.22;

 # Typing test
 my @words = qw{ fred barney pebbles dino wilma betty };
 my $errors = 0;
 foreach (@words) {
  ## redo comes here ##
  print "Type the word '$_': ";
  chomp(my $try = <STDIN>);
   if ($try ne $_) {
    print "Sorry - That's not right.\n\n";
    $errors++;
    redo;  # jump back up to the top of the loop
  }
 }
 print "You've completed the test, with $errors errors.\n";

redo, last 和next都可以在循环中嵌套使用，并且用于跳出或回到最内层循环。

带标签的块

LINE: while (<>) {
  WORD: foreach (split) {
    last LINE if /__END__/;  # bail out of the LINE loop
    last WORD if /EOL/;      # skip the rest of the line
    ...
  }
 }

label可以用来与last,next和redo配合，实现精确跳转。

条件操作符

1	expression ? if_true_expr : if_false_expr

写verilog的人最喜欢的东西。

逻辑操作符

与C语言中的与&&或||非!相同。

定义或操作符 `\\`

定义或操作符的用途是，当左边的变量为undef时，将右边的值自动赋给左边，解决undef的问题。

#!/usr/bin/perl

use v5.10;
 foreach my $try ( 0, undef, '0', 1, 25 ) {
  print "Trying [$try] ---> ";
  my $value = $try // 'default';
  say "\tgot [$value]";
 }

可以看到，只有undef被设置为了’default’，而数字0和字符串’0’都不受影响。

使用部分求值操作符进行控制

一个典型的例子是文件打开操作：

1 2	open my $fh, '<', $filename or die "Can't open '$filename': $!";

课后习题

#!/usr/bin/perl

use v5.24;
my $magic_number = int(1 + rand 100);
say "please guess a number in range (1,100):";

GUESSAGAIN: {   
        chomp(my $gauss = <STDIN>);
        if ($gauss > $magic_number){
            say "Guess something smaller";
            redo GUESSAGAIN;
        }
        elsif($gauss < $magic_number) {
            say "Guess something bigger";
            redo GUESSAGAIN;
        }
        else{
            say "You are right!";
        }
}

#!/usr/bin/perl

use v5.24;

my $magic_number = int(1 + rand 100);
say "please guess a number in range (1,100):";

GUESSAGAIN: {   
        chomp(my $guess = <STDIN>);
        if ($guess =~ /\D/ && $guess !~ /^q$/ && $guess !~ /^y/){
            say "Wanna some implications? or enter q to quit";
            redo GUESSAGAIN;
        }
        if ($guess =~ /^q$/){
            say "Alright, have a nice day";
            last;
        }
        elsif ($guess =~ /y/){
            say "All right";
            IMPLICATION: {
                    my ($first_char) = $magic_number =~ /^(.).*/;
                    say "The number start with $first_char";
            }
            redo GUESSAGAIN;
        }
        if ($guess > $magic_number){
            say "Guess something smaller";
            redo GUESSAGAIN;
        }
        elsif($guess < $magic_number) {
            say "Guess something bigger";
            redo GUESSAGAIN;
        }
        else{
            say "You are right!";
        }
}

没看懂题目啥意思，自己写了一个娱乐版。

#! /usr/bin/perl
use v5.10;

foreach my $key (sort keys %ENV){
       say "$key is not defined" 
        unless $ENV{$key};
}

Ch11 Perl模块

Perl模块发布网站：https://metacpan.org/

使用perldoc命令可以查看Perl 模块的说明。

下载

使用cpan（Comprehensive Perl Archive Network）或cpanm命令。

模块的使用

File::Basename 使用

#!/usr/bin/perl

use v5.24;

use File::Basename;
my $name = "/usr/local/bin/perl";
my $basename = basename $name;  # gives 'perl'

say $basename;

在代码开头使用use命令，可以完成模块的挂载。

使用部分函数

在导入时可以自定义需要导入的函数，避免命名冲突。
use File::Basename qw/ basename /;

即使不导入任何函数，也可以用全名的方式调用函数：

1 2	my $name = "/usr/local/bin/perl"; my $dirname = File::Basename::dirname $name; # dirname from the module

spec模块

spec模块使用面向对象：

#!/usr/bin/perl

use v5.24;

use File::Basename;
use File::Spec;

print "Please enter a filename: ";
chomp(my $old_name = <STDIN>);
my $dirname  = dirname $old_name;
my $basename = basename $old_name;
$basename =~ s/^/not/; 
my $new_name = File::Spec->catfile($dirname, $basename);
rename($old_name, $new_name)
    or warn "Can't rename '$old_name' to '$new_name': $!";

和C++类似，通过->调用对象中的函数。

DBI （Database Interface）模块

略

时间和日期的模块

使用Time::Moment模块：

#!/usr/bin/perl

use v5.24;
use  Time::Moment;


my $dt = Time::Moment->now;
printf "%4d%02d%02d \n", $dt->year, $dt->month, $dt->day_of_month;

习题

#!/usr/bin/perl

use v5.24;
use  Module::CoreList;

my %modules = %{ $Module::CoreList::version{5.024} };

# print %modules;
foreach my $key (keys %modules){
    say "$key => $modules{$key}";
}

#!/usr/bin/perl

use v5.24;
use  Time::Moment;

my $dt = Time::Moment->now;

my $input_dt = Time::Moment->new(
    year => @ARGV[0],
    month => @ARGV[1],
    day        => @ARGV[2],
);

my $years  = $input_dt->delta_years( $dt );
my $months  = $input_dt->delta_months( $dt ) % 12;
my $days  = $input_dt->delta_days( $dt )  % 30;
printf "%d years,%d months and  %d days \n",$years,$months,$days;

Ch12 文件测试

文件测试操作符

使用perldoc -f -X可查看文档:

例如，查看文件是否存在,使用-e操作。

#!/usr/bin/perl
use v5.24;

my $filename="test.txt";
die "Oops! A file called '$filename' already exists.\n"
  if -e $filename;

统计大于100K且90天未使用的文件：

my @original_files = qw/ fred barney betty wilma pebbles dino bamm-bamm /;
my @big_old_files;  # The ones we want to put on backup tapes
foreach my $filename (@original_files) {
 push @big_old_files, $filename
   if -s $filename > 100_000 and -A $filename > 90;
}

一些常用的文件测试操作：

大部分测试操作返回一个布尔值，-s是个例外，会返回文件大小。在希望对返回的大小进行操作，比如更换统计单位时，需要注意。

1 2	# The filename is in $_ my $size_in_K = -s / 1000; # Oops!

上述写法会报错，perl不会识别除法，而是将其当作-s的选项导致报错。上述操作应该写成：

1	my $size_in_k = (-s) / 1024; # Uses $_ by default

测试同一文件的多个属性

1 2	if (-r $filename and -w $filename) { ... }

上述写法可以用虚文件句柄来进行简写：

1 2	if (-r $filename and -w _) { ... }

栈式文件测试操作符

perl5.10之后的版本可以支持一次性列出所有的文件操作。

use v5.10;
 if (-r -w -x -o -d $filename) {
  print "My directory is readable, writable, and executable!\n";
 }

stat和lstat函数

perl的stat函数和unix的系统调用stat类似，会返回丰富的信息。

一些重要的文件信息

$dev和$ino

文件设备编号和文件inode编号，保证了文件的唯一性。

$mode

文件的权限位集合。

#!/usr/bin/perl
use v5.24;

 my($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
  $size, $atime, $mtime, $ctime, $blksize, $blocks)
    = stat("test.txt");

say $mode;

上述代码返回33188,对应八进制数100644。去掉最高位的 1，表示文件类型为普通文件，剩下的 0644 就是文件的权限。
即rw-r–r–.

$nlink

文件或目录的硬链接数

$uid和$gid

以数值形式表示的文件拥有者的用户ID及用户组ID。

$size

以字节为单位的文件大小，和-s操作符的返回值相同。

$atime,$mtime,$time

3种时间戳

对链接符号调用stat会返回其指向的对象的信息，而非链接符号本身的信息，如果想获得连接符号本身的信息，可以用lstat代替，stat，但这多半没什么用。

ps:链接符号是一个指向另一个文件或目录的指针。符号链接本身（Symbolic Link Itself）：是指这个指针文件，它有自己的文件属性，如大小、权限、所有者等。

例如：
ln -s original.txt link.txt
创建了一个链接符号，link是连接符号本身，original是原文件。

localtime函数

#!/usr/bin/perl
use v5.24;

my $timestamp = 1454133253;
my $date = localtime $timestamp;

say "date: $date";

用于将时间戳转化为人类更容易理解的方式。

按位运算操作符

与verilog和C中的一样：

按位操作字符串

由于perl的上下文特性，对字符串的按位操作会返回字符串，对整数的按位操作会返回整数。

#!/usr/bin/perl
use v5.24;


my $s =  "\xAA" | "\x55";

say ord($s);

课后习题

#!/usr/bin/perl
use v5.24;

foreach (@ARGV){
    if(-e -r -w -x ){
        say "$_ exists and is readable,writable and executable !";
    }
    else {
        say "$_ not exist or  is not readable,writable and executable! !"
    }
}

#!/usr/bin/perl
use v5.24;

my $filename="";
my $age=0;
foreach (@ARGV){
    if(-A > $age ){
        $age = -A $_ ;
        $filename = $_;
    }
}

say "The oldest file is $filename and it has existed for $age days";

#!/usr/bin/perl
use v5.24;


foreach (@ARGV){
   if (-r -w -O){
        say "You own file $_ and its readable and writable";
   }
}

Ch13 目录操作

使用chdir操作来改变当前的工作目录,类似shell中的cd命令。

#!/usr/bin/perl
use v5.24;
use Cwd;


say "The current working directory is ", getcwd();
chdir '/etc' or die "cannot chdir to /etc: $!";
say "The current working directory is ", getcwd();

使用Cwd模块，可以获取当前工作目录，相当于pwd命令。

需要注意的是，在linux中我们常用cd ~命令来回到主目录，这在perl中是行不通的。可以使用如下方法先获得当前用户名，然后移动到用户名的主目录中：

1
2
3

my $username = getpwuid($<);
$dir = "/home/$username";
chdir $dir;

通配globbing

#!/usr/bin/perl
use v5.24;
use Cwd;


chdir '../ch7/';
say "The current working directory is ", getcwd();
my @files = glob "*.pl";

foreach (@files){
    say "I have file: $_";
}

例如，使用如下程序，用于统计所有回归测试的结果：

#!/usr/bin/perl

use v5.14;
chdir './run';
my @test_list = glob "rv*/*.report";
my $sum=0;
my $pass=0;

my @pass,my @fail;
foreach my $file(@test_list){
    $sum++;
    open FILE,$file 
        or  die "Could not open log:$file: $!";
    my $lines = join '', <FILE>;
    close FILE;
    if ($lines =~ /PASS/){
        $pass++;
        push @pass,$file =~ s|/.*||r;
    }
    else {
        push @fail,$file =~ s|/.*||r;
    }
}
printf "TEST PASS: $pass/$sum \nPASS RATE:%6.2f %\n",$pass/$sum*100 unless $sum eq 0;
say '-' x 35 . "REGRESS  TEST  REPORT" . '-' x 35;
say '-' x 35 . "       TEST  PASS    " . '-' x 35;
foreach (@pass){
    printf "|               PASS: %45s                       |\n",$_;
}
say "-"x92;
say '-' x 35 . "       TEST  FAIL    " . '-' x 35;
foreach (@fail){
    printf "|               FAIL: %45s                       |\n",$_;
}
say "-"x92;

效果如下：

通配的另一种语法

采用glob的语法和尖括号的效果是一样的：

1	my @all_files = <>; # exactly the same as my @all_files = glob "";

由于尖括号同时代表文件名读取，因此perl内部有不同机制来辨别这两种操作：

目录句柄

使用 opendir和readdir可以进行目录操作：

my $dir_to_process = '/etc';
opendir my $dh, $dir_to_process or die "Cannot open $dir_to_process: $!";
foreach $file (readdir $dh) {
 print "one file in $dir_to_process is $file\n";
}
closedir $dh;

如果只想处理以.pm结尾的文件，可以写成：

while ($name = readdir $dh) {
 next unless $name =~ /\.pm\z/;
 ... more processing ...
}

需要注意的是，readdir返回的是目录下的文件名列表，不含文件的完整路径。需要完整路径时需要手动补充完整：

opendir my $somedir, $dirname or die "Cannot open $dirname: $!";
while (my $name = readdir $somedir) {
 next if $name =~ /\A\./;           # skip over dot files
 $name = "$dirname/$name";          # patch up the path
 next unless -f $name and -r $name; # only readable files
 ...
}

此外还可以使用File::Spec::Functions模块以适应任何操作系统。

use File::Spec::Functions;
opendir my $somedir, $dirname or die "Cannot open $dirname: $!";
while (my $name = readdir $somedir) {
 next if $name =~ /\A\./;             # skip over dot files
 $name = catfile( $dirname, $name ); # patch up the path
 next unless -f $name and -r $name;  # only readable files
 ...
}

删除文件

使用unlink函数可以在perl内删除程序：

1 2	unlink 'slate', 'bedrock', 'lava'; unlink qw(slate bedrock lava);

由于unlink可以接受列表作为参数，因此可以和glob配合，一次删除多个文件：

1	unlink glob '*.o';

unlink的返回值是成功删除文件的个数，可以用来检验是否删除成功。

对于多个文件，还是采用循环的办法比较可靠，可以知道是否删除成功：

1
2
3

foreach my $file (qw(slate bedrock lava)) {
 unlink $file or warn "failed on $file: $!\n";
}

重命名文件

使用rename函数可以重命名文件，加上完整的目录可以实现移动的效果(类比mv 命令)。使用胖箭头可以更清楚的表示改名前后的名称。

1
2
3

rename 'old', 'new'；
rename 'over_there/some/place/some_file', 'some_file';
rename 'over_there/some/place/some_file' => 'some_file';

rename失败会返回false,并将文件名报错在$!变量中。

一个场景的使用场景是将所有old结尾的文件改为new结尾，程序如下：

foreach my $file (glob "*.old") {
 my $newfile = $file;
 $newfile =~ s/\.old$/.new/;
 if (-e $newfile) {
   warn "can't rename $file to $newfile: $newfile exists\n";
 } elsif (rename $file => $newfile) {
   # success, do nothing
 } else {
   warn "rename $file to $newfile failed: $!\n";
 }
}

链接与文件

在 Unix 和类似 Unix 的操作系统中，硬链接（Hard Link）和软链接（Soft Link，也称为符号链接，Symbolic Link）是两种用于创建文件系统链接的方法。它们之间有一些重要的区别和相似之处：

硬链接（Hard Link）
定义：
硬链接是指向文件数据块的直接引用。它是文件系统的多重引用，即同一个数据块可以有多个路径名。
硬链接创建了一个文件名和数据块之间的多重引用。

特性：

共享相同的 inode：硬链接和源文件共享相同的 inode 编号，因此它们是完全等价的，任何一个被修改都会影响到另一个。
删除效果：删除一个硬链接不会删除文件数据，只有当所有指向文件数据块的硬链接都被删除时，文件数据才会被删除。
不跨文件系统：硬链接只能在同一个文件系统内创建，不能跨越不同的文件系统。
目录限制：不能为目录创建硬链接（除非具有超级用户权限且特定情况下）。

ln source_file hard_link

软链接（Soft Link 或 Symbolic Link）
定义：

软链接是一个指向另一个文件路径的特殊文件。它包含了目标文件的路径名，而不是直接指向文件数据块。
软链接是一种间接引用。
特性：

不同的 inode：软链接有自己独立的 inode 编号，与目标文件不同。
删除效果：删除软链接不会删除目标文件数据，但如果目标文件被删除，软链接将变为无效的悬挂链接（Dangling Link）。
可以跨文件系统：软链接可以指向不同文件系统上的文件或目录。
目录链接：可以为目录创建软链接。

ln -s target_file symbolic_link

创建和删除目录

调用mkdir就可以直接创建目录：

1	mkdir 'fred', 0755 or warn "Cannot make fred directory: $!";

这里0755是八进制的目录权限。

使用rmdir可以删除目录：

1
2
3

foreach my $dir (qw(fred barney betty)) {
  rmdir $dir or warn "cannot rmdir $dir: $!\n";
}

与unlink不同的是rmdir每次只能删除一个目录，且不能删除非空目录，需要先使用unlink将其中的文件删除。

修改权限

类似Unix，使用chmod可以修改文件权限。但是不可以接受+x这样的参数：

1	chmod 0755, 'fred', 'barney';

修改隶属关系

1
2
3

my $user  = 1004;
my $group = 100;
chown $user, $group, glob '*.o';

使用getpwname和getgpname可以获得对应的用户名和用户组名的编号：

1
2
3

defined(my $user = getpwnam 'merlyn') or die 'bad user';
defined(my $group = getgrnam 'users') or die 'bad group';
chown $user, $group, glob '/home/merlyn/*';

修改时间戳

使用utime可以修改时间戳：

1
2
3

my $now = time;
my $ago = $now - 24 * 60 * 60;  # seconds per day
utime $now, $ago, glob '*';     # set access to now, mod to a day ago

第一个参数是访问时间，第二个参数是修改时间。

习题

#!/usr/bin/perl
use v5.24;
use Cwd;
say "Please give me a directory:";
chomp(my $dir = <>);
my $username = getpwuid($<);
$dir = "/home/$username" unless $dir;
chdir $dir;
say "The current working directory is ", getcwd();

my @files = glob "*";
foreach (sort @files){
    say "You have file/dirctory: $_";
}

#!/usr/bin/perl
use v5.24;
use Cwd;
say "Please give me a directory:";
chomp(my $dir = <>);
my $username = getpwuid($<);
$dir = "/home/$username" unless $dir;
chdir $dir;
say "The current working directory is ", getcwd();

my @files = glob ".* *";
foreach (sort @files){
    say "You have file/dirctory: $_";
}

#!/usr/bin/perl
use v5.24;
use Cwd;
say "Please give me a directory:";
chomp(my $dir = <>);
my $username = getpwuid($<);
$dir = "/home/$username" unless $dir;
chdir $dir;
say "The current working directory is ", getcwd();

opendir my $dh, $dir or die "Cannot open $dir: $!";
foreach (sort readdir $dh){
    say "You have file/dirctory: $_";
}
closedir $dh;

#!/usr/bin/perl
use v5.24;
use Cwd;

sub rm_dir(){
    my ($dir) = @_;
    return -1 unless -d $dir;
    my $current_dir = getcwd();
    chdir $dir or die "Cannot change directory to $dir: $!";
    foreach my $file (glob "*"){
            if (-f $file){
                unlink $file or warn "failed on $file: $!\n";
            }
            if (-d $file){
                rm_dir($file);
            }
    }
    chdir $current_dir or die "Cannot change directory to $current_dir: $!";
    rmdir $dir or warn "Failed to remove directory $dir: $!";
}

foreach (@ARGV){
    next unless -e;
    if(-d){
        &rm_dir($_);
    }
    if(-f){
        unlink $_ or warn "failed on $_: $!\n";
    }
}

由于rmdir无法直接删除目录，使用递归进行嵌套操作。
假设有如下文件关系：

删除效果如下：

#!/usr/bin/perl
use v5.24;
use Cwd;

my $cwd = getcwd();
my $old_name=@ARGV[0], my $new_name=@ARGV[1];

if(-e $new_name){
    warn "can't rename $old_name to $new_name: $new_name exists\n";
}
elsif (-d $old_name){
    if (rename "$cwd/$old_name" => "$cwd/$new_name"){}
    else{
        warn "rename $old_name to $new_name failed: $!\n";
    }
}
else {
    if (rename $old_name => $new_name){}
    else{
        warn "rename $old_name to $new_name failed: $!\n";
    }
}

Ch14 字符串与排序

用index查找子字符串

1	my $where = index($big, $small);

查找small在big中第一次出现的位置。

1 2	my $stuff = "Howdy world!"; my $where = index($stuff, "wor");

还可以加上第三个参数，代表从指定的位置开始寻找：

my $stuff  = "Howdy world!";
my $where1 = index($stuff, "w");               # $where1 gets 2
my $where2 = index($stuff, "w", $where1 + 1);  # $where2 gets 6
my $where3 = index($stuff, "w", $where2 + 1);  # $where3 gets -1 (not found)

下面是一个完整的示例：

#!/usr/bin/perl

 use v5.10;
 my $stuff  = "Howdy world!";
 my @where = ();
 my $where = -1;
 while( 1 ) {
  $where = index( $stuff, 'w', $where + 1 );
  last if $where == -1;
  push @where, $where;
 }
 say "Positions are @where";

如果需要从后往前查找，则使用rindex:

1	my $last_slash = rindex("/etc/passwd", "/"); # value is 4

use v5.10;
 my $fred = "Yabba dabba doo!";
 my @where = ();
 my $where = length $fred;
 while(  ) {
  $where = rindex($fred, "abba", $where - 1 )
  last if $where == -1;
  push @where, $where;
 }
 say "Positions are @where";

用substr操作子字符串

1	my $part = substr($string, $initial_position, $length);

substr的第一个参数是要操作的字符串，第二个是起始位置，第三个是长度。

如果想要一直取到末尾，则第三个参数可以省略。

substr可以与index配合使用：

1 2	my $long = "some very very long string"; my $right = substr($long, index($long, "l") );# "long string"

当字符串被存储在变量中时，可以用substr来替换变量的内容：

#!/usr/bin/perl
use v5.10;

my $string = "Hello, world!";
substr($string, 0, 5) = "Goodbye";  # "Goodbye, world!
say $string;

如果第三个长度参数设置成0，则相当于插入操作：

1	substr($string, 9, 0) = "cruel"; # "Goodbye, cruel world!";

以及可以用正则表达式进行详细的编辑：

1	substr($string, -20) =~ s/fred/barney/g;

上述操作会将字符串末尾的20个字符中的fred替换为barney.

用sprintf格式化字符串

sprintf和printf拥有相同的参数，但返回的是请求的字符串，而不会将其打印出来。
例如：

1
2
3

my $date_tag = sprintf
 "%4d/%02d/%02d %2d:%02d:%02d",
 $yr, $mo, $da, $h, $m, $s;

按照格式编辑好字符串，从而方便下一步操作。

用sprintf格式化金额数字

例如，想让一个金额数字保留2位有效数字，并且大额数字每3位加上一个逗号，可以使用如下程序：

sub big_money {
 my $number = sprintf "%.2f", shift @_;
 # Add one comma each time through the do-nothing loop
 1 while $number =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
 # Put the dollar sign in the right place
 $number =~ s/^(-?)/$1\$/;
 $number;
}

上述第四行的while循环是一个倒装，效果是每次从后向前匹配3个数字，在前方插入逗号，匹配成功后进入循环，但是循环主体是1，也就是什么也不做。接着进行下一次替换，直到替换失败返回false。

高级排序

perl运行按照自定义规则进行排序(有点像运算符的重载),只需要自定义一个排序子函数，然后将其放在sort和列表之间：

sub by_number {
 # a sort subroutine, expect $a and $b
 if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
}

 my @result = sort by_number @some_numbers;

还有更简单的定义方式，称为飞船符号：

1	sub by_number { $a <=> $b }

可以利用字符串比较操作cmp来构建大小写不敏感的比较：

sub by_code_point { $a cmp $b }
my @strings = sort by_code_point @any_strings;

sub case_insensitive { "\L$a" cmp "\L$b" }

还可以进一步简化，直接在sort之后定义排序规则：

1	my @descending = sort { $b <=> $a } @some_numbers;

按哈希值排序

例如，按照得分对球员进行排序：

#!/usr/bin/perl
use v5.24;

my %score = ("barney" => 195, "fred" => 205, "dino" => 30);
my @winners = sort by_score keys %score;
sub by_score { $score{$b} <=> $score{$a} };

foreach (@winners){
    say "$_ get $score{$_} scores";
}

注意这里定义的子函数by_score,它通过比较2个键对应的值，对原本的键进行降序排序。

按照多个键进行排序

如果有多个同分球员，则按照字符串规则进行排序：

#!/usr/bin/perl
use v5.24;

 my %score = (
  "barney" => 195, "fred" => 205,
  "dino" => 30, "bamm-bamm" => 195,
 );

my @winners = sort by_score_and_name keys %score;
 sub by_score_and_name {
  $score{$b} <=> $score{$a}  # by descending numeric score
    or
  $a cmp $b          # code point order by name
  } 

foreach (@winners){
    say "$_ get $score{$_} scores";
}

实际应用中，可以定义多重比较规则：

@patron_IDs = sort {
  &fines($b) <=> &fines($a) or
  $items{$b} <=> $items{$a} or
  $family_name{$a} cmp $family_name{$b} or
  $personal_name{$a} cmp $family_name{$b} or
  $a <=> $b
 } @patron_IDs;

习题

#!/usr/bin/perl
use v5.24;

my @sorted = sort {$b <=> $a} @ARGV;

my $result;
foreach (@sorted){
    $result = $result . sprintf"%12s\n",$_;
}
print $result;

#!/usr/bin/perl
use v5.24;

 my %last_name = qw{
  fred flintstone Wilma Flintstone Barney Rubble
  betty rubble Bamm-Bamm Rubble PEBBLES FLINTSTONE
 };

foreach my $key (keys %last_name){
    say "$key - $last_name{$key}";
}

sub sort_by_me {
    "\L$last_name{$a}" cmp "\L$last_name{$b}" or 
    "\L$a" cmp "\L$b"
}
say "-----after sort------";
my @sorted = sort sort_by_me keys %last_name;

foreach (@sorted){
    say "$_ - $last_name{$_}";
}

#!/usr/bin/perl
use v5.10;

my $staff = @ARGV[0];
my $sub = @ARGV[1];

@where=();
$where = -1;
while (1){
    $where = index($staff,$sub,$where+1);
    last if $where== -1;
    push @where,$where;
}
say "position is @where";

Ch15 进程管理

system函数

在perl程序中可以直接使用system调用shell中函数：

#!/usr/bin/perl
use v5.24;

system 'date'; 
system 'ls -l $HOME';
system "ls -l \$HOME";

使用双引号时，需要对特殊字符进行转义。

system 实际上创建了一个子进程，从当前的perl程序切换到该子进程，执行结束后返回。

Learn Perl

Ch2 标量数据

第一个Helloword 程序

标量数据

其他特性

Ch3 列表和数组

数组array

array

push,pop,shift,unshift

splice

字符串数组内插

foreach

默认变量 $_

reverse,sort,each

列表上下文与标量上下文

列表上下文中使用标量表达式&强制标量上下文

列表上下文中的<STDIN>

习题

Ch4 子程序

变量传递

子程序中的私有变量

use strict 编译指令

返回 return

省略&符号

非标量输出

持续变量

子函数签名subroutine signatures

习题

Ch5 输入/输出

标准输入流

钻石符号<>

双钻石符号 <<>>

调用参数列表

标准输出

括号

格式化输出printf

列表与printf

文件句柄 filehandles

二进制格式binmode

结束句柄

die用于处理致命错误

文件句柄的使用

重新打开文件句柄

标量变量的句柄

习题

Ch6 哈希

概念

箭头符号 =>

哈希函数

keys 和 values

each函数

exist, delete和内插

环境变量

习题

Ch7 漫游正则表达式

正则表达式的概念

通配

量词 *,?, +，{m,n}

模式分组 ()与反向引用g{}

选择符 |

字符集[]

字符集的简写

Unicode 特性

锚点Anchor

词锚 word anchor

习题

Ch8 正则表达式进行匹配

/i进行大小写无关的匹配

/s进行任意字符的匹配

/x加入空白符

修饰符号的组合

选择一种字符的解释方式

\m进行行匹配

其他操作

绑定操作符 =~

捕获变量

捕获变量的存续期

不捕获圆括号 (?:)

命名捕获

自动捕获变量

量词 `*`,`?`, `+`，`{m,n}`

模式分组 `()`与反向引用`g{}`

选择符 `|`

字符集`[]`

`/i`进行大小写无关的匹配

`/s`进行任意字符的匹配

`/x`加入空白符

`\m`进行行匹配

绑定操作符 `=~`

不捕获圆括号 `(?:)`

用`s///`进行替换

使用`g`进行全局替换

定义或操作符 `\\`