
[Beginners] Data input and output in Perl: STDOUT, STDERR and STDIN

A computer program doesn’t run in a vacuum; one way or another, data has to be able to get in and out.
When writing a command-line script in Perl, we can use the standard input and output streams that the operating system puts at our disposal.

Standard output

The standard output is simply called STDOUT. When we call print($something), Perl spits $something out to the host system’s standard output.

print STDOUT "foo\n";

is equivalent to:

print "foo\n";

On a Unix system, we can pass the data printed by our program to another program or to a file, using shell redirections:

perl script.pl > text.txt

Here script.pl is a placeholder name for our script; whatever it prints to standard output ends up in the file text.txt. To print error messages, we can use STDERR in the same way, by naming it as the filehandle in print. Note that > only redirects standard output, so messages sent to STDERR still show up in the terminal.

Standard input

We can also use STDIN, the system’s standard input:

print "Your name: ";
my $name = <STDIN>;

chomp $name;
print "Your name is '$name'\n";

The <> operator around STDIN reads the system’s standard input (usually something typed by the user) line by line (see the documentation on I/O operators). Calling chomp() afterwards removes the trailing newline, that is, the “Enter” character the user types to confirm the input.

Caution: watch out for exposed data

Standard input and output are useful to programmers, but we should be careful when we use them to expose the inner workings of our code. Exposing sensitive data could have disastrous consequences.
For example, consider what happens when a program is carelessly left to record every key pressed by the user in an unencrypted file: attackers could use that file to find the passwords a user typed.


[Beginners] Ins and outs: STDOUT, STDERR and STDIN in Perl

A computer program doesn’t exist in a vacuum; one way or another, data needs to go in and out.

When writing a command-line script in Perl, we can use the standard input and output streams provided by our operating system.

Standard output

The standard output is simply called STDOUT. When we use print($something), Perl spits out $something to the system’s standard output:

print STDOUT "foo\n";

is equivalent to:

print "foo\n";

On a Unix system, you can transfer data displayed by your program to another program or to a file, using shell redirections:

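perl script.pl > text.txt

Here script.pl is a placeholder name for your script; whatever it prints to standard output ends up in text.txt.
To display error messages, you can use STDERR the same way, by naming it as the filehandle in print. Note that > only redirects standard output, so messages sent to STDERR still show up in the terminal.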

Standard input

We can also use STDIN, our system’s standard input:

print "Your name: ";
my $name = <STDIN>;

chomp $name;
print "Your name is '$name'\n";

The <> operator around STDIN lets us read the standard input (usually something typed by the user) line by line (see Perl’s documentation for I/O Operators). Calling chomp() afterwards removes the trailing newline, the invisible “Enter” character that the user types to confirm the input. Unlike chop(), it leaves the string alone if it does not end with a newline.

Caution: don’t expose sensitive data

Standard input and output are really useful to developers, but we need to be careful when exposing the internals of our code through them, or the consequences could be disastrous!

For instance, consider what happens when someone inadvertently lets a program log every keystroke to an unencrypted file: attackers could use that file to find out passwords entered by a user.


Pattern matching with regular expressions: a quick reminder

Pattern matching is a powerful tool for developers, but writing (or reading) a regular expression can be a daunting task.
This quick reminder gives you the bare necessities to tackle regular expressions if you’ve dabbled with regexps a bit but can’t remember it all.

Regular expressions in Perl

If you take “text” in the widest possible sense, perhaps 90% of what you do is 90% text processing. That’s really what Perl is all about and always has been about—in fact, it’s even part of Perl’s name: Practical Extraction and Report Language.
Larry Wall et al., Programming Perl

There you have it: regular expressions and pattern matching are at the core of Perl, which is why I started this post with it. It’s an integral part of the language, so much so that there is no function called match or subst in Perl, just a simple expression that you can put in a condition:

$_ = "Is there a doctor here?";
if (/doctor/) {
    print "Doctor who?";
}

Regular expressions in PHP and JavaScript

In PHP, you’ll have to use a dedicated function called preg_match() to get a match from a regular expression.
preg_match() needs at least two arguments: a regular expression, and a string to match it against:

$subject = "Is there a doctor here?";
$pattern = '/doctor/';
if (preg_match($pattern, $subject)) {
    echo "Yes. Yes, there is.";
}

(Source: PHP documentation: preg_match)

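JavaScript has its own RegExp object, and a handy function, String.prototype.match(), that matches a string against a RegExp object:

var subject = "Is there a doctor here?";
var pattern = new RegExp('doctor');
var result = subject.match(pattern); // an array on success, null otherwise

String.prototype.match() creates a new RegExp object on the fly if a string is provided instead. A regular expression literal is another shorthand for building the same object:

var subject = "Is there a doctor here?";
var pattern = /doctor/;                   // literal form of new RegExp('doctor')
var result = subject.match('doctor');     // the string is converted to a RegExp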

(Source: Mozilla Developer Network)
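
To make the return value of String.prototype.match() more concrete, here is a minimal sketch. The variable names found and missing, and the second pattern /nurse/, are made up for illustration; only match() itself comes from the text above.

```javascript
// Illustrative sketch: same subject string as in the examples above.
const subject = "Is there a doctor here?";

// Without the g flag, a successful match returns an array whose first
// element is the matched text, plus an `index` property giving its position.
const found = subject.match(/doctor/);
console.log(found[0]);    // "doctor"
console.log(found.index); // 11

// When nothing matches, the result is null, which is falsy,
// so the call can sit directly inside an if condition.
const missing = subject.match(/nurse/);
console.log(missing);     // null
```

Because null is falsy and an array is truthy, if (subject.match(pattern)) { ... } behaves much like the Perl and PHP conditions shown earlier.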

Meta-characters in regular expressions

Beyond the most elementary expressions, regexps can sometimes look like gibberish unless you know what all these characters stand for. The next few paragraphs are a reminder of the most useful meta-characters you can find in a regular expression:


Anchors

^
Beginning of input
$
End of input
\b
Roughly speaking, the beginning or the end of a word (zero-width word boundary)
\B
Anywhere that is not the beginning or the end of a word (zero-width non-word boundary)
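
As a rough illustration of these position markers, the sketch below marks where each pattern matches in a made-up sample string. The string and the variable names are assumptions chosen for the example.

```javascript
// Hypothetical sample text: "cat" appears as a whole word and inside a word.
const text = "cat concatenate";

// ^ anchors the pattern at the beginning of input, $ at the end.
const startsWithCat = /^cat/.test(text); // true
const endsWithCat = /cat$/.test(text);   // false, the input ends with "nate"

// \b demands a word boundary on each side: only the standalone "cat" matches.
const wholeWord = text.replace(/\bcat\b/g, "[cat]");
console.log(wholeWord);  // "[cat] concatenate"

// \B demands that there is NO boundary: only the "cat" buried in "concatenate".
const insideWord = text.replace(/\Bcat\B/g, "[cat]");
console.log(insideWord); // "cat con[cat]enate"
```

Note that none of these markers consume characters; they only constrain where a match may start or end.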


Quantifiers

x*, x*? (non-greedy)
Matches x 0 or more times
x+, x+? (non-greedy)
Matches x 1 or more times
x?
Matches x 0 or 1 time
x{n}
Matches x exactly n times
x{n,}
Matches x at least n times
x{n,m}
Matches x between n and m times
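
The difference between greedy and non-greedy repetition is easier to see on an example. The following sketch uses invented sample strings; the digit class \d is described further down in this reminder.

```javascript
// Hypothetical sample markup for comparing greedy and non-greedy matching.
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as it can, up to the LAST ">" in the string.
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // "<b>bold</b> and <i>italic</i>"

// Non-greedy: .+? stops at the FIRST ">" that lets the pattern succeed.
const lazy = html.match(/<.+?>/)[0];
console.log(lazy);   // "<b>"

// ? makes the preceding item optional (0 or 1 time).
const spellings = "color colour".match(/colou?r/g);
console.log(spellings); // ["color", "colour"]

// {n,m} bounds the repetition: whole numbers of 2 to 4 digits only.
const numbers = "7 42 1999 20240".match(/\b\d{2,4}\b/g);
console.log(numbers);   // ["42", "1999"]
```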

Grouping & alternation

( )
Delimits a group of patterns; for instance, (n*)(m*) matches n 0 or more times, then m 0 or more times
|
Matches either of the alternative patterns: foo|bar matches either “foo” or “bar”
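
Here is a small sketch of groups and alternatives in action. The sample strings and variable names are assumptions made up for the example.

```javascript
// Each pair of parentheses captures its own part of the match.
const groups = "aaabb".match(/(a*)(b*)/);
console.log(groups[0]); // "aaabb" (the whole match)
console.log(groups[1]); // "aaa"   (first group)
console.log(groups[2]); // "bb"    (second group)

// | matches either alternative; with the g flag we collect every hit.
const either = "I like foo and bar".match(/foo|bar/g);
console.log(either);    // ["foo", "bar"]

// Parentheses also limit how far an alternation reaches:
// here only the prefix varies, "baz" is required in both cases.
const limited = /^(foo|bar)baz$/.test("barbaz");
console.log(limited);   // true
```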

Character sets and character classes

[xyz]
Matches any single character listed between the brackets. Letters can be matched with the ranges a-z and A-Z, and digits with 0-9
The opposite is [^xyz]


.
Dot: matches any character except line terminators
\d
Matches a digit
The opposite is \D
\w
Matches an alphanumeric character (or an underscore)
The opposite is \W
\s
Matches white space characters (tabs, spaces, etc.)
Same as [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
The opposite is \S
[\b]
Matches a backspace
Not the same as \b
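
To see the character sets and classes at work, here is a minimal sketch run against an invented sample string. All variable names are assumptions for illustration.

```javascript
// Hypothetical sample containing letters, digits, a tab and punctuation.
const sample = "Room 42,\tfloor_3!";

// \d picks out digits one at a time.
const digits = sample.match(/\d/g);
console.log(digits); // ["4", "2", "3"]

// \w+ collects runs of word characters; note the underscore counts.
const words = sample.match(/\w+/g);
console.log(words);  // ["Room", "42", "floor_3"]

// \s+ matches runs of white space, including the tab.
const pieces = sample.split(/\s+/);
console.log(pieces); // ["Room", "42,", "floor_3!"]

// A custom set and its negation.
const vowels = sample.match(/[aeiou]/g);
console.log(vowels); // ["o", "o", "o", "o"]
const noLetters = sample.replace(/[^a-zA-Z]/g, "");
console.log(noLetters); // "Roomfloor"

// The dot does not match a line terminator.
const dotMatchesNewline = /./.test("\n"); // false

// [\b] is a backspace character, unlike \b, the word boundary.
const matchesBackspace = /[\b]/.test("\b"); // true
```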