Programming, Français, Programmation

[Beginners] Data input and output in Perl: STDOUT, STDERR and STDIN

A computer program does not run in a vacuum; one way or another, data must be able to get in and out.
When writing a command-line script in Perl, we can use the standard input and output streams that the operating system puts at our disposal.

Standard output

The standard output is simply called STDOUT. When we call print($unTruc), Perl writes $unTruc to the standard output of the host system.

print STDOUT "foo\n";

is equivalent to:

print "foo\n";

On a Unix system, we can send the data printed by our program to another program or to a file, using shell redirections:

perl stdout.pl > text.txt

This writes the output of our program stdout.pl to the file text.txt. To display error messages, we can use STDERR in the same way.

Standard input

We can also use STDIN, the system’s standard input:

print "Your name: ";
my $name = <STDIN>;

chomp $name;
print "Your name is '$name'\n";

The <> operator around STDIN reads the system’s standard input (usually something typed by the user) line by line (see the documentation on I/O operators). Calling chomp() afterwards removes the trailing newline, the “Enter” character that the user has to type to submit the input.

Warning: beware of exposed data

Careful: input and output are useful to programmers, but let’s be cautious when we use them to expose the inner workings of our code! Exposing sensitive data could have disastrous consequences.
For example, here is what happens when a program is inadvertently left logging every key typed by the user to an unencrypted file (!!). Hackers could use that file to find the passwords entered by a user.

Programming, English

[Beginners] Ins and outs: STDOUT, STDERR and STDIN in Perl

A computer program doesn’t exist in a vacuum; one way or another, data needs to go in and out.

When writing a command-line script in Perl, we can use the standard input and output streams provided by our operating system.

Standard outputs

The standard output is simply called STDOUT. When we use print($something), Perl writes $something to the system’s standard output:

print STDOUT "foo\n";

is equivalent to:

print "foo\n";

On a Unix system, you can transfer data displayed by your program to another program or to a file, using shell redirections:

perl stdout.pl > text.txt

This writes whatever stdout.pl outputs into text.txt.
To display error messages, you can use STDERR the same way.

Standard input

We can also use STDIN, our system’s standard input:

print "Your name: ";
my $name = <STDIN>;

chomp $name;
print "Your name is '$name'\n";

The <> operator around STDIN lets us read the standard input (usually something the user types) line by line (see Perl’s documentation for I/O Operators). Calling chomp() afterwards removes the trailing newline, the invisible “Enter” character that the user types to validate the input.

Caution: don’t expose sensitive data

Standard I/O streams are really useful to developers, but we need to be careful when exposing the internals of our code through them, or the consequences could be disastrous!

For instance, here’s what happens when someone inadvertently lets a program log all keystrokes in an unencrypted file (!!). Hackers could use that file to find out passwords entered by a user.

Programming

There is no problem in computer science that can’t be solved using another level of indirection.
David Wheeler

From Stack Overflow:

——

“Indirection” is using something that uses something else, in its broadest sense.

So your example, using a pointer of a value instead of the value, fits this definition at one level. The pointer is the something and the value is the something else.

Typically this is something larger in scope:

  • Using a web site to graphically display the data generated by an XML based service. Here the web site is the something and hiding behind it is the data which is the something else.
  • Using an operating system to access the display screen. Here are two layers, at least of indirection. The OS uses the screen driver. One something using a something else. Then the screen driver talks directly to the screen hardware causing it to make tiny dots of light here and there. The driver is the next something using the something else which is the hardware.
  • It is not uncommon for one API to deal with something on a high level and that API deals with the same thing on a lower level. Again a level of indirection is added on top of the low level API and we call it the new, improved API.

This last example, perhaps, explains the “why” of it all.

As we work with something we master it and learn how to abstract it to a higher level of abstraction, thus a new level of indirection is needed and we can solve bigger problems faster by offloading some of the work to the new API.

——
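
The API-on-top-of-an-API case from the quote can be sketched in a few lines of JavaScript. Everything here is hypothetical and only meant to illustrate the idea: rawStore, writeRaw and readRaw stand in for a low-level API, and saveUser and loadUser for the "new, improved" one layered on top.

```javascript
// Hypothetical low-level layer: stores raw strings under raw keys.
const rawStore = new Map();

function writeRaw(key, text) {
  rawStore.set(key, text);
}

function readRaw(key) {
  return rawStore.get(key);
}

// Hypothetical higher-level API: callers deal with user objects and never
// see keys or serialized strings. This is one added level of indirection.
function saveUser(user) {
  writeRaw("user:" + user.id, JSON.stringify(user));
}

function loadUser(id) {
  const text = readRaw("user:" + id);
  return text === undefined ? null : JSON.parse(text);
}

saveUser({ id: 1, name: "Ada" });
console.log(loadUser(1).name); // prints "Ada"
console.log(loadUser(2));      // prints null
```

Code that calls saveUser() would keep working if the storage layer underneath were swapped for something else, which is the kind of offloading the quote describes.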

Programming

Vim Settings: seeing tabs and setting them to 4 characters

A simple set of Vim commands to display tabs when I code, and to make sure that all tabs are 4 characters long:

:set list
" Displays otherwise invisible characters, such as tabs
:set listchars=tab:>.
" Defines how a tab character is displayed:
" a greater-than sign ('>') for the first column,
" then dots for the remaining ones
:set tabstop=4
" Makes each tab character 4 columns wide
Programming

Regexp: how to validate a UK Postcode

In the UK, postcodes are a crucial part of someone’s address, as they narrow addresses down to the street of the desired location (in countries like France, however, a postcode only tells you which city or town the address is located in).
It can be handy to make sure your users provide a valid UK postcode in a form. To do this, we’ll need to match the postcode against a regexp… but what is a valid UK postcode, and what regexp can we use to make sure a postcode is indeed valid?

Wikipedia tells us UK postcodes can use the following formats (‘A’ stands for any letter, and ‘9’ for any digit):

Format      Coverage                                      Example
AA9A 9AA    WC postcode area; EC1–EC4, NW1W, SE1P, SW1    EC1A 1BB
A9A 9AA     E1W, N1C, N1P                                 W1A 0AX
A9 9AA      B, E, G, L, M, N, S, W                        M1 1AE
A99 9AA     B, E, G, L, M, N, S, W                        B33 8TH
AA9 9AA     All other postcodes                           CR2 6XH
AA99 9AA    All other postcodes                           DN55 1PT

To match any of these patterns, we can use the following regular expression:

/^[a-zA-Z]{1,2}([0-9]{1,2}|[0-9][a-zA-Z])\s*[0-9][a-zA-Z]{2}$/

Let’s decompose this so we know what it’s doing:

  • We start by making sure our postcode isn’t surrounded by anything else in our field, so we encase it between a beginning of input boundary (^) and an end of input boundary ($).
  • Then, we accept one or two letters: [a-zA-Z]{1,2}
  • Followed by either one or two digits, or one digit and a letter: ([0-9]{1,2}|[0-9][a-zA-Z])
  • Optional white space (zero or more white space characters): \s*
  • Finally, a digit followed by two letters: [0-9][a-zA-Z]{2}
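
To try the expression out, here is a minimal JavaScript sketch. The pattern is the one given above; the function name looksLikeUkPostcode and the sample inputs are only illustrative (the first six are the examples from the table).

```javascript
// Pattern copied from the post; the names below are hypothetical.
const UK_POSTCODE = /^[a-zA-Z]{1,2}([0-9]{1,2}|[0-9][a-zA-Z])\s*[0-9][a-zA-Z]{2}$/;

// Returns true when the input has the shape of a UK postcode.
function looksLikeUkPostcode(input) {
  return UK_POSTCODE.test(input);
}

const samples = [
  "EC1A 1BB", "W1A 0AX", "M1 1AE", "B33 8TH", "CR2 6XH", "DN55 1PT", // from the table
  "75001",   // a French postcode: no leading letter, rejected
  "EC1A 1B"  // one letter short at the end, rejected
];

for (const sample of samples) {
  console.log(sample, looksLikeUkPostcode(sample));
}
```

Note that the pattern only checks the shape of the input: a string such as "ZZ99 9ZZ" passes even though it may not be a postcode in use. Because of the ^ and $ boundaries, leading or trailing spaces make the match fail, so it is probably worth trimming the field before testing it.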

What have I missed? Is there a better way to write this? How hard is it to get UK addresses from a postcode using an external API? Let me know!

Programming

Pattern matching with regular expressions: a quick reminder

Pattern matching is a powerful tool for developers, but writing (or reading) a regular expression can be a daunting task.
This quick reminder gives you the bare necessities to tackle regular expressions, if you’ve dabbled with regexps a bit but can’t remember it all.

Regular expressions in Perl

If you take “text” in the widest possible sense, perhaps 90% of what you do is 90% text processing. That’s really what Perl is all about and always has been about—in fact, it’s even part of Perl’s name: Practical Extraction and Report Language.
Larry Wall et al., Programming Perl

There you have it: regular expressions and pattern matching are at the core of Perl, which is why I started this post with it. It’s an integral part of the language, so much so that there is no function called match or subst in Perl, just a simple expression that you can put in a condition:

$_ = "Is there a doctor here?";
if (/doctor/) {
     print "Doctor who?";
}

Regular expressions in PHP and JavaScript

In PHP, you’ll have to use a dedicated function called preg_match() to get a match from a regular expression.
preg_match() needs at least two arguments: a regular expression, and a string to match it against:

$subject = "Is there a doctor here?";
$pattern = '/doctor/';
if (preg_match($pattern, $subject)) {
   echo "Yes. Yes, there is.";
}
(source: PHP documentation: preg_match)

JavaScript has its own RegExp object, and a handy method that matches a string against a RegExp object:

var subject = "Is there a doctor here?";
var pattern = new RegExp('doctor');
console.log(subject.match(pattern));

String.prototype.match() creates a new RegExp object on the fly if a string is provided instead:

var subject = "Is there a doctor here?";
var pattern = 'doctor'; // a plain string, converted to a RegExp by match()
console.log(subject.match(pattern));

(Source: Mozilla Developer Network)

Meta-characters in regular expressions

Beyond the most elementary expressions, regexps can sometimes look like gibberish, unless you know what all these characters stand for. The next few paragraphs are a reminder of the most useful meta-characters you can find in a regular expression:

Boundaries

^
Beginning of input
$
End of input
\b
Word boundary: a zero-width position between a word character and a non-word character, i.e. the beginning or the end of a word
\B
Non-word boundary: any zero-width position that is not a word boundary
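
A small JavaScript sketch of these boundaries in action. The sample sentence and variable names are made up for illustration.

```javascript
const sentence = "The cat scattered the catalog";

// \bcat\b: "cat" only when it stands alone as a whole word
const wholeWord = sentence.match(/\bcat\b/g);

// \Bcat\B: "cat" only when it has word characters on both sides ("scattered")
const insideWord = sentence.match(/\Bcat\B/g);

// ^ anchors the pattern at the beginning of input, $ at the end
const startsWithThe = /^The/.test(sentence);
const endsWithCat = /cat$/.test(sentence);

console.log(wholeWord.length, insideWord.length, startsWithThe, endsWithCat);
// prints: 1 1 true false
```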

Quantifiers

x*
x*? (non-greedy)
Matches x 0 or more times
x+
x+? (non-greedy)
Matches x 1 or more times
x?
Matches x 0 or 1 time
x{n}
Matches x exactly n times
x{n,}
Matches x at least n times
x{n,m}
Matches x between n and m times
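
The difference between greedy and non-greedy quantifiers is easier to see on an example. This is a rough JavaScript sketch; the sample strings are invented for the purpose.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* grabs as much as it can, so the match runs up to the last ">"
const greedy = html.match(/<.*>/)[0];

// Non-greedy: .*? stops as soon as it can, at the first ">"
const lazy = html.match(/<.*?>/)[0];

// ? makes the preceding item optional
const optionalU = /^colou?r$/.test("color");

// {n}, {n,} and {n,m} count repetitions
const exactlyThree = /^\d{3}$/.test("123");
const atLeastTwo = /^\d{2,}$/.test("7");
const twoToFour = /^\d{2,4}$/.test("12345");

console.log(greedy); // prints the whole string
console.log(lazy);   // prints "<b>"
console.log(optionalU, exactlyThree, atLeastTwo, twoToFour);
// prints: true true false false
```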

Grouping & alternation

( )
Delimits a group of patterns; for instance, (n*)(m*) matches n 0 or more times, then m 0 or more times
|
Matches either of the alternative patterns: foo|bar matches either “foo” or “bar”
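
Groups are also what lets you pull pieces out of a match. A short JavaScript sketch, with a made-up date string and pattern names:

```javascript
// Three groups, each capturing one part of a date such as 2015-03-14
const datePattern = /^(\d{4})-(\d{2})-(\d{2})$/;
const result = "2015-03-14".match(datePattern);

// result[0] is the whole match; result[1] to result[3] are the groups
const [, year, month, day] = result;
console.log(year, month, day); // prints: 2015 03 14

// Alternation inside a group: "cat" or "dog", with an optional "s"
const petPattern = /^(cat|dog)s?$/;
console.log(petPattern.test("dogs")); // prints: true
console.log(petPattern.test("bird")); // prints: false
```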

Character sets and character classes

[xyz]
Matches any one of the characters x, y or z. Ranges are allowed: letters can be matched with a-z and A-Z, and digits with 0-9
The opposite is [^xyz]

.
Dot: matches any character except line terminators
\d
Matches a digit
The opposite is \D
\w
Matches an alphanumeric character or an underscore
The opposite is \W
\s
Matches white space characters (tabs, spaces, etc.)
In JavaScript, roughly equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
The opposite is \S
[\b]
Matches backspace
Not the same as \b
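
Here is how a few of these classes behave in JavaScript. The sample strings and variable names are only there for illustration.

```javascript
const input = "Order #42:\tshipped to flat 7B";

// \d+ : runs of digits
const digits = input.match(/\d+/g);

// \w+ : runs of letters, digits and underscores
const words = input.match(/\w+/g);

// [aeiou] : any one of the listed characters
const withoutVowels = "regexp".replace(/[aeiou]/g, "");

// [^0-9] : any character that is NOT a digit
const digitsOnly = "tel: 555-0100".replace(/[^0-9]/g, "");

// \s+ : runs of white space (here a space, a tab and another space)
const collapsed = "a \t b".replace(/\s+/g, " ");

console.log(digits.join(","));  // prints: 42,7
console.log(words.length);      // prints: 6
console.log(withoutVowels);     // prints: rgxp
console.log(digitsOnly);        // prints: 5550100
console.log(collapsed);         // prints: a b
```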