Pattern matching with regular expressions: a quick reminder

Pattern matching is a powerful tool for developers, but writing (or reading) a regular expression can be a daunting task.
This quick reminder gives you the bare necessities to tackle regular expressions if you’ve dabbled with regexps a bit but can’t remember it all.

Regular expressions in Perl

If you take “text” in the widest possible sense, perhaps 90% of what you do is 90% text processing. That’s really what Perl is all about and always has been about—in fact, it’s even part of Perl’s name: Practical Extraction and Report Language.
Larry Wall et al., Programming Perl

There you have it: regular expressions and pattern matching are at the core of Perl, which is why I started this post with it. It’s an integral part of the language, so much so that there is no function called match or subst in Perl, just a simple expression that you can put in a condition:

$_ = "Is there a doctor here?";
if (/doctor/) {
     print "Doctor who?";
}

Regular expressions in PHP and JavaScript

In PHP, you’ll have to use a dedicated function called preg_match() to get a match from a regular expression.
preg_match() needs at least two arguments: a regular expression, and a string to match it against:

$subject = "Is there a doctor here?";
$pattern = '/doctor/';
if (preg_match($pattern, $subject)) {
   echo "Yes. Yes, there is.";
}
(source: PHP documentation: preg_match)

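JavaScript has its own RegExp object, and a handy function, String.prototype.match(), that matches a string against a RegExp object:

var subject = "Is there a doctor here?";
var pattern = new RegExp('doctor');
var found = subject.match(pattern); // array describing the match, or null if there is none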

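The pattern can also be written as a regular expression literal. If a plain string is passed instead, String.prototype.match() converts it to a RegExp object on the fly:

var subject = "Is there a doctor here?";
var pattern = /doctor/;
var found = subject.match(pattern); // subject.match('doctor') behaves the same way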

(Source: Mozilla Developer Network)
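
As a small sketch of what these calls give back (the variable names here are only illustrative), match() returns an array describing the first match, or null when nothing matches. When a yes/no answer is all you need, RegExp.prototype.test() is a lighter alternative:

```javascript
// Illustrative names; the sentence is the one used in the examples above.
const sentence = "Is there a doctor here?";

// match() returns an array describing the first match, or null if there is none.
const hit = sentence.match(/doctor/);
const miss = sentence.match(/nurse/);

console.log(hit[0]);    // "doctor" (the matched text)
console.log(hit.index); // 11 (position of the match in the string)
console.log(miss);      // null

// test() only answers true or false, which is enough for a condition.
const hasDoctor = /doctor/.test(sentence);
if (hasDoctor) {
  console.log("Yes. Yes, there is.");
}
```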

Meta-characters in regular expressions

Beyond the most elementary expressions, regexps can look like gibberish unless you know what all these characters stand for. The next few paragraphs are a reminder of the most useful meta-characters you can find in a regular expression:


Anchors

^
Beginning of input
$
End of input
\b
Roughly speaking, the beginning or end of a word (zero-width word boundary)
\B
Any position that is not the beginning or end of a word (zero-width non-word boundary)
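
To make the position meta-characters concrete, here is a minimal JavaScript sketch using ^ (beginning of input), $ (end of input), \b (word boundary) and \B (non-word boundary). The sample string and variable names are made up for illustration:

```javascript
const phrase = "doctor who";

// ^ and $ pin the pattern to the beginning and end of the input.
const startsWithDoctor = /^doctor/.test(phrase); // true: "doctor" opens the input
const endsWithDoctor = /doctor$/.test(phrase);   // false: the input ends with "who"

// \b matches at a word boundary, \B at any position that is not one.
// Neither consumes a character: they are zero-width.
const docAsWholeWord = /\bdoc\b/.test(phrase); // false: "doc" is followed by "t"
const docAsPrefix = /\bdoc\B/.test(phrase);    // true: boundary before "d", none after "c"
const whoAsWholeWord = /\bwho\b/.test(phrase); // true: "who" stands on its own
```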


Quantifiers

x*, x*? (non-greedy)
Matches x 0 or more times
x+, x+? (non-greedy)
Matches x 1 or more times
x?
Matches x 0 or 1 time
x{n}
Matches x exactly n times
x{n,}
Matches x at least n times
x{n,m}
Matches x between n and m times
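
The difference between greedy and non-greedy quantifiers is easiest to see on an example. The following JavaScript sketch uses a made-up markup string: a greedy quantifier grabs as much as it can, while adding ? after it makes it stop as early as possible. The counted forms {n}, {n,} and {n,m} are shown with simple anchored patterns:

```javascript
const markup = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ runs to the last ">" in the string.
const greedy = markup.match(/<.+>/)[0]; // "<b>bold</b> and <i>italic</i>"
// Non-greedy: .+? stops at the first ">" it can.
const lazy = markup.match(/<.+?>/)[0];  // "<b>"

// * may match zero times, so the non-greedy form happily matches nothing.
const greedyStar = "aaa".match(/a*/)[0]; // "aaa"
const lazyStar = "aaa".match(/a*?/)[0];  // ""

const optionalU = /^colou?r$/.test("color");    // true: "u" appears 0 or 1 time
const exactlyFour = /^\d{4}$/.test("2024");     // true: exactly 4 digits
const atLeastTwo = /^a{2,}$/.test("aaaa");      // true: 2 or more
const twoToThree = /^a{2,3}$/.test("aaaa");     // false: 4 is more than 3
```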

Grouping & alternation

( )
Delimits a group of patterns, for instance (n*)(m*) matches n 0 or more times, then m 0 or more times
|
Matches either of two alternative patterns, for instance foo|bar matches either “foo” or “bar”
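
Here is a short JavaScript sketch of grouping and alternation, reusing the (n*)(m*) and foo|bar patterns from above. The sample strings and variable names are illustrative. In JavaScript, each parenthesized group is also captured and can be read back from the array returned by match():

```javascript
// Each group is captured separately: index 0 is the whole match,
// index 1 the first group, index 2 the second.
const groups = "nnnmm".match(/(n*)(m*)/);
const whole = groups[0];  // "nnnmm"
const first = groups[1];  // "nnn"
const second = groups[2]; // "mm"

// Alternation: either side of | may match.
const fooOrBar = /foo|bar/;
const inBarStool = fooOrBar.test("a bar stool"); // true: contains "bar"
const inBaz = fooOrBar.test("baz");              // false: neither "foo" nor "bar"

// Parentheses limit how far the alternation reaches.
const grey = /^gr(a|e)y$/.test("grey");   // true
const graey = /^gr(a|e)y$/.test("graey"); // false: only one of "a" or "e" is allowed
```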

Character sets and character classes

[xyz]
Matches any one character listed within the brackets. Letters can be matched with the ranges a-z and A-Z, and digits with 0-9
opposite is [^xyz]
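
A brief JavaScript sketch of character sets, with made-up patterns for illustration: the first one combines ranges to accept hexadecimal digits, the second one uses a negated set to reject vowels.

```javascript
// Ranges can be combined inside one set: digits plus a-f in either case.
const hexOnly = /^[0-9a-fA-F]+$/;
const coffeeHex = hexOnly.test("c0ffee"); // true: every character is a hex digit
const coffeeWord = hexOnly.test("coffee"); // false: "o" is not in the set

// A leading ^ inside the brackets negates the set.
const noVowels = /^[^aeiou]+$/;
const rhythm = noVowels.test("rhythm"); // true: no vowel from the set appears
const rhyme = noVowels.test("rhyme");   // false: the final "e" is excluded by the set
```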


.
Dot: matches any character except line terminators
\d
Matches a digit
opposite is \D
\w
Matches an alphanumeric character or an underscore
opposite is \W
\s
Matches white space characters (tabs, spaces, etc)
In JavaScript, same as [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] (recent engines no longer count \u180e as white space)
opposite is \S
[\b]
Matches backspace
Not the same as \b
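
The character classes can be tried out with a few one-liners. This is a JavaScript sketch with made-up sample strings, covering the dot, \d, \w, \s, and the difference between [\b] (backspace) and \b (word boundary):

```javascript
// \d, \w and \s with the g flag: match() then returns every match.
const numbers = "2024-01-15".match(/\d+/g);     // ["2024", "01", "15"]
const words = "a_b c".match(/\w+/g);            // ["a_b", "c"]: underscore counts as \w
const parts = "one  two\tthree".split(/\s+/);   // ["one", "two", "three"]

// The dot matches any character except a line terminator.
const dotLetter = /a.c/.test("abc");    // true
const dotNewline = /a.c/.test("a\nc");  // false

// Inside a JavaScript string, "\b" is the backspace character (U+0008).
const backspaceInSet = /[\b]/.test("\b"); // true: [\b] matches backspace
const boundaryOnly = /\b/.test("\b");     // false: no word characters, so no word boundary
```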