Programming

Regexp: how to validate a UK Postcode

In the UK, postcodes are a crucial part of someone’s address, as they narrow an address down to the street it is on (in countries like France, by contrast, a postcode only tells you which city or town the address is in).
It can be handy to make sure your users provide a valid UK postcode in a form. To do this, we’ll need to match the postcode against a regexp… but what is a valid UK postcode, and what regexp can we use to make sure a postcode is indeed valid?

Wikipedia tells us UK postcodes can use the following formats (‘A’ stands for any letter, and ‘9’ for any digit):

Format      Coverage                                      Example
AA9A 9AA    WC postcode area; EC1–EC4, NW1W, SE1P, SW1    EC1A 1BB
A9A 9AA     E1W, N1C, N1P                                 W1A 0AX
A9 9AA      B, E, G, L, M, N, S, W                        M1 1AE
A99 9AA     B, E, G, L, M, N, S, W                        B33 8TH
AA9 9AA     All other postcodes                           CR2 6XH
AA99 9AA    All other postcodes                           DN55 1PT

To match any of these patterns, we can use the following regular expression:

/^[a-zA-Z]{1,2}([0-9]{1,2}|[0-9][a-zA-Z])\s*[0-9][a-zA-Z]{2}$/

Let’s decompose this so we know what it’s doing:

  • We start by making sure our postcode isn’t surrounded by anything else in our field, so we anchor it between a beginning-of-input assertion (^) and an end-of-input assertion ($).
  • Then, we accept one or two letters: [a-zA-Z]{1,2}
  • Followed by either one or two digits, or a digit followed by a letter: ([0-9]{1,2}|[0-9][a-zA-Z])
  • Any amount of white space, including none, so that both “EC1A1BB” and “EC1A 1BB” match: \s*
  • Finally, a digit followed by two letters: [0-9][a-zA-Z]{2}
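Here is a minimal sketch of how this regexp could be wired into form validation, assuming a JavaScript runtime such as Node.js or a browser. The names UK_POSTCODE, isValidUkPostcode and normalizeUkPostcode are hypothetical, and the check is only as strict as the pattern: it validates the shape of a postcode, not whether that postcode actually exists.

```javascript
// The pattern from above, unchanged: 1-2 letters, then 1-2 digits or a
// digit and a letter, optional white space, then a digit and two letters.
var UK_POSTCODE = /^[a-zA-Z]{1,2}([0-9]{1,2}|[0-9][a-zA-Z])\s*[0-9][a-zA-Z]{2}$/;

// Hypothetical helper: true if the input has the shape of a UK postcode.
function isValidUkPostcode(input) {
  // Trim first so stray spaces around the field value don't cause a failure.
  return UK_POSTCODE.test(input.trim());
}

// Hypothetical helper: returns the postcode in upper case with exactly one
// space before the last three characters, or null if it doesn't match.
function normalizeUkPostcode(input) {
  if (!isValidUkPostcode(input)) {
    return null;
  }
  var compact = input.replace(/\s+/g, "").toUpperCase();
  return compact.slice(0, -3) + " " + compact.slice(-3);
}

// Every example from the table above matches.
["EC1A 1BB", "W1A 0AX", "M1 1AE", "B33 8TH", "CR2 6XH", "DN55 1PT"].forEach(function (pc) {
  console.log(pc, isValidUkPostcode(pc)); // e.g. "EC1A 1BB true"
});

console.log(normalizeUkPostcode("  ec1a1bb ")); // EC1A 1BB
console.log(isValidUkPostcode("12345"));        // false
```

Normalizing is a design choice on top of the original regexp: because the last three characters of a valid postcode are always the digit and two letters, storing one canonical form makes later comparisons simpler.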

What have I missed? Is there a better way to write this? How hard is it to get UK addresses from a postcode using an external API? Let me know!

Standard

Pattern matching with regular expressions: a quick reminder

Pattern matching is a powerful tool for developers, but writing (or reading) a regular expression can be a daunting task.
This quick reminder gives you the bare necessities to tackle regular expressions, if you’ve dabbled with regexps a bit but can’t remember it all.

Regular expressions in Perl

If you take “text” in the widest possible sense, perhaps 90% of what you do is 90% text processing. That’s really what Perl is all about and always has been about—in fact, it’s even part of Perl’s name: Practical Extraction and Report Language.
Larry Wall et al., Programming Perl

There you have it: regular expressions and pattern matching are at the core of Perl, which is why I’m starting this post with it. Pattern matching is such an integral part of the language that there is no match or subst function in Perl: you write the match operator (m//, usually shortened to //) straight into a condition:

$_ = "Is there a doctor here?";
if (/doctor/) {
     print "Doctor who?";
}

Regular expressions in PHP and JavaScript

In PHP, you’ll have to use a dedicated function called preg_match() to get a match from a regular expression.
preg_match() needs at least two arguments: a regular expression, and a string to match it against:

$subject = "Is there a doctor here?";
$pattern = '/doctor/';
if (preg_match($pattern, $subject)) {
   echo "Yes. Yes, there is.";
}
(Source: PHP documentation, preg_match)

JavaScript has its own RegExp object, and strings have a handy match() method that matches a string against a RegExp object:

var subject = "Is there a doctor here?";
var pattern = new RegExp('doctor');
console.log(subject.match(pattern));

String.prototype.match() creates a new RegExp object on the fly if a string is provided instead:

var subject = "Is there a doctor here?";
var pattern = 'doctor'; // a plain string, turned into a RegExp by match()
console.log(subject.match(pattern));

(Source: Mozilla Developer Network)
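The examples above only print whatever match() returns. Here is a quick sketch, assuming a JavaScript runtime such as Node.js, of what that return value looks like: an array holding the match and its position, null when nothing matches, and a list of every match when the g flag is set. RegExp.prototype.test() is shown too, as it is the simpler choice when all you need is a yes/no answer.

```javascript
var subject = "Is there a doctor here?";

// Without the g flag: an array whose first element is the whole match,
// with extra properties such as index (where the match starts).
var first = subject.match(/doctor/);
console.log(first[0]);    // doctor
console.log(first.index); // 11

// No match at all: match() returns null, not an empty array.
console.log(subject.match(/nurse/)); // null

// With the g flag: a plain array of every matched substring.
console.log(subject.match(/e/g)); // [ 'e', 'e', 'e', 'e' ]

// test() only answers "is there a match?" with a boolean.
console.log(/doctor/.test(subject)); // true
```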

Meta-characters in regular expressions

Beyond the most elementary expressions, regexps can look like gibberish unless you know what all these characters stand for. The next few paragraphs are a reminder of the most useful meta-characters you can find in a regular expression:

Boundaries

^
Beginning of input
$
End of input
\b
Word boundary (zero-width): roughly speaking, the beginning or the end of a word
\B
Non-word boundary (zero-width): any position that is not a word boundary, such as between two letters of a word
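A small sketch of the boundaries in action, assuming a JavaScript runtime such as Node.js; the input string is made up for illustration.

```javascript
var text = "cat concatenate";

// ^ and $ anchor the pattern to the start and the end of the input.
console.log(/^cat/.test(text)); // true: the input starts with "cat"
console.log(/cat$/.test(text)); // false: the input ends with "nate"

// \b on both sides: only the standalone word "cat" is found.
console.log(text.match(/\bcat\b/g)); // [ 'cat' ]

// \B on both sides: only a "cat" buried inside a word, here in "concatenate".
console.log(text.match(/\Bcat\B/).index); // 7
```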

Quantifiers

x*
x*? (non-greedy)
Matches x 0 or more times
x+
x+? (non-greedy)
Matches x 1 or more times
x?
Matches x 0 or 1 time
x{n}
Matches x exactly n times
x{n,}
Matches x at least n times
x{n,m}
Matches x between n and m times
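To see the difference between greedy and non-greedy quantifiers, and the repetition counts, here is a short sketch assuming a JavaScript runtime such as Node.js. The comments show the logged values.

```javascript
// Greedy quantifiers take as many characters as they can...
console.log("aaab".match(/a+/)[0]);  // aaa
// ...non-greedy ones take as few as possible,
console.log("aaab".match(/a+?/)[0]); // a
// ...but only as long as the rest of the pattern still matches.
console.log("aaab".match(/a*?b/)[0]); // aaab

// ? makes the preceding character optional.
console.log(/colou?r/.test("color"));  // true
console.log(/colou?r/.test("colour")); // true

// {n}, {n,} and {n,m}: exact, minimum and bounded repetition counts.
console.log(/^a{3}$/.test("aaa"));     // true
console.log(/^a{3}$/.test("aaaa"));    // false
console.log(/^a{2,}$/.test("aaaaa"));  // true
console.log(/^a{2,4}$/.test("aaaaa")); // false
```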

Grouping & alternation

( )
Delimits a group of patterns (and captures what it matched); for instance, (n*)(m*) matches n 0 or more times, then m 0 or more times
|
Matches either of the alternative patterns: foo|bar matches either “foo” or “bar”
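A short sketch of grouping and alternation, assuming a JavaScript runtime such as Node.js. It reuses the (n*)(m*) and foo|bar examples from the list above and shows why a group is often needed around an alternation.

```javascript
// Each group captures what it matched: m[1] and m[2] hold the two groups.
var m = "nnnmm".match(/(n*)(m*)/);
console.log(m[1]); // nnn
console.log(m[2]); // mm

// A quantifier after a group applies to the whole group.
console.log("ababx".match(/(ab)+/)[0]); // abab

// Grouped alternation: the whole input must be "foo" or "bar".
console.log(/^(foo|bar)$/.test("bar")); // true
console.log(/^(foo|bar)$/.test("baz")); // false

// Without the group, | splits the whole pattern in two:
// /^foo|bar$/ means "starts with foo" OR "ends with bar".
console.log(/^foo|bar$/.test("foo and more")); // true
```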

Character sets and character classes

[xyz]
Matches any one of the characters x, y or z. Ranges are allowed: letters can be matched with a-z and A-Z, and digits with 0-9
opposite is [^xyz], which matches any one character that is not x, y or z

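A quick sketch of character sets and negated sets, assuming a JavaScript runtime such as Node.js; the sample strings are made up for illustration.

```javascript
// [ae] matches exactly one character, either "a" or "e".
console.log("grey".match(/gr[ae]y/)[0]); // grey

// Ranges can be combined inside one set.
console.log(/^[a-zA-Z0-9]+$/.test("Abc123"));  // true
console.log(/^[a-zA-Z0-9]+$/.test("Abc-123")); // false: "-" is not in the set

// [^0-9] matches any character that is not a digit, so this keeps only digits.
console.log("a1b2c3".replace(/[^0-9]/g, "")); // 123
```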
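.
Dot: matches any single character except line terminators
\d
Matches a digit (in JavaScript, same as [0-9])
opposite is \D
\w
Matches a word character: a letter, a digit or an underscore (in JavaScript, same as [A-Za-z0-9_])
opposite is \W
\s
Matches a white space character (spaces, tabs, line breaks, etc.)
In JavaScript, same as [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
opposite is \S
[\b]
Matches a backspace character (U+0008)
Not the same as \b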
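Finally, a sketch of the character classes above, assuming a JavaScript runtime such as Node.js. The \d and \w comments reflect JavaScript’s ASCII-only definitions; other engines (Perl, for instance) may also match Unicode digits and letters.

```javascript
// \d+ finds runs of digits, \w+ runs of word characters.
console.log("Flat 4B, 221 Baker St".match(/\d+/g)); // [ '4', '221' ]
console.log("snake_case-word".match(/\w+/g));       // [ 'snake_case', 'word' ]

// \s covers spaces, tabs and line breaks alike.
console.log("a \t\nb".replace(/\s/g, "")); // ab

// The dot matches any character except a line terminator such as \n.
console.log(/^a.b$/.test("a-b"));  // true
console.log(/^a.b$/.test("a\nb")); // false

// Inside brackets, \b means a backspace character, not a word boundary.
console.log(/[\b]/.test("\u0008")); // true
console.log(/[\b]/.test("a b"));    // false
```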