Regular Expressions in Access – my new love affair

Comic strip
Regular Expressions Hero

I have a confession. I love Regular Expressions.  Ok, so it’s a new romance, and I’ve not had much experience with them yet, but I love the way they work. They can be tricky to get your head round them, but they can also save you a lot of time and effort once you get them working.  Essentially they are a way of specifying a sequence of characters to allow you to search for a particular pattern. For example, if you had a list of random single characters, you could use the expression [A-D] to find all those that are in the A-D range of letters.

I first came across them when doing a Coursera specialization on Python, but I’ve not had much opportunity to use them in my regular work. Until that is, my boss asked me to find a way of finding incorrect postcodes in student’s addresses, and if possible fix them. As most of the work I do in my role uses Microsoft Access I decided to find out if you can use them in vba, and it turns out you can.  Here I will show you how I managed to get the code working.

To use the built-in features for regular expressions in Access you will need to enable them in the developer window.  Go to the Tools menu, and click on References.  Scroll down to “Microsoft VBScript Regular Expressions 5.5” and select it.  This activates the functions you will need.

References window

The function I used was the regexp function. The code for this is given below.

Function regexp( _
 StringToCheck As Variant, _
 PatternToUse As String, _
 Optional CaseSensitive As Boolean = True)

Dim re As New regexp
 re.Pattern = PatternToUse
 re.Global = False
 re.IgnoreCase = Not CaseSensitive
 Dim m
 For Each m In re.Execute(StringToCheck)
 regexp = m.Value
 End Function

This function takes in two parameters (third one is optional), a string, and a regular expression string. If the string matches the regular expression, it returns the string, otherwise it returns a null value.

In order to create a regular expression for a UK postcode, put together the following:

^[A-Z]{1,2}[0-9A-Z]{1,2} [0-9][ABD-HJLNP-UW-Z]{2}$

You can find more complicated versions online that will perform more checks, but this will serve for my purposes.  It breaks down like this:

^[A-Z]{1,2}: The string must start with a minimum of 1, maximum of 2, uppercase characters between A and Z

[0-9A-Z]{1,2}: The next section must be a minimum of 1, maximum of 2, uppercase characters or numbers

[0-9]: a space, and then a single number

[ABD-HJLNP-UW-Z]{2}$: Finally, there must be 2 letters, but not including C, I, K, M, O or V.

This regular expression will work for the following combinations of postcode valid in the UK:

A9 9AA, A99 9AA, AA9 9AA, AA99 9AA, A9A 9AA, AA9A 9AA

There are a number of things you can do with this, but as a basic starting point, you could create the following function which will take in a string, compare it to this regular expression, return an OK if it matches, or a NO if it doesn’t.

Function CheckPostCodeRegEx(Postcode As String) As String

Dim PostcodeFormat As String
 PostcodeFormat = "^[A-Z]{1,2}[0-9A-Z]{1,2} [0-9][ABD-HJLNP-UW-Z]{2}$"

If regexp(Postcode, PostcodeFormat) = Postcode Then
 CheckPostCodeRegEx = "OK"
 CheckPostCodeRegEx = "NO"
 End If

End Function

In order to complete the data quality checks required for this task, I used this function initially to check whether a postcode already was in the right format, if it was I could move on to the next one. If not, I removed any spaces (to get rid of any extra ones), made sure it was all in upper case, then put the space back in the right place.  Anything that still did not match the regular expression was probably not a valid input in the first place, so would likely need manual intervention to correct it.