Postcode Faker
A recent project I worked on involved plotting the locations of home addresses on a map. I generated each home location from its postcode using an API provided by the Office for National Statistics (ONS). (There will be more details on this in a future post.) When writing the project up I wanted to show the visualisations I'd generated, but I did not want to display the actual locations of the people in the data set, for obvious privacy reasons. So I started looking for Python libraries that generate valid postcodes. Valid is the key word here: an arbitrary random string will not necessarily correspond to a location that actually exists.
A UK postcode is between 5 and 7 characters long, not counting the space between its two parts. What I wanted was to specify the start of a postcode as a string and generate a given number of random valid postcodes that start with that string.
The first step was to obtain a list of all valid postcodes in the UK. This came from the Office for National Statistics (ONS) in the form of a large spreadsheet. The first column of the spreadsheet contains the postcode; the rest of the columns are not needed. I ran the following awk command to pull out the first column and save it as a separate CSV file.
awk -F "," '{print $1}' ons_postcode_data.csv > postcodes.csv
This sets a comma as the field separator and prints the first field of each line. Awk numbers fields from 1 ($0 is the whole line), whereas Python indexes from zero.
Then I wrote a Python function to produce the random postcodes. First, import pandas and load the CSV file into a DataFrame.
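If you would rather do this step in Python, the awk one-liner can be approximated with the standard library's csv module. This is only a sketch: `extract_first_column` is a name made up for illustration, and it assumes the postcode really is the first field of every row. One difference from the awk version is that the csv module understands quoted fields, so the quotes around a value are not carried into the output.

```python
import csv

def extract_first_column(src_path, dst_path):
    """Copy only the first field of every row from src_path to dst_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row:  # skip completely empty lines
                writer.writerow([row[0]])  # index 0 here is awk's $1

# Hypothetical usage, with the file names from the awk command:
# extract_first_column("ons_postcode_data.csv", "postcodes.csv")
```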
import pandas as pd
postcodes = pd.read_csv('postcodes.csv')
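As an aside, pandas can pick out a single column while reading, so the intermediate file is optional. The sketch below assumes, as described above, that the postcode is the first column of the ONS file; `load_postcodes` is a hypothetical helper name, not something from the original project.

```python
import pandas as pd

def load_postcodes(path):
    """Read only the first column of a CSV file into a one-column DataFrame."""
    # usecols=[0] keeps just the first column; dtype=str stops pandas guessing types
    return pd.read_csv(path, usecols=[0], dtype=str)

# Hypothetical usage against the full ONS file:
# postcodes = load_postcodes("ons_postcode_data.csv")
```

For a very large file this avoids holding the unused columns in memory, although the awk step is still a perfectly good way to shrink the file once and reuse it.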
Next, create a function that takes a DataFrame, a partial postcode, and the number of random samples required. It builds a regular expression anchored to the start of the string, keeps every postcode that begins with the partial postcode (ignoring case), and selects a random sample from the matches. The resulting DataFrame is returned from the function.
def pcfake(df, partial_pc, number):
    # Anchor the pattern so only postcodes that start with partial_pc match
    regex = f'^{partial_pc}'
    return df[df.iloc[:, 0].str.contains(regex, case=False)].sample(n=number)
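Two edge cases are worth knowing about with pcfake: the partial postcode is interpreted as a regular expression, and sample raises a ValueError if fewer than `number` postcodes match. A slightly more defensive variant might look like the sketch below. The name `pcfake_safe` and the `seed` parameter are additions for illustration only, not part of the original function.

```python
import re
import pandas as pd

def pcfake_safe(df, partial_pc, number, seed=None):
    """Sample up to `number` postcodes from the first column that start with partial_pc."""
    # re.escape makes the prefix match literally, even if it contains regex characters
    pattern = "^" + re.escape(partial_pc)
    # na=False treats missing values as non-matches instead of propagating NaN
    matches = df[df.iloc[:, 0].str.contains(pattern, case=False, na=False)]
    # Cap the sample size at the number of matching rows
    return matches.sample(n=min(number, len(matches)), random_state=seed)
```

Passing a fixed `seed` makes the sample repeatable, which helps if the same fake postcodes need to appear each time a write-up is regenerated.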
We can test this as follows. We should get 10 postcodes that all start with "NP"; because case is ignored, passing "Np" works just as well.
pcfake(postcodes, "Np", 10).head(10)