Here I show a standard way to get geographic metadata for UK postcodes, using the postcodes() and postcodes_metadata() functions in this package.


Imagine that we want to know more about where six people live, but we know only their id and a postcode that may be badly formatted or incomplete.

library(DataKindR)
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)

data <- tibble(
  id = c("A1", "B1", "B2", "C1", "C2", "C3"),
  postcode = c("Me1 2re", "W1A 1AA ", "N1", " SW1A 0AA", "XX1 0XX", "sw6 1hs")
  )

data
#> # A tibble: 6 × 2
#>   id    postcode   
#>   <chr> <chr>      
#> 1 A1    "Me1 2re"  
#> 2 B1    "W1A 1AA " 
#> 3 B2    "N1"       
#> 4 C1    " SW1A 0AA"
#> 5 C2    "XX1 0XX"  
#> 6 C3    "sw6 1hs"


Tidying the postcodes

As you can see, some of these postcodes are poorly formatted or incomplete. To tidy them, I’ll use the postcodes function with a postcode_type parameter of "full". With this setting, any entry that is not a full postcode (such as the partial postcode N1) is returned as NA.

data_tidy <- data |> 
  rowwise() |> 
  mutate(
    postcode_clean = postcodes(
      postcode_value = postcode, 
      postcode_type = "full"
      )
    ) |> 
  ungroup() 

data_tidy
#> # A tibble: 6 × 3
#>   id    postcode    postcode_clean
#>   <chr> <chr>       <chr>         
#> 1 A1    "Me1 2re"   ME1 2RE       
#> 2 B1    "W1A 1AA "  W1A 1AA       
#> 3 B2    "N1"        NA            
#> 4 C1    " SW1A 0AA" SW1A 0AA      
#> 5 C2    "XX1 0XX"   XX1 0XX       
#> 6 C3    "sw6 1hs"   SW6 1HS
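To make the tidying step more concrete, here is a rough base R sketch of the kind of cleaning involved. It is not the implementation of postcodes(): the helper name tidy_full_postcode and the simplified pattern are assumptions made for illustration, and the pattern checks only the general shape of a full postcode, not whether it exists.

```r
# Hypothetical helper, not part of DataKindR: approximates
# "tidy the value, then keep it only if it looks like a full postcode".
tidy_full_postcode <- function(x) {
  # Remove all whitespace and convert to upper case
  x <- toupper(gsub("[[:space:]]+", "", x))

  # Simplified shape check: outward code followed by an inward code
  # made of one digit and two letters (an assumption, not the official rule)
  is_full <- grepl("^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$", x)

  # Re-insert a single space before the three-character inward code
  out <- sub("^(.+)(.{3})$", "\\1 \\2", x)

  # Anything that is not a full postcode becomes NA
  out[!is_full] <- NA_character_
  out
}

raw <- c("Me1 2re", "W1A 1AA ", "N1", " SW1A 0AA", "XX1 0XX", "sw6 1hs")
cleaned <- tidy_full_postcode(raw)
cleaned
```

Note that XX1 0XX passes this kind of check, because it has the right shape even though it is not a real postcode.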

I then pull the resulting postcodes out into a character vector to use with the postcode API. (I chose not to remove the NA values here, but you could do so if there are many of them.)

list_postcodes <- data_tidy |> 
  pull(postcode_clean)

list_postcodes
#> [1] "ME1 2RE"  "W1A 1AA"  NA         "SW1A 0AA" "XX1 0XX"  "SW6 1HS"
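If you did want to remove the NA values before calling the API, a minimal sketch in base R could look like the following. The name postcodes_to_query is an assumption, and the use of unique() to avoid looking up the same postcode twice is my own suggestion rather than a step in the workflow above.

```r
# Same values as the vector printed above
list_postcodes <- c("ME1 2RE", "W1A 1AA", NA, "SW1A 0AA", "XX1 0XX", "SW6 1HS")

# Drop missing values, then drop any repeats
postcodes_to_query <- unique(list_postcodes[!is.na(list_postcodes)])

postcodes_to_query
```

The later left join still works with a reduced vector, because rows in the original data with no match simply receive NA in the metadata columns.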


Getting geographic metadata on the clean postcodes

Despite this cleaning, there is still an issue: the penultimate postcode (XX1 0XX) is correctly formatted but not real. I want to ensure that an error on this postcode doesn’t break the entire run and stop us from getting information on the final postcode. To do so, I wrap the API function in the safely function from the purrr package, which captures any error and returns it alongside the result instead of stopping.

safely_postcodes_metadata <- safely(postcodes_metadata)
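To see what a function wrapped in safely() returns, here is a small sketch that needs no network access. The function fake_lookup is a hypothetical stand-in for an API call, not part of this package; it fails on purpose for one input.

```r
library(purrr)

# Hypothetical stand-in for an API call: errors for one specific postcode
fake_lookup <- function(postcode_value) {
  if (identical(postcode_value, "XX1 0XX")) {
    stop("postcode not found")
  }
  data.frame(pcd = postcode_value)
}

safely_fake_lookup <- safely(fake_lookup)

ok  <- safely_fake_lookup("SW6 1HS")
bad <- safely_fake_lookup("XX1 0XX")

# Every call returns a list with two elements: result and error
names(ok)

ok$error     # NULL, because the call succeeded
bad$result   # NULL, because the call failed

# The captured error is kept rather than thrown
conditionMessage(bad$error)
```

Because a failed call returns NULL as its result rather than stopping, a later step can pick out the result elements and discard the empty ones.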

I now call the postcode API on each postcode in turn. I then keep only the result element of each call, drop the entries with no result (the NA and the nonexistent XX1 0XX), and unnest the rest into one row per postcode.

metadata_postcodes <- map(
  .x = list_postcodes, 
  .f = ~safely_postcodes_metadata(postcode_value = .x)
  )  

data_out <- tibble(
  postcode = list_postcodes,
  results = metadata_postcodes |> 
    map("result")
  ) |>
  tidyr::drop_na() |>
  unnest(cols = results)

data_out
#> # A tibble: 4 × 21
#>   postcode pcd     rgn       rgn_name   pcon    pcon_name laua  laua_name lsoa21
#>   <chr>    <chr>   <chr>     <chr>      <chr>   <chr>     <chr> <chr>     <chr> 
#> 1 ME1 2RE  ME1 2RE E12000008 South East E14001… Chatham … E060… Medway    E0101…
#> 2 W1A 1AA  W1A 1AA E12000007 London     E14001… Cities o… E090… Westmins… E0100…
#> 3 SW1A 0AA SW1A0AA E12000007 London     E14001… Cities o… E090… Westmins… E0100…
#> 4 SW6 1HS  SW6 1HS E12000007 London     E14001… Chelsea … E090… Hammersm… E0100…
#> # ℹ 12 more variables: lsoa21_name <chr>, msoa21 <chr>, msoa21_name <chr>,
#> #   ward <chr>, ward_name <chr>, oac11_code <chr>, oac11_group <chr>,
#> #   oac11_subgroup <chr>, oac11_supergroup <chr>, imd <int>, lat <dbl>,
#> #   lon <dbl>


Presenting our extended data

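Finally, I join these results back onto the earlier data. Because this is a left join, the two rows without metadata (N1 and XX1 0XX) are kept, with NA in the new columns.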

data_tidy |> 
  left_join(
    data_out, 
    by = join_by(postcode_clean == postcode)
    )
#> # A tibble: 6 × 23
#>   id    postcode    postcode_clean pcd     rgn    rgn_name pcon  pcon_name laua 
#>   <chr> <chr>       <chr>          <chr>   <chr>  <chr>    <chr> <chr>     <chr>
#> 1 A1    "Me1 2re"   ME1 2RE        ME1 2RE E1200… South E… E140… Chatham … E060…
#> 2 B1    "W1A 1AA "  W1A 1AA        W1A 1AA E1200… London   E140… Cities o… E090…
#> 3 B2    "N1"        NA             NA      NA     NA       NA    NA        NA   
#> 4 C1    " SW1A 0AA" SW1A 0AA       SW1A0AA E1200… London   E140… Cities o… E090…
#> 5 C2    "XX1 0XX"   XX1 0XX        NA      NA     NA       NA    NA        NA   
#> 6 C3    "sw6 1hs"   SW6 1HS        SW6 1HS E1200… London   E140… Chelsea … E090…
#> # ℹ 14 more variables: laua_name <chr>, lsoa21 <chr>, lsoa21_name <chr>,
#> #   msoa21 <chr>, msoa21_name <chr>, ward <chr>, ward_name <chr>,
#> #   oac11_code <chr>, oac11_group <chr>, oac11_subgroup <chr>,
#> #   oac11_supergroup <chr>, imd <int>, lat <dbl>, lon <dbl>
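After a join like this, it can be useful to record why a row has no metadata. The sketch below uses small made-up tibbles (people and metadata, with one metadata column) rather than the real API output, and the status column and its labels are assumptions chosen for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for data_tidy and data_out
people <- tibble(
  id = c("A1", "B2", "C2"),
  postcode_clean = c("ME1 2RE", NA, "XX1 0XX")
)

metadata <- tibble(
  postcode = "ME1 2RE",
  rgn_name = "South East"
)

flagged <- people |>
  left_join(metadata, by = join_by(postcode_clean == postcode)) |>
  mutate(
    status = case_when(
      is.na(postcode_clean) ~ "not a full postcode",   # removed at the tidying step
      is.na(rgn_name)       ~ "no metadata returned",  # well formed but not found
      TRUE                  ~ "matched"
    )
  )

flagged
```

This separates postcodes that failed the format check from those that were well formed but returned nothing from the lookup.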