
Getting geographic metadata from UK postcodes
Source: vignettes/postcodes-vignette.Rmd
Here I show a standard way to get geographic metadata for UK postcodes,
using the postcodes and postcodes_metadata
functions in this package.
Imagine that we want to know more about where six people live, but we
only know their id and something about their
postcode.
library(DataKindR)
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)
data <- tibble(
id = c("A1", "B1", "B2", "C1", "C2", "C3"),
postcode = c("Me1 2re", "W1A 1AA ", "N1", " SW1A 0AA", "XX1 0XX", "sw6 1hs")
)
data
#> # A tibble: 6 × 2
#> id postcode
#> <chr> <chr>
#> 1 A1 "Me1 2re"
#> 2 B1 "W1A 1AA "
#> 3 B2 "N1"
#> 4 C1 " SW1A 0AA"
#> 5 C2 "XX1 0XX"
#> 6 C3 "sw6 1hs"
Tidying the postcodes
As you can see, some of these postcodes are poorly formatted or
incomplete. To tidy them, I’ll use the postcodes function
with a postcode_type parameter of "full". By
doing so, incomplete entries (such as the partial postcode
N1) are returned as NA.
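To make concrete what tidying involves here, the following is a rough base R sketch of this kind of normalisation. It is not the package's implementation: normalise_full_postcode is a hypothetical helper, and its pattern only checks the general shape of a full postcode, so it still accepts made-up values such as XX1 0XX.

```r
# Hypothetical helper for illustration only; not the package's postcodes function.
normalise_full_postcode <- function(x) {
  # Trim outer whitespace, collapse runs of inner whitespace, and upper-case.
  cleaned <- toupper(gsub("\\s+", " ", trimws(x)))
  # Loose shape check: outward code, one space, then a digit and two letters.
  is_full <- grepl("^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$", cleaned)
  # Anything that does not look like a full postcode becomes NA.
  ifelse(is_full, cleaned, NA_character_)
}

normalise_full_postcode(
  c("Me1 2re", "W1A 1AA ", "N1", " SW1A 0AA", "XX1 0XX", "sw6 1hs")
)
#> [1] "ME1 2RE" "W1A 1AA" NA "SW1A 0AA" "XX1 0XX" "SW6 1HS"
```

This sketch turns the partial postcode N1 into NA, but it would also reject a full postcode written without a space. The rest of this vignette uses the package's postcodes function.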
data_tidy <- data |>
rowwise() |>
mutate(
postcode_clean = postcodes(
postcode_value = postcode,
postcode_type = "full"
)
) |>
ungroup()
data_tidy
#> # A tibble: 6 × 3
#> id postcode postcode_clean
#> <chr> <chr> <chr>
#> 1 A1 "Me1 2re" ME1 2RE
#> 2 B1 "W1A 1AA " W1A 1AA
#> 3 B2 "N1" NA
#> 4 C1 " SW1A 0AA" SW1A 0AA
#> 5 C2 "XX1 0XX" XX1 0XX
#> 6 C3 "sw6 1hs" SW6 1HS
I then extract the cleaned postcodes into a character vector to use with the
postcode API. (I chose not to remove NA values in this case,
but you could if there are many.)
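If you did want to drop NA values, and avoid looking up the same postcode twice, one possible approach is sketched below. The data_tidy_example tibble is a made-up stand-in for data_tidy, and list_postcodes_unique is a name introduced only for this illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for data_tidy, with one NA and one repeated postcode.
data_tidy_example <- tibble(
  id = c("A1", "B2", "C3", "D1"),
  postcode_clean = c("ME1 2RE", NA, "SW6 1HS", "ME1 2RE")
)

list_postcodes_unique <- data_tidy_example |>
  filter(!is.na(postcode_clean)) |> # drop entries that failed cleaning
  distinct(postcode_clean) |>       # keep one copy of each postcode
  pull(postcode_clean)

list_postcodes_unique
#> [1] "ME1 2RE" "SW6 1HS"
```

Because the results are joined back by postcode at the end, looking up each distinct postcode once should still fill in every matching row. This vignette itself keeps the NA values.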
list_postcodes <- data_tidy |>
pull(postcode_clean)
list_postcodes
#> [1] "ME1 2RE" "W1A 1AA" NA "SW1A 0AA" "XX1 0XX" "SW6 1HS"
Getting geographic metadata on the clean postcodes
Despite this cleaning, there is still an issue, as the penultimate
postcode is correctly formatted but not real. I therefore want to ensure
that this postcode (XX1 0XX) doesn’t cause our entire run
to break, thereby stopping us from getting information on our final
postcode. To do so, I wrap our API function in the safely
function from the purrr package.
safely_postcodes_metadata <- safely(postcodes_metadata)
I now call the postcode API on this list of postcodes. I then extract only the results of this call, before reshaping them.
metadata_postcodes <- map(
.x = list_postcodes,
.f = ~safely_postcodes_metadata(postcode_value = .x)
)
data_out <- tibble(
postcode = list_postcodes,
results = metadata_postcodes |>
map("result")
) |>
tidyr::drop_na() |>
unnest(cols = results)
data_out
#> # A tibble: 4 × 21
#> postcode pcd rgn rgn_name pcon pcon_name laua laua_name lsoa21
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ME1 2RE ME1 2RE E12000008 South East E14001… Chatham … E060… Medway E0101…
#> 2 W1A 1AA W1A 1AA E12000007 London E14001… Cities o… E090… Westmins… E0100…
#> 3 SW1A 0AA SW1A0AA E12000007 London E14001… Cities o… E090… Westmins… E0100…
#> 4 SW6 1HS SW6 1HS E12000007 London E14001… Chelsea … E090… Hammersm… E0100…
#> # ℹ 12 more variables: lsoa21_name <chr>, msoa21 <chr>, msoa21_name <chr>,
#> # ward <chr>, ward_name <chr>, oac11_code <chr>, oac11_group <chr>,
#> # oac11_subgroup <chr>, oac11_supergroup <chr>, imd <int>, lat <dbl>,
#> # lon <dbl>
Presenting our extended data
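Finally, I join the results above back onto our earlier data. A left join keeps all six people, so the rows without metadata (the partial postcode and the non-existent one) are filled with NA.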
data_tidy |>
left_join(
data_out,
by = join_by(postcode_clean == postcode)
)
#> # A tibble: 6 × 23
#> id postcode postcode_clean pcd rgn rgn_name pcon pcon_name laua
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A1 "Me1 2re" ME1 2RE ME1 2RE E1200… South E… E140… Chatham … E060…
#> 2 B1 "W1A 1AA " W1A 1AA W1A 1AA E1200… London E140… Cities o… E090…
#> 3 B2 "N1" NA NA NA NA NA NA NA
#> 4 C1 " SW1A 0AA" SW1A 0AA SW1A0AA E1200… London E140… Cities o… E090…
#> 5 C2 "XX1 0XX" XX1 0XX NA NA NA NA NA NA
#> 6 C3 "sw6 1hs" SW6 1HS SW6 1HS E1200… London E140… Chelsea … E090…
#> # ℹ 14 more variables: laua_name <chr>, lsoa21 <chr>, lsoa21_name <chr>,
#> # msoa21 <chr>, msoa21_name <chr>, ward <chr>, ward_name <chr>,
#> # oac11_code <chr>, oac11_group <chr>, oac11_subgroup <chr>,
#> # oac11_supergroup <chr>, imd <int>, lat <dbl>, lon <dbl>
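The safely wrapper used above returns, for each input, a list with a result element and an error element, one of which is NULL. This vignette keeps only the results, but the errors can be useful for reporting which lookups failed and why. The sketch below shows this with fake_lookup, a hypothetical stand-in for an API call. It is not part of this package and does not contact any service.

```r
library(purrr)

# Hypothetical stand-in for an API lookup: errors on anything it does not know.
fake_lookup <- function(postcode_value) {
  known <- c("ME1 2RE" = "South East", "SW6 1HS" = "London")
  if (is.na(postcode_value) || !postcode_value %in% names(known)) {
    stop("postcode not found: ", postcode_value)
  }
  data.frame(pcd = postcode_value, rgn_name = unname(known[postcode_value]))
}

safely_fake_lookup <- safely(fake_lookup)

inputs <- c("ME1 2RE", NA, "XX1 0XX", "SW6 1HS")
out <- map(inputs, ~ safely_fake_lookup(postcode_value = .x))

# A lookup failed if its error element is not NULL.
failed <- map_lgl(out, ~ !is.null(.x$error))
inputs[failed]
#> [1] NA "XX1 0XX"

# The captured conditions carry the original error messages.
map_chr(out[failed], ~ conditionMessage(.x$error))
#> [1] "postcode not found: NA" "postcode not found: XX1 0XX"
```

Keeping a record of failed inputs like this makes it easier to tell a postcode that was dropped during cleaning from one that was well formed but not found.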