::p_load(sf, spdep, tmap, tidyverse, knitr,sfdep, plotly, zoo) pacman
In-class_Ex2
Overview
Getting started
For the purpose of this in-class exercise, the Hunan datasets will be used. There are 2 data sets in this case, they are: - Hunan, a geospatial data set in ESRI shapefile format, and - Hunan_2012, an attribute data set in csv format
Loading packages and data
Loading data
Geospatial data
The following uses the st_read() function from the sf package to read the geospatial data.
<- st_read(dsn = "data/geospatial",
hunan layer = "Hunan")
Reading layer `Hunan' from data source
`C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\In-class_Ex\Ex2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS: WGS 84
Import attributable table
<- read_csv("data/aspatial/Hunan_2012.csv") hunan2012
Combining the tables
Left join because of joining different data types. Here we retain the hunan dataframe and append the hunan table in order to save the geometry data automatically due to the nature of it being a spatial data.
<- left_join(hunan,hunan2012)%>%
hunan_GDPPC select(1:4, 7, 15)
Choropleth Map
Here we will be plotting the choropleth map of the hunan_GDPPC joint data from the previous step.
tmap_mode("plot")
tm_shape(hunan_GDPPC) +
tm_fill("GDPPC",
style = "quantile",
palette = "Blues",
title = "GDPPC") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Distribution of GDP per capita by district, Hunan Province",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)
Spatial Weights
There are 2 types of spatial weights, contiguity and distance based. Contiguity spatial weights refer to having a common border and distance based spatial weights are based on a distance.
Contiguity Spatial Weights
Here we will first try the contiguity weights using st_contiguity() to obtain the number of neighbours, followed by st_weights() to obtain the the contiguity spatial weights.
Queen’s method
Here we will be using the Queen’s method.
<- hunan_GDPPC %>%
nb_queen mutate(nb = st_contiguity(geometry),
.before = 1)
summary(nb_queen$nb)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Link number distribution:
1 2 3 4 5 6 7 8 9 11
2 2 12 16 24 14 11 4 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links
The report shows that there are 88 area units or regions in the Hunan province and the area with the most number of connected neighbours is 11 and the least is 1.
One advantage of using sfdep is the output being in a table format.
kable(head(nb_queen,
n=10))
nb | NAME_2 | ID_3 | NAME_3 | ENGTYPE_3 | County | GDPPC | geometry |
---|---|---|---|---|---|---|---|
2, 3, 4, 57, 85 | Changde | 21098 | Anxiang | County | Anxiang | 23667 | POLYGON ((112.0625 29.75523… |
1, 57, 58, 78, 85 | Changde | 21100 | Hanshou | County | Hanshou | 20981 | POLYGON ((112.2288 29.11684… |
1, 4, 5, 85 | Changde | 21101 | Jinshi | County City | Jinshi | 34592 | POLYGON ((111.8927 29.6013,… |
1, 3, 5, 6 | Changde | 21102 | Li | County | Li | 24473 | POLYGON ((111.3731 29.94649… |
3, 4, 6, 85 | Changde | 21103 | Linli | County | Linli | 25554 | POLYGON ((111.6324 29.76288… |
4, 5, 69, 75, 85 | Changde | 21104 | Shimen | County | Shimen | 27137 | POLYGON ((110.8825 30.11675… |
67, 71, 74, 84 | Changsha | 21109 | Liuyang | County City | Liuyang | 63118 | POLYGON ((113.9905 28.5682,… |
9, 46, 47, 56, 78, 80, 86 | Changsha | 21110 | Ningxiang | County | Ningxiang | 62202 | POLYGON ((112.7181 28.38299… |
8, 66, 68, 78, 84, 86 | Changsha | 21111 | Wangcheng | County | Wangcheng | 70666 | POLYGON ((112.7914 28.52688… |
16, 17, 19, 20, 22, 70, 72, 73 | Chenzhou | 21112 | Anren | County | Anren | 12761 | POLYGON ((113.1757 26.82734… |
Rook’s method
Here we will be using the Rook’s method.
<- hunan_GDPPC %>%
nb_rook mutate(nb = st_contiguity(geometry,
queen = FALSE),
.before = 1)
summary(nb_rook$nb)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 440
Percentage nonzero weights: 5.681818
Average number of links: 5
Link number distribution:
1 2 3 4 5 6 7 8 9 10
2 2 12 20 21 14 11 3 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 10 links
Here we see the region with the highest number of neighbours has dropped to 10 due to loss of point or corner neighbours.
Higher order neighbours
Higher order neighbours refer to secondary neighbours that are not directly connected to the region of interest. Depending on the order number, it is how many neighbours away.
We will use **st_nb_lag_cumul()* for calculating.
<- hunan_GDPPC %>%
nb2_queen mutate(nb = st_contiguity(geometry),
nb2 = st_nb_lag_cumul(nb, 2),
.before = 1)
summary(nb2_queen)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Link number distribution:
1 2 3 4 5 6 7 8 9 11
2 2 12 16 24 14 11 4 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links
Neighbour list object:
Number of regions: 88
Number of nonzero links: 1324
Percentage nonzero weights: 17.09711
Average number of links: 15.04545
Link number distribution:
5 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 26 28 33
2 1 6 4 5 4 8 5 10 4 4 8 4 8 5 2 2 1 2 1 1 1
2 least connected regions:
30 88 with 5 links
1 most connected region:
56 with 33 links
nb nb2 NAME_2 ID_3 NAME_3
NULL:NULL NULL:NULL Length:88 Min. :21098 Length:88
Class :character 1st Qu.:21125 Class :character
Mode :character Median :21150 Mode :character
Mean :21150
3rd Qu.:21174
Max. :21201
ENGTYPE_3 County GDPPC geometry
Length:88 Length:88 Min. : 8497 POLYGON :88
Class :character Class :character 1st Qu.:14566 epsg:4326 : 0
Mode :character Mode :character Median :20433 +proj=long...: 0
Mean :24405
3rd Qu.:27224
Max. :88656
Here we see the number of neighbours increasing with the highest number of neighbours reaching 33 for 1 region.
Deriving contiguity weights
After calculating the number of neighbours, we can then compute the contiguity weights using st_weights().
<- hunan_GDPPC %>%
wm_q mutate(nb = st_contiguity(geometry),
wt = st_weights(nb, style = "W"),
.before = 1)
wm_q
Simple feature collection with 88 features and 8 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS: WGS 84
First 10 features:
nb
1 2, 3, 4, 57, 85
2 1, 57, 58, 78, 85
3 1, 4, 5, 85
4 1, 3, 5, 6
5 3, 4, 6, 85
6 4, 5, 69, 75, 85
7 67, 71, 74, 84
8 9, 46, 47, 56, 78, 80, 86
9 8, 66, 68, 78, 84, 86
10 16, 17, 19, 20, 22, 70, 72, 73
wt
1 0.2, 0.2, 0.2, 0.2, 0.2
2 0.2, 0.2, 0.2, 0.2, 0.2
3 0.25, 0.25, 0.25, 0.25
4 0.25, 0.25, 0.25, 0.25
5 0.25, 0.25, 0.25, 0.25
6 0.2, 0.2, 0.2, 0.2, 0.2
7 0.25, 0.25, 0.25, 0.25
8 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571
9 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667
10 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125
NAME_2 ID_3 NAME_3 ENGTYPE_3 County GDPPC
1 Changde 21098 Anxiang County Anxiang 23667
2 Changde 21100 Hanshou County Hanshou 20981
3 Changde 21101 Jinshi County City Jinshi 34592
4 Changde 21102 Li County Li 24473
5 Changde 21103 Linli County Linli 25554
6 Changde 21104 Shimen County Shimen 27137
7 Changsha 21109 Liuyang County City Liuyang 63118
8 Changsha 21110 Ningxiang County Ningxiang 62202
9 Changsha 21111 Wangcheng County Wangcheng 70666
10 Chenzhou 21112 Anren County Anren 12761
geometry
1 POLYGON ((112.0625 29.75523...
2 POLYGON ((112.2288 29.11684...
3 POLYGON ((111.8927 29.6013,...
4 POLYGON ((111.3731 29.94649...
5 POLYGON ((111.6324 29.76288...
6 POLYGON ((110.8825 30.11675...
7 POLYGON ((113.9905 28.5682,...
8 POLYGON ((112.7181 28.38299...
9 POLYGON ((112.7914 28.52688...
10 POLYGON ((113.1757 26.82734...
The 3 arguments used are
- nb - list of neighbours
- style - Default will be W style, repsenting row standardisation. Others include C, global standardisation, U, C style divided by number of neighbours, and S, sums of all links to n.
- allow_zero - if TRUE, it assigns 0 to regions without neighbours
Distance based weights
There are 3 difference distance based weights:
- Fixed distance weights
- Adaptive distance weights
- inverse distance weights
Fixed distance weights
As the name suggest, this method uses a fixed distance for measuring the number of neighbours. We will first determine the upper limit using the following steps.
<- sf::st_geometry(hunan_GDPPC)
geo <- st_knn(geo, longlat = TRUE)
nb <- unlist(st_nb_dists(geo, nb)) dists
st_nb_dists() is used for calculating the nearest neighbour, the output will be the distance to each neighbour for each region. unlist() is from the base R function and returns the output as a vector.
Next we will derive the statistical summary of the nearest neighbour distances.
summary(dists)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21.56 29.11 36.89 37.34 43.21 65.80
This shows that the max distance for the nearest neigh bour is 65.8km, and when we round up to 66km, each region would have at least 1 neighbour.
With that we can then compute the fixed distance weights.
<- hunan_GDPPC %>%
wm_fd mutate(nb = st_dist_band(geometry,
upper = 66),
wt = st_weights(nb),
.before = 1)
summary(wm_fd)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 400
Percentage nonzero weights: 5.165289
Average number of links: 4.545455
Link number distribution:
1 2 3 4 5 6 7
4 7 10 16 23 23 5
4 least connected regions:
30 32 65 75 with 1 link
5 most connected regions:
41 52 58 63 80 with 7 links
nb wt.Length wt.Class wt.Mode NAME_2 ID_3
NULL:NULL 6 -none- numeric Length:88 Min. :21098
5 -none- numeric Class :character 1st Qu.:21125
4 -none- numeric Mode :character Median :21150
4 -none- numeric Mean :21150
5 -none- numeric 3rd Qu.:21174
3 -none- numeric Max. :21201
3 -none- numeric
5 -none- numeric
6 -none- numeric
6 -none- numeric
4 -none- numeric
6 -none- numeric
4 -none- numeric
2 -none- numeric
4 -none- numeric
5 -none- numeric
5 -none- numeric
4 -none- numeric
5 -none- numeric
6 -none- numeric
6 -none- numeric
4 -none- numeric
4 -none- numeric
6 -none- numeric
6 -none- numeric
4 -none- numeric
3 -none- numeric
6 -none- numeric
2 -none- numeric
1 -none- numeric
4 -none- numeric
1 -none- numeric
5 -none- numeric
5 -none- numeric
5 -none- numeric
5 -none- numeric
2 -none- numeric
3 -none- numeric
5 -none- numeric
6 -none- numeric
7 -none- numeric
6 -none- numeric
5 -none- numeric
3 -none- numeric
6 -none- numeric
6 -none- numeric
5 -none- numeric
5 -none- numeric
5 -none- numeric
6 -none- numeric
5 -none- numeric
7 -none- numeric
2 -none- numeric
6 -none- numeric
4 -none- numeric
2 -none- numeric
6 -none- numeric
7 -none- numeric
3 -none- numeric
6 -none- numeric
5 -none- numeric
4 -none- numeric
7 -none- numeric
4 -none- numeric
1 -none- numeric
6 -none- numeric
2 -none- numeric
5 -none- numeric
3 -none- numeric
3 -none- numeric
4 -none- numeric
5 -none- numeric
6 -none- numeric
5 -none- numeric
1 -none- numeric
6 -none- numeric
5 -none- numeric
3 -none- numeric
6 -none- numeric
7 -none- numeric
5 -none- numeric
6 -none- numeric
5 -none- numeric
4 -none- numeric
3 -none- numeric
6 -none- numeric
4 -none- numeric
2 -none- numeric
NAME_3 ENGTYPE_3 County GDPPC
Length:88 Length:88 Length:88 Min. : 8497
Class :character Class :character Class :character 1st Qu.:14566
Mode :character Mode :character Mode :character Median :20433
Mean :24405
3rd Qu.:27224
Max. :88656
geometry
POLYGON :88
epsg:4326 : 0
+proj=long...: 0
Here we can see that there are 4 regions with only 1 neighbour and 5 regions with 7 neighbours.
st_dists_band() can be used to identify neighbours and the fixed distance selected. st_weights() uses the W style and the allow_zero to be TRUE as default.
Adaptive distance weights
Here we will use adaptive distance weights where the distance used is dependent on the data density.
<- hunan_GDPPC %>%
wm_ad mutate(nb = st_knn(geometry,
k=8),
wt = st_weights(nb),
.before = 1)
st_knn() is used to determine the number of neighbours to be used for calculating distance. st_weights() is used for calculaing the polygon spatial weights of the list of neightbours. It uses the W style and the allow_zero to be TRUE as default.
Inverse distance weights
This method will create proportional weights.
<- hunan_GDPPC %>%
wm_idw mutate(nb = st_contiguity(geometry),
wts = st_inverse_distance(nb, geometry,
scale = 1,
alpha = 1),
.before = 1)
st_contiguity() is used to identify the neighbours using a common border criteria. st_inverse_distance() is then used for calculating the inverse distance weights of the neighbours in the list.