::p_load(sf, spdep, tmap, tidyverse) pacman
Hands-on Exercise 2 part 2
Global Measures of Spatial Autocorrelation
Overview
In this exercise we will utilise Global and Local Measure of Spatial Autocrrelation(GLSA). We will plot a Moran scatterplot and compute and plot the spatial correlogram. Additionally we will also compute the Local Indicator of Spatial Association(LISA)
Loading packages and data
Load packages
Load data
Here we will use the local development indicators of Hunan province, China, in 2012.
<- st_read(dsn = "data/part 2/geospatial",
hunan layer = "Hunan")
Reading layer `Hunan' from data source
`C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\Hands-on_Ex02\data\part 2\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS: WGS 84
<- read_csv("data/part 2/aspatial/Hunan_2012.csv") hunan2012
Rows: 88 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): County, City
dbl (27): avg_wage, deposite, FAI, Gov_Rev, Gov_Exp, GDP, GDPPC, GIO, Loan, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Relational join of data
Here we will join the hunan2012 dataframe to the polygon or the hunan map. Bascially relating the development data to various regions of Hunan.
<- left_join(hunan,hunan2012) %>%
hunan select(1:4, 7, 15)
Joining with `by = join_by(County)`
<- tm_shape(hunan) +
equal tm_fill("GDPPC",
n = 5,
style = "equal") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal interval classification")
<- tm_shape(hunan) +
quantile tm_fill("GDPPC",
n = 5,
style = "quantile") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal quantile classification")
tmap_arrange(equal,
quantile, asp=1,
ncol=2)
Global Spatial Autocorrelation
Here we will compute global spatial autocorrelation statistics and perform a spatial complete randomness test.
Spatial autocorrelation describes the spatial relationship of variables and explore the statistical dependency of collection of variables in a region. These regions can refer to a continuous surface, fixed sites and areas that are dubdivisable.
Computing Contiguity Spatial Weights
We have to construct a spatial weights or to degine the relationships of the different regions. Here we will use the Queen contiguity weight matrix. Here we see if neighbours are similar to one another.
<- poly2nb(hunan,
wm_q queen=TRUE)
summary(wm_q)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Link number distribution:
1 2 3 4 5 6 7 8 9 11
2 2 12 16 24 14 11 4 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links
The data shows that there are 88 regions. 1 region has 11 connected and immediate neighbours and 2 regions only having 1.
Row-standardised weight matrix
Next we will assign weights to each neighbour using equal weight. This is done using 1/number of neighbours. We will then sum up the weighted income values. This is to create proportional weights especially when the data contains unequal number of weights and is useful against potentially bias or skewed data.
<- nb2listw(wm_q,
rswm_q style="W",
zero.policy = TRUE)
rswm_q
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 88 7744 88 37.86334 365.9147
Global Spatial Autocorrelation using Moran’s I test
Moran’s I describes how features differ from the values in the study area as a whole. It uses a Moran I(Z-value).
I-value more than 0: Clustered and observations are similar.
I-value less than 0: Dispersed, observations tend to be different.
I-value close to 0: Random, scattered all over.
Performing the test
Here we use the moran.test() for calculating the Moran I value.
Null hypothesis or H0:
- observed pattern are equally likely as other pattern
- the value of one location is not dependent on neighbours
- changing the value of one region does not affect another
moran.test(hunan$GDPPC,
listw=rswm_q,
zero.policy = TRUE,
na.action=na.omit)
Moran I test under randomisation
data: hunan$GDPPC
weights: rswm_q
Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.300749970 -0.011494253 0.004348351
From the test above we can see that the Moran I statistical value is more than 0, meaning that the development of Hunan province has clustering.
Moran’s I test require normal and randomised data and we can test that using Monte Carlo simulation. The simulation predicts the likelihood of outcomes based on random values. Here we use set.seed() to fix the random values so that we will obtain the same set when generating the random values. The p-value obtained will then be used to compare against the p-value of the actual p-value from the Moran’s I test.
We can also simulate using moran.mc().
set.seed(1234)
= moran.mc(hunan$GDPPC,
bpermlistw=rswm_q,
nsim=999,
zero.policy = TRUE,
na.action=na.omit)
bperm
Monte-Carlo simulation of Moran I
data: hunan$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.30075, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Both the actual test and simulated shows a small p-value, which means we can reject the null hypothesis of the Moran’s I test and we have sufficient evidence that there is an observed pattern or values of a region is affected by another neighbour.
Visualising the Moran’s I test
We can then plot the distribution of the simulated Moran’s I statistical test using a histogram.
mean(bperm$res[1:999])
[1] -0.01504572
var(bperm$res[1:999])
[1] 0.004371574
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.18339 -0.06168 -0.02125 -0.01505 0.02611 0.27593
hist(bperm$res,
freq=TRUE,
breaks=20,
xlab="Simulated Moran's I")
abline(v=0,
col="red")
Global Spatial Autocorrelation using Geary’s C test
Geary’s C test describes how features differ from their immediate neighbours. It uses a Geary C(Z value).
- If the Geary C value is larger than 1: the features are dispersed and observations tend to be different.
- If the Geary C value is smaller than 1: the features are clustered and observations tend to be similar.
- If the Geary C value is close to 1: the features are randomly arranged.
Relationship to Moran’s I test
Both tests have an inverse relationship to each other where a low C value and high I value can be seen when there is clustering of similar values. High C value and low I value can be seen when dissimilar values cluster.
C values range from 0 to 3 while I values range from -1 to 1.
Performing the test
Here we use the geary.mc() to perform the Geary C test.
geary.test(hunan$GDPPC, listw=rswm_q)
Geary C test under randomisation
data: hunan$GDPPC
weights: rswm_q
Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation Variance
0.6907223 1.0000000 0.0073364
We can also perform a permutation test with a simulation.
set.seed(1234)
=geary.mc(hunan$GDPPC,
bpermlistw=rswm_q,
nsim=999)
bperm
Monte-Carlo simulation of Geary C
data: hunan$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.69072, observed rank = 1, p-value = 0.001
alternative hypothesis: greater
Visualising the Geary’s C test
We can then plot the distribution of the simulated Geary’s C statistical test using a histogram.
mean(bperm$res[1:999])
[1] 1.004402
var(bperm$res[1:999])
[1] 0.007436493
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7142 0.9502 1.0052 1.0044 1.0595 1.2722
hist(bperm$res, freq=TRUE, breaks=20, xlab="Simulated Geary c")
abline(v=1, col="red")
Spatial Correlogram
Here we examine the patterns of the Moran’s I and Geary’s C test. We can see the correlation of when distances are increased.
Moran’s I correlogram
We will use the sp.correlogram() for calculating the spatial correlogram of the devleopment of Hunan.
<- sp.correlogram(wm_q,
MI_corr $GDPPC,
hunanorder=6,
method="I",
style="W")
plot(MI_corr)
print(MI_corr)
Spatial correlogram for hunan$GDPPC
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (88) 0.3007500 -0.0114943 0.0043484 4.7351 2.189e-06 ***
2 (88) 0.2060084 -0.0114943 0.0020962 4.7505 2.029e-06 ***
3 (88) 0.0668273 -0.0114943 0.0014602 2.0496 0.040400 *
4 (88) 0.0299470 -0.0114943 0.0011717 1.2107 0.226015
5 (88) -0.1530471 -0.0114943 0.0012440 -4.0134 5.984e-05 ***
6 (88) -0.1187070 -0.0114943 0.0016791 -2.6164 0.008886 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Geary’s C correlogram
We will use the sp.correlogram() for calculating the spatial correlogram of the devleopment of Hunan.
<- sp.correlogram(wm_q,
GC_corr $GDPPC,
hunanorder=6,
method="C",
style="W")
plot(GC_corr)
print(GC_corr)
Spatial correlogram for hunan$GDPPC
method: Geary's C
estimate expectation variance standard deviate Pr(I) two sided
1 (88) 0.6907223 1.0000000 0.0073364 -3.6108 0.0003052 ***
2 (88) 0.7630197 1.0000000 0.0049126 -3.3811 0.0007220 ***
3 (88) 0.9397299 1.0000000 0.0049005 -0.8610 0.3892612
4 (88) 1.0098462 1.0000000 0.0039631 0.1564 0.8757128
5 (88) 1.2008204 1.0000000 0.0035568 3.3673 0.0007592 ***
6 (88) 1.0773386 1.0000000 0.0058042 1.0151 0.3100407
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1