Tutorials
=============================
.. image:: ../fotos/vbar/vbar42.jpg
   :width: 144 px
   :height: 400 px
   :alt: Kootwijkerzand (NL)
   :align: right

Here are the many tutorials I've written [#]_ dealing with various aspects of spatial statistics. Please note the date of each one: the older ones may be partly obsolete, or may need some adjustment for newer versions of the computer programs.
Geostatistics
----------------------------------------
Spatial analysis in R is transitioning to the "Simple Features" representation of spatial objects,
as implemented in the `sf package `_.
Many of the tutorials listed here were developed with the earlier "Spatial Objects in R" representation, as implemented in the ``sp`` package.
Be alert also to `changes in the GDAL and PROJ packages `_ when specifying or transforming coördinate reference systems.
* `An introduction to (geo)statistics with R <../_static/files/R_PDF/gs_intro_20Mar2019.pdf>`__
A brief introduction to exploratory and inferential geostatistical analysis. At the same time, it introduces the R environment for statistical computing and visualisation and several R packages, notably ``sp`` for spatial data structures and ``gstat`` for conventional geostatistics. The exercise assumes no prior knowledge of either geostatistics or the R environment.
- :download:`R code <../_static/files/R_R/gs_intro.R>`
- `Introduction to Rikken, M. G. J., & Van Rijn, R. P. G. (1993) <../_static/files/pdf/RikkenVanRijnIntro.pdf>`__ : *Soil pollution with heavy metals: An inquiry into spatial variation, cost of mapping and the risk evaluation of copper, cadmium, lead and zinc in the floodplains of the Meuse west of Stein, the Netherlands.* Dept. of Physical Geography, Utrecht University. This is the original report from which the "Meuse dataset" was created.
* `Co-kriging with the gstat package of the R environment for statistical computing <../_static/files/R_PDF/CoKrigeR.pdf>`__
Improving the mapping of an undersampled attribute that is co-regionalized with a more intensively sampled attribute.
- :download:`R code <../_static/files/R_R/CoKrigeR.R>`
- :download:`plotting functions <../_static/files/R_R/ck_plotfns.R>`
* Distance education course :ref:`de_geostats`
- Supplementary exercises
- `Exercise: Change of support <../_static/files/R_PDF/exA.pdf>`__
All geographical measurements are made on some support, that is, an interval (1-D), area (2-D) or volume (3-D) of some finite size. As long as the measurements, interpretations, and predictions all refer to the same support, techniques that treat the support as a 0-D point are satisfactory. But if measurements and predictions are made on different supports, the relation between them must be determined and used to adjust the geostatistical analysis.
- :download:`R code (part 1) <../_static/files/R_R/exA1.R>`
- :download:`R code (part 2) <../_static/files/R_R/exA2.R>`
- `Exercise: Compositional variables <../_static/files/R_PDF/exB.pdf>`__
Certain (geo)statistical variables, when considered as a group, are *not independent* in feature space, because they are *constrained* to sum to some constant; such a set is called a *composition.* They should be modelled as a group, not separately.
- :download:`R code <../_static/files/R_R/exB1.R>`
.. _space-time-geostats:
- `Exercise: Spatio-temporal Geostatistics <../_static/files/R_PDF/exC.pdf>`__
Spatio-temporal observations are those for which both a *spatial location* (georeference) and a *time of observation* are recorded, as well as *attributes* measured at the specified location and time. This exercise introduces *space-time geostatistics* to analyze attributes in space and time, separately and simultaneously.
- :download:`R code <../_static/files/R_R/exC.R>`
* :download:`Interactive Excel worksheets explaining spatial autocorrelation, variograms and kriging <../_static/files/xls/SteinExcelExplanations.zip>` (XLS, compressed)
Written by `prof.dr.ir. Alfred Stein `__ (University of Twente), formatted and with some more explanation by me. (1) Simulation of spatial correlation in one dimension, (2) Ordinary Kriging, (3) Universal Kriging, (4) Cokriging, (5) Selecting a grid spacing for kriging. Distributed by permission.
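The point of the compositional-variables exercise, that the parts of a composition must be handled together, can be seen in the centred log-ratio (clr) transformation. The following is a minimal base-R sketch, not the exercise's code; the particle-size fractions are made-up values, and clr is only one of several log-ratio transformations that could be used.

```r
# Hypothetical particle-size fractions (sand, silt, clay) for four samples;
# each row is a composition that sums to 1
comp <- rbind(
  c(0.60, 0.25, 0.15),
  c(0.35, 0.40, 0.25),
  c(0.20, 0.45, 0.35),
  c(0.10, 0.30, 0.60)
)
colnames(comp) <- c("sand", "silt", "clay")

# Centred log-ratio: log of each part minus the row mean of the logs,
# i.e. log(part / geometric mean of its row)
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)   # rowMeans recycles down each column, one value per row
}

comp.clr <- clr(comp)
print(round(comp.clr, 3))

# The closure constraint carries over: each row of clr values sums to zero,
# so the transformed parts are still not independent of each other
print(rowSums(comp.clr))
```

The clr values can be analysed with ordinary multivariate methods, but because each row still sums to zero they have to be modelled jointly, not one part at a time.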
R Markdown mini-tutorials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These illustrate some details of geostatistics. Load each one into RStudio and compile ("knit") it to HTML, or execute it chunk by chunk.
* :download:`Constructing a prediction grid <../_static/files/R_rmd/MakingPredictionGrid.Rmd>`
Shows how to create a regular grid onto which kriging or another prediction method can be applied.
* :download:`Kriging from scratch -- Spatial Classes version <../_static/files/R_rmd/KrigingFromScratch.Rmd>`
A direct application of the ordinary kriging equations to derive kriging weights. This version uses the legacy ``sp`` "Classes and Methods for Spatial Data" representation of spatial objects.
* :download:`Kriging from scratch -- Simple Features version <../_static/files/R_rmd/KrigingFromScratch_sf.Rmd>`
A direct application of the ordinary kriging equations to derive kriging weights. This version uses the newer ``sf`` "Simple Features" representation of spatial objects.
* :download:`Detecting and modelling anisotropy for Ordinary Kriging <../_static/files/R_rmd/ShowAnisotropy.Rmd>`
* :download:`Mapping classes <../_static/files/R_rmd/MappingClasses.Rmd>`
Shows how to map class probabilities over a grid using indicator kriging in ``gstat``, and then make a map of the most probable class, along with a map of prediction reliability, represented by the maximum probability of any class. It also has a section on using classification trees and random forests to classify the same dataset.
* :download:`(Geo)statistical simulation <../_static/files/R_rmd/GeostatisticalSimulation.Rmd>`
Random numbers in R; simulation of binomial, normal, and mixed distributions; application to simulating random fields.
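The "kriging from scratch" notebooks solve the ordinary kriging system directly. A compact base-R sketch of the same idea follows; it is not the notebooks' code, and it assumes an exponential covariance model with made-up sill and range, five hypothetical observation points, and no nugget. The system is augmented with a Lagrange multiplier so that the weights sum to one.

```r
# Hypothetical observations: coordinates (m) and attribute values
obs <- data.frame(
  x = c(10, 40, 70, 20, 60),
  y = c(20, 80, 30, 50, 60),
  z = c(5.1, 6.3, 4.8, 5.9, 5.4)
)
n <- nrow(obs)

# Assumed covariance model: exponential, C(h) = sill * exp(-h / a)
sill <- 1.0
a    <- 30
cov.exp <- function(h) sill * exp(-h / a)

# Left-hand side of the ordinary kriging system, covariance form:
#   [ C   1 ] [ w  ]   [ c0 ]
#   [ 1'  0 ] [ mu ] = [ 1  ]
A <- matrix(1, n + 1, n + 1)
A[1:n, 1:n] <- cov.exp(as.matrix(dist(obs[, c("x", "y")])))
A[n + 1, n + 1] <- 0

ok.predict <- function(tx, ty) {
  d0  <- sqrt((obs$x - tx)^2 + (obs$y - ty)^2)  # distances to the target
  c0  <- cov.exp(d0)
  sol <- solve(A, c(c0, 1))
  w   <- sol[1:n]       # kriging weights, constrained to sum to 1
  mu  <- sol[n + 1]     # Lagrange multiplier
  list(weights = w,
       pred    = sum(w * obs$z),
       var     = sill - sum(w * c0) - mu)   # ordinary kriging variance
}

res <- ok.predict(45, 45)
print(round(res$weights, 4))
print(c(prediction = res$pred, variance = res$var))
```

Calling `ok.predict()` at an observation location returns the observed value with zero variance, a quick check that the system is set up correctly: without a nugget, ordinary kriging is an exact interpolator.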
.. _tutorials-spatial-analysis:
Spatial analysis
----------------------------------------
Spatial analysis in R is transitioning to the "Simple Features" representation of spatial objects,
as implemented in the `sf package `_.
Many of the tutorials listed here were developed with the earlier "Spatial Objects in R" representation, as implemented in the ``sp`` package.
Be alert also to `changes in the GDAL and PROJ packages `_ when specifying or transforming coördinate reference systems.
* `Trend surfaces in R by Ordinary and Generalized Least Squares <../_static/files/R_PDF/ex_TrendSurface.pdf>`__
A trend surface is a map of some continuous variable, computed as a function of the coördinates. In many cases the assumption that the OLS residuals are spatially-independent is not true, so that GLS must be used to obtain a correct trend-surface formula.
- :download:`Kansas aquifer dataset <../_static/files/R_ds/AQUIFER.TXT>` (TXT)
- :download:`R code <../_static/files/R_R/ex_TrendSurface_ex1.R>`
* `Thin-plate spline interpolation with R <../_static/files/R_PDF/exTPS.pdf>`__
Shows how to use 2-D smoothing splines to fit a surface that, like a trend surface, covers the whole area, but that also adjusts to local observations, as kriging does. This is useful for quickly obtaining a clear map of the main features of the variable, without the model assumptions of trend surfaces or kriging.
- :download:`Sanford transect dataset <../_static/files/R_ds/sandford.txt>` (TXT)
- :download:`R code <../_static/files/R_R/exTPS.R>`
* `Areal Data and Spatial Autocorrelation with R <../_static/files/R_PDF/exArealData.pdf>`__
This tutorial gives an overview of spatial analysis of areal data, that is, attributes of polygonal entities on a map. Typical examples are political divisions, census tracts, and ownership or management parcels. The attribute relates to the whole area of the polygon, and cannot be further localized.
- :download:`Central NY 8-county census tracts leukemia incidence <../_static/files/R_ds/NY_data.zip>` (shapefile, neighbours, zipped)
- :download:`R code <../_static/files/R_R/exArealData.R>`
* `Exercise: Exploratory Data Analysis with GeoDa <../_static/files/pdf/exGeoDa.pdf>`__
GeoDa is an open-source, cross-platform program designed as a simple tool for exploratory spatial data analysis (ESDA) and some spatial modelling of polygon data, that is, maps of polygon units such as census tracts or political divisions with a set of attributes measured on each one. This exercise uses a portion of the 8-county data from the R exercise listed just above.
- :download:`Syracuse census tracts leukemia incidence <../_static/files/R_ds/Syr.zip>` (shapefile, neighbours, zipped)
* `Point-pattern analysis with R <../_static/files/R_PDF/exPPA.pdf>`__
This tutorial gives an overview of spatial point-pattern analysis. This considers the distribution of one or more sets of points in some bounded region as the result of some stochastic process which produces a finite number of "events" or "occurrences".
- :download:`R code <../_static/files/R_R/exPPA1.R>`
* `Optimal partitioning of soil transects with R <../_static/files/R_PDF/OptPart.pdf>`_
Applies the split moving window approach to optimally partition linear series of observations to find "natural" boundaries.
- :download:`Example transect (Juruena, Mato Grosso) <../_static/files/R_ds/tr.csv>` (CSV)
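The OLS step of a trend-surface analysis can be sketched in a few lines of base R with `lm()`. The data below are simulated, not the Kansas aquifer dataset; testing the OLS residuals for spatial dependence and refitting by GLS, which the trend-surface tutorial goes on to do, is not shown here.

```r
set.seed(42)
n  <- 200
xy <- data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10))
# Simulated variable: a linear trend in the coordinates plus independent noise
xy$z <- 2 + 0.5 * xy$x - 0.3 * xy$y + rnorm(n, sd = 1)

# First-order (planar) trend surface
ts1 <- lm(z ~ x + y, data = xy)
# Second-order trend surface: adds the quadratic and cross-product terms
ts2 <- lm(z ~ x + y + I(x^2) + I(y^2) + I(x * y), data = xy)

print(coef(ts1))
# Does the more complex surface fit significantly better?
print(anova(ts1, ts2))

# Evaluate the first-order trend over a regular grid
grid <- expand.grid(x = seq(0, 10, by = 0.5), y = seq(0, 10, by = 0.5))
grid$trend <- predict(ts1, newdata = grid)
```

Here the independent noise makes OLS appropriate; with real data the residuals of `ts1` would first be checked for spatial autocorrelation.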
General statistical methods
---------------------------------------------
* `Using the R Environment for Statistical Computing: An example with the Mercer & Hall wheat yield dataset <../_static/files/R_PDF/mhw.pdf>`__
A systematic analysis of a simple dataset: the Mercer & Hall wheat yield uniformity trial: exploratory graphics, descriptive statistics, univariate & bivariate modelling, bootstrapping, robust methods, multivariate modelling, principal components analysis, model evaluation, cross-validation, spatial analysis, spatial structure, generalized least squares, geographically-weighted regression, clustering, periodicity (spectral analysis).
- :download:`Mercer & Hall wheat yield dataset <../_static/files/R_ds/mhw.csv>` (CSV)
- :download:`R code <../_static/files/R_R/mhw_Rcode.zip>` (compressed)
* `An example of statistical data analysis using the R environment for statistical computing <../_static/files/R_PDF/corregr.pdf>`__
This tutorial presents a data analysis sequence which may be applied to environmental datasets, using a small but typical data set of multivariate point observations: 147 soil profile observations representative of the humid forest region of southwestern Cameroon. It is aimed at students in geo-information application fields who have some experience with basic statistics, but not necessarily with statistical computing. Five aspects are emphasised:
1. Placing statistical analysis in the framework of research questions;
2. Moving from simple to complex methods: first exploration, then selection of promising modelling approaches;
3. Visualising as well as computing;
4. Making correct inferences;
5. Statistical computation and visualization.
- :download:`point dataset <../_static/files/R_ds/obs.csv>` (CSV)
- :download:`R code <../_static/files/R_R/cr_Rcode_1.4.zip>` (compressed)
* `Analyzing land cover change with logistic regression in R <../_static/files/R_PDF/lcc.pdf>`__
Land cover change at 1 064 grid cells in the Chapare region of Cochabamba Department, Bolivia.
- :download:`point dataset <../_static/files/R_ds/lcc.csv>` (CSV)
- :download:`R code <../_static/files/R_R/lcc.zip>` (compressed)
* `Curve fitting with the R Environment for Statistical Computing <../_static/files/R_PDF/CurveFit.pdf>`__
R gives the user more insight and control than "push-button" programs such as CurveFit.
- :download:`R code <../_static/files/R_R/curve_1.R>`
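The kind of control the curve-fitting tutorial refers to can be illustrated with `nls()`, where the user states the model form and the starting values explicitly and gets standard errors for the fitted parameters. The exponential-decay model and the simulated data below are assumptions for illustration only, not taken from the tutorial.

```r
set.seed(1)
x <- 0:20
# Simulated response: exponential decay y = a * exp(-b * x) plus noise
# (nls should not be used on noise-free, zero-residual data)
y <- 5 * exp(-0.3 * x) + rnorm(length(x), sd = 0.1)

# The model form and starting values are stated explicitly
fit <- nls(y ~ a * exp(-b * x), start = list(a = 4, b = 0.2))

print(summary(fit))           # estimates, standard errors, residual SE
print(confint.default(fit))   # Wald intervals from the estimated covariance

# Fitted curve on a fine grid, e.g. for plotting against the data
xx <- seq(0, 20, by = 0.1)
yy <- predict(fit, newdata = data.frame(x = xx))
```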
R Markdown mini-tutorials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* :download:`Model evaluation <../_static/files/R_rmd/ModelEvaluation.Rmd>`
Describes and illustrates measures of model success, i.e., its presumed predictive accuracy. This is sometimes called model "validation".
Defines and illustrates RMSE, variance explained against the 1:1 line, bias, gain, Lin's concordance, Taylor diagrams, Gauch decomposition.
* `compiled HTML <../_static/files/R_html/ModelEvaluation.html>`__
* :download:`Explaining the concept of variance <../_static/files/R_rmd/ExplainVariance.Rmd>`
Explains the concept of variance and how it is affected by classification.
* `compiled HTML <../_static/files/R_html/ExplainVariance.html>`__
* :download:`Box-Cox transformation <../_static/files/R_rmd/Transformations.Rmd>`
Box and Cox (1964) developed a family of transformations designed to reduce non-normality of the errors in a linear model. Applying this transformation often reduces non-linearity and heteroscedasticity as well.
* `compiled HTML <../_static/files/R_html/Transformations.html>`__
* :download:`Explaining Ordinary Least Squares (OLS) regression with R <../_static/files/R_rmd/explainRegression.Rmd>`
Shows how linear models are computed by ordinary least squares (OLS) and by a robust regression variant of OLS.
* `compiled HTML <../_static/files/R_html/explainRegression.html>`__
* :download:`Demonstrating Generalized Least Squares regression <../_static/files/R_rmd/DemonstateGLSregression.Rmd>`
GLS accounts for autocorrelation in the linear model residuals. This example is of spatial autocorrelation, using the :download:`Mercer & Hall wheat yield dataset <../_static/files/R_ds/mhw.csv>` (CSV)
* `compiled HTML <../_static/files/R_html/DemonstateGLSregression.html>`__
* :download:`Meuse heavy metals exercise — CART and random forests <../_static/files/R_rmd/exMeuseTreesForests.Rmd>`
Illustrates the use of regression trees, classification trees, Cubist, *k*-nearest neighbours, and spatially-explicit random forests
* `compiled HTML <../_static/files/R_html/exMeuseTreesForests.html>`__
* :download:`Regression Trees by hand <../_static/files/R_rmd/RegressionTreesByHand.Rmd>`
Shows how ``rpart`` decides on splits when building a regression tree.
* `compiled HTML <../_static/files/R_html/RegressionTreesByHand.html>`__
* :download:`Variability of Regression Trees <../_static/files/R_rmd/RegressionTreesVariability.Rmd>`
Demonstrates the sensitivity of regression trees built by ``rpart`` to small changes in the training dataset, using the Meuse heavy metals dataset.
* `compiled HTML <../_static/files/R_html/RegressionTreesVariability.html>`__
* :download:`Finding the proper complexity parameter for a Regression Tree <../_static/files/R_rmd/RegressionTreeCrossValidation.Rmd>`
Shows how cross-validation is used to select an appropriate complexity parameter for a regression tree. This reproduces the cross-validation results reported by ``printcp`` and graphed by ``plotcp``, based on computations in ``rpart``.
* `compiled HTML <../_static/files/R_html/RegressionTreeCrossValidation.html>`__
* :download:`Comparing Random Forest packages <../_static/files/R_rmd/CompareRandomForestPackages.Rmd>`
Compares the ``randomForest`` and ``ranger`` packages.
* `compiled HTML <../_static/files/R_html/CompareRandomForestPackages.html>`__
* :download:`Nonlinear Principal Components Analysis, a.k.a. Multivariate Analysis with Optimal Scaling <../_static/files/R_rmd/NonlinearPCA.Rmd>`
Practice with the "Gifi" approach of de Leeuw and colleagues. "The data analytic approach does not start with a model, but looks for transformations and combinations of [categorical] variables with the explicit purpose of representing the data in a simple and comprehensive, and usually graphical, way."
* :download:`example dataset <../_static/files/R_ds/NonlinearPCA_example.RData>` (RData)
* `compiled HTML <../_static/files/R_html/NonlinearPCA.html>`__
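Several of the evaluation measures listed for the model-evaluation notebook have simple closed forms. The base-R function below is a sketch, not the notebook's code: it assumes bias is defined as mean(predicted - actual) (sign conventions differ between authors), and it computes Lin's concordance correlation coefficient with divide-by-*n* moments; implementations that use *n* - 1 give slightly different values.

```r
eval.measures <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err  <- predicted - actual
  rmse <- sqrt(mean(err^2))
  bias <- mean(err)                       # assumed convention: predicted - actual
  # Lin's concordance correlation coefficient (divide-by-n moments):
  # agreement with the 1:1 line, penalising both scatter and offset
  mx  <- mean(actual); my <- mean(predicted)
  sx2 <- mean((actual - mx)^2)
  sy2 <- mean((predicted - my)^2)
  sxy <- mean((actual - mx) * (predicted - my))
  ccc <- 2 * sxy / (sx2 + sy2 + (mx - my)^2)
  c(RMSE = rmse, bias = bias, Lin.CCC = ccc)
}

# A constant over-prediction of 1 unit: perfectly correlated, but not concordant
eval.measures(actual = c(1, 2, 3, 4), predicted = c(2, 3, 4, 5))
# RMSE = 1, bias = 1, Lin.CCC = 0.714 (Pearson r would be 1)
```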
Regional mapping
----------------------------------------
* `Regional mapping of climate variables from point samples <../_static/files/R_PDF/exRKGLS.pdf>`__ (PDF)
Various methods for regional mapping of climate variables from station information, using as predictors the coördinates (Northing, Easting) and elevation, as well as the local neighbourhood: Ordinary Least Squares trend; Generalized Least Squares trend; Regression Kriging; Kriging with External Drift; Generalized Additive Models trend; Geographically-weighted regression; Data-driven methods: Regression trees, Random Forests, Cubist; Thin-plate splines; Local interpolators: Ordinary kriging, inverse-distance, Thiessen polygons. Uses the ``ggplot2``, ``nlme``, ``rgdal``, ``sp``, ``gstat``, ``rpart``, ``randomForest``, ``ranger``, ``Cubist``, ``caret``, ``raster``, ``plotKML`` and ``fields`` R packages.
- :download:`R code <../_static/files/R_R/exRKGLS.R>`
- Datasets used in this tutorial:
- :download:`weather station climate summaries <../_static/files/R_ds/weather_stn_sums_1971_2000.zip>` (shapefiles, zipped, 2.5 Mb)
- `documentation of U.S. Climate Normals 1971-2000 products (PDF) <../_static/files/R_ds/US-Climate-Normals-1971-2000-Products.pdf>`__
- :download:`48-states SRTM DEM <../_static/files/R_ds/srtm_1km_48.zip>` (ESRI ASCII grid, zipped, 22.3 Mb)
- :download:`4-state bounding box 4km resolution DEM (NJ, NY, PA, VT) <../_static/files/R_ds/dem_ne_4km.RData>` (RData)
- :download:`4-state 4km resolution DEM (NJ, NY, PA, VT) <../_static/files/R_ds/dem_nj_ny_pa_vt_4km.RData>` (RData)
- :download:`USA state boundaries <../_static/files/R_ds/cb_2014_us_state_500k.zip>` (shapefile, zipped, 3.2 Mb)
- Additional covariates
The above exercise uses only three regional predictors as well as the local neighbourhood. Additional covariates might have a relation to regional or local climate, and thus might improve the regional mapping. These are: distance to the Great Lakes shoreline, distance to the Atlantic Ocean coast, two terrain indices (Multi-resolution Valley-Bottom Flatness MRVBF and Terrain Ruggedness Index TRI), and population density within two radii around stations or prediction points (nominally 2.5' and 15').
* :download:`Setting up the additional covariates <../_static/files/R_rmd/exRKGLS_SetupAdditionalCovariates.Rmd>` (R Markdown source)
- :download:`Lakes Erie & Ontario <../_static/files/R_ds/LakesOntarioErie.gpkg>` (OGC Geopackage)
- :download:`Atlantic Ocean coastline, NE USA <../_static/files/R_ds/AtlanticCoastLine.gpkg>` (OGC Geopackage)
- :download:`multiresolution index of valley bottom flatness (MRVBF) <../_static/files/R_ds/mrvbf_ne_4km.sdat>` (SAGA raster)
- :download:`Terrain Ruggedness Index <../_static/files/R_ds/dem_ne_4km_TRI3_IDW2.sdat>` (SAGA raster)
- :download:`World population density (15' resolution) <../_static/files/R_ds/gpw_v4_population_density_rev11_2000_15_min.tif>` (GeoTIFF, 1 Mb)
- :download:`World population density (2.5' resolution) <../_static/files/R_ds/gpw_v4_population_density_rev11_2000_2pt5_min.tif>` (GeoTIFF, 23.6 Mb)
* :download:`Using the additional covariates <../_static/files/R_rmd/exRKGLS_UseAdditionalCovariates.Rmd>` (R Markdown source)
- :download:`Prepared dataset of additional covariates <../_static/files/R_ds/StationsDEM_covariates.RData>` (RData, 2.5 Mb)
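Inverse-distance weighting, one of the local interpolators listed for the regional-mapping tutorial, can be written directly in base R. This sketch is not the tutorial's code: the inverse-distance power of 2 is an assumed (commonly used) value, and the station coordinates and temperatures are hypothetical.

```r
# Inverse-distance-weighted prediction at one point (px, py)
idw <- function(sx, sy, z, px, py, power = 2) {
  d <- sqrt((sx - px)^2 + (sy - py)^2)
  if (any(d == 0)) return(z[which(d == 0)[1]])  # exact at a station
  w <- 1 / d^power
  sum(w * z) / sum(w)
}

# Hypothetical stations: projected coordinates (km), annual mean temperature (deg C)
st <- data.frame(x = c(0, 100, 0, 100),
                 y = c(0, 0, 100, 100),
                 t = c(8.2, 9.0, 7.1, 7.9))

# Predict over a small regular grid
grid <- expand.grid(x = seq(0, 100, by = 25), y = seq(0, 100, by = 25))
grid$t.idw <- mapply(function(px, py) idw(st$x, st$y, st$t, px, py),
                     grid$x, grid$y)
print(grid)
```

Because the weights are positive and sum to one, every IDW prediction lies within the range of the station values, so IDW alone cannot reproduce a regional trend such as the effect of elevation.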
.. _zh-ne:
* Regional mapping of climate variables from point samples: Northeast China
This applies many of the methods of the above tutorial to a region in China.
- In-class exercises *Regional mapping of climate variables* (R Markdown sources):
- :download:`(1) Data exploration <../_static/files/R_rmd/ZhClimateInClassExercise1.Rmd>`
- :download:`(2) Data-driven methods <../_static/files/R_rmd/ZhClimateInClassExercise2.Rmd>`
- :download:`(3) Model-based methods <../_static/files/R_rmd/ZhClimateInClassExercise3.Rmd>`
- :download:`Dataset used in the exercises <../_static/files/R_ds/zhne_stations.RData>` (RData)
- Setting up the dataset
- `Procedure to set up the dataset <../_static/files/R_PDF/ZhClimateAssignment_DatasetSetup.pdf>`__ (PDF)
- :download:`Procedure to set up the dataset <../_static/files/R_rmd/ZhClimateAssignment_DatasetSetup.Rmd>` (R Markdown source)
- :download:`Temperature summaries 1981-2010 <../_static/files/R_ds/temperature_stn_1981_2010.csv>` (CSV)
- :download:`Precipitation summaries 1981-2010 <../_static/files/R_ds/precip_stn_1981_2010.csv>` (CSV)
- :download:`Administrative boundaries of China <../_static/files/R_ds/gadm36_CHN.gpkg>` (Geopackage, 72.6 Mb)
- :download:`Procedure to set up a prediction grid <../_static/files/R_rmd/ZhClimate_CreateRegionalGrid.Rmd>` (R Markdown source)
- :download:`World SRTM 1km resolution DEM <../_static/files/R_ds/SRTM_1km_GRD.zip>` (ESRI grid, compressed, 195.2 Mb)
- Additional covariates
The above exercise uses only three regional predictors as well as the local neighbourhood. Additional covariates might have a relation to regional or local climate, and thus might improve the regional mapping. These are: one terrain index (Multi-resolution Valley-Bottom Flatness MRVBF) and population density within two radii around stations or prediction points (nominally 2.5' and 15').
* :download:`Setting up the additional covariates <../_static/files/R_rmd/ZhClimate_SetupAdditionalCovariates.Rmd>` (R Markdown source)
- :download:`Multiresolution index of valley bottom flatness (MRVBF) <../_static/files/R_ds/mrvbf_ne.sdat>` (SAGA raster)
- :download:`World population density (15' resolution) <../_static/files/R_ds/gpw_v4_population_density_rev11_2000_15_min.tif>` (GeoTIFF, 1 Mb)
- :download:`World population density (2.5' resolution) <../_static/files/R_ds/gpw_v4_population_density_rev11_2000_2pt5_min.tif>` (GeoTIFF, 23.6 Mb)
* :download:`Using the additional covariates <../_static/files/R_rmd/ZhClimate_UseAdditionalCovariates.Rmd>` (R Markdown source)
- :download:`Prepared dataset of additional covariates <../_static/files/R_ds/Zh_StationsDEM_covariates.RData>` (RData, 6 Mb)
Geographic information systems (GIS)
--------------------------------------------
* `A "simple" analysis with QGIS <../_static/files/R_PDF/exQGIS.pdf>`__
Water pollution hazard for a drinking water reservoir in Tompkins County, NY (USA), using QGIS 3.4
* `Creating geometrically-correct photo-interpretations, photomosaics, and base maps for a project GIS <../_static/files/pdf/TN_Georef_wFigs_Screen_v3.pdf>`__
.. _tsa:
Time-series analysis
----------------------------------------
* `Time-series analysis with R <../_static/files/R_PDF/exTSA.pdf>`__
- :download:`tutorial datasets <../_static/files/R_ds/R_ts_ds.zip>` (compressed)
- :download:`R code <../_static/files/R_R/exTSA.R>`
* :ref:`Spatio-temporal geostatistics `, includes a section on temporal structure
* `Fitting rational functions to time series in R <../_static/files/R_PDF/rat.pdf>`__
Rational functions are ratios of any two polynomials in a single variable. In this example we fit linear/quadratic rational functions to an irregular time-series of proportional changes from an initial condition of a soil property in response to land use. The fitted equation can be interpreted to find the time to reach a maximum proportional deviation, and the value of that deviation.
- :download:`tutorial dataset <../_static/files/R_ds/MDS_PD_12.csv>` (CSV)
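One way to carry out the rational-function fit described above is with `nls()`. The sketch assumes the deviation is zero at time zero, so the linear numerator has no intercept: *y* = *a t* / (1 + *b1 t* + *b2 t*\ :sup:`2`). Setting the derivative to zero gives 1 - *b2 t*\ :sup:`2` = 0, so the time of maximum deviation is 1 / sqrt(*b2*) and the maximum is *a* / (*b1* + 2 sqrt(*b2*)). This parameterisation and the simulated data are assumptions for illustration, not the tutorial's dataset or code.

```r
set.seed(7)
# Simulated example: irregular observation times (years since land-use change)
# and proportional deviation of a soil property from its initial value
yr  <- c(0.5, 1, 2, 3, 5, 6, 8, 10, 12, 15, 18, 22, 26, 30, 35, 40)
dat <- data.frame(yr = yr,
                  dev = 0.5 * yr / (1 + 0.1 * yr + 0.01 * yr^2) +
                        rnorm(length(yr), sd = 0.02))

# Linear numerator without intercept (deviation is 0 at time 0),
# quadratic denominator with its constant term fixed at 1
fit <- nls(dev ~ a * yr / (1 + b1 * yr + b2 * yr^2), data = dat,
           start = list(a = 0.45, b1 = 0.08, b2 = 0.012))
p <- coef(fit)

# dy/dt = 0 where 1 - b2 * t^2 = 0 (a maximum when a > 0 and b2 > 0)
t.max <- 1 / sqrt(p[["b2"]])
y.max <- p[["a"]] / (p[["b1"]] + 2 * sqrt(p[["b2"]]))
print(round(c(t.max = t.max, y.max = y.max), 3))
```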
.. [#] or adapted from others
.. meta::
   :description: D G Rossiter's professional pages -- tutorials
   :keywords: R, geostatistics, time-series analysis, regional mapping, R Markdown

Last modified |today|