---
title: "Assigment: Trend Surfaces"
author: "A Student 某个学生"
date: "`r Sys.Date()`"
output:
html_document:
toc: TRUE
toc_float: TRUE
theme: "lumen"
code_folding: show
number_section: TRUE
fig_height: 6
fig_width: 8
fig_align: 'center'
---
# Dataset
Task: Load the "Northeast Climate" R Dataset `ne_stations.RData`, with the `load` function. Use the `verbose=TRUE` optional argument to display the names of the objects that are loaded.
Q: What are the data types of these objects? For those that are spatial, what are their coördinate reference systems (CRS)?
Note: you will have to load the `sp` package to see the CRS.
A:
We will work mostly with object `ne.df`, a dataframe version of the climate stations.
Q: For how many climate stations are there records? How many states of the USA are represented?
A:
Task: Display the names of the variables in this dataset.
Q: What are the attributes in the `ne.df` data frame?
A:
The response variable here is `ANN_GDD50` (annual growing-degree days), the coördinates are in fields `E` and `N`, the elevation of the station is in field `ELEVATION_`.
# First-order trend surface
Task: Compute a first-order trend surface model using the two coordinates (East and North) as the predictors, and the growing degree days as the dependent (response) variable, using Ordinary Least Squares linear regression.
Display its summary and the linear model diagnostic plots "residuals vs. fitted" and "Normal Q-Q".
Q: What proportion of the variation in GDD50 is explained by the first-order trend?
A:
Q: Which of the two coördinates most affects the GDD50?
A:
Q: Is there a relation between the fitted values and residuals? If so, what?
A:
Q: Are the residuals normally-distributed? If not, how so? Are there any especially large residuals?
A:
Optional Task: Identify the most poorly-modelled station.
Optional Q: Where is this? Why might it be so poorly modelled?
Optional A:
Task: Display a post-plot of the residuals, with the positive residuals (under-predictions) shown in green and the negative (over-predictions) in red.
Q: What is the spatial pattern of the trend-surface residuals? Does it suggest a higher-order trend?
A:
# Second-order trend surface
Task: Repeat all the steps of the previous section, and answer the same questions, for a full 2nd-order trend surface.
Q: What proportion of the variation in GDD50 is explained by the second-order trend?
A:
Q: Which of the two coördinates (taking into account their squares also) most affects the GDD50?
A:
Q: Is there a relation between the fitted values and residuals? If so, what?
A:
Q: Are the residuals normally-distributed? If not, how so? Are there any especially large residuals?
A:
Optional Task: Identify the most poorly-modelled station.
Optional Q: Where is this? Why might it be so poorly modelled?
Optional A:
Task: Display a post-plot of the residuals, with the positive residuals (under-predictions) shown in green and the negative (over-predictions) in red.
Q: What is the spatial pattern of the trend-surface residuals? Does it suggest a higher-order trend?
A:
# Comparing trend orders
Task: Compare the two orders of trend surfaces with an analysis of variance.
Q: Is the 2nd-order trend clearly superior to the 1st-order trend?
A:
# Mapping
Task: Load the `raster` package into the workspace.
Then, load a gridded DEM covering the study area `dem_ne_4km.RData`, with the `load` function. Use the `verbose=TRUE` optional argument to display the names of the objects that are loaded.
Q: What object is loaded? What is its class? What is its CRS? Which of the point datasets does this match?
A:
Q: What is the bounding box and resolution of the grid? (You can see these with the `summary` function.)
A:
Task: Display the grid with the `spplot` function.
Task: Predict over this study area with the 2nd-order trend surface. Add this as a field in the study area data frame.
Task: Summarize the predictions.
Q: How do the predictions compare with the summary of the weather stations observations?
A:
Task: Display a map of the predictions using the `spplot` function, and specifying the field to display with the `zcol` argument to `spplot`.
Q: Is this trend surface a realistic representation of the growing-degree days? Explain.
A:
# Generalized Additive Model (GAM)
Task: Display a scatterplot of the GDD50 against the two coördinates.
Q: Do these relations look linear or quadratic throughout their ranges?
A: No, the relation with North seems linear in the southern part, but then another line (less slope) in the north.
Task: Compute a Generalized Additive Model (GAM) of GDD50 modelled by the two coördinates.
Q: What proportion of the variation in GDD50 is explained by the GAM? How does this compare to the 2nd-order trend surface?
A:
Task: Predict over this study area with the fitted GAM. Add this as a field in the study area data frame.
Note: you will have to convert the `SpatialPixelsDataFrame` to a regular `data.frame` as the argument to `newdata`, because `predict.gam` does not recognize `sp` objects.
Task: Summarize the predictions.
Q: How do the predictions compare with the summary of the weather stations observations, and to the summary of the 2nd-order trend surface predictions?
A:
Task: Display a post-plot of the residuals.
Q: What is the spatial pattern of the GAM trend-surface residuals? How does this differ from the pattern of the 2nd-order trend residuals?
A:
Task: Display a map of the predictions using the `spplot` function.
Q: Is this trend surface a realistic representation of the growing-degree days? Explain. How does it improve on the 2nd-order trend?
A:
# Thin-plate splines
Task: Load the `fields` package, to use for thin-plate spline calculation.
Task: Add a field to the stations dataframe, containing a matrix of the two coördinates, as required by the `fields` package.
Task: compute the thin-plate spline for the growing degree days, as a function of coördinates.
Task: Predict with the thin-plate spline over the study area. Note you will have to make a matrix from the grid coördinates, interpolate on that, and then convert the prediction to a field in the grid.
Task: Display the map.
Q: Is this trend surface a realistic representation of the growing-degree days? Explain. Does it improve on the GAM trend?
A:
# Challenge (Optional!) -- including elevation
Of course, elevation also affects temperature and thus the growing-degree days.
Repeat the exercise, but including elevation in all the models. Comment on how and why the trend surfaces (now 3D) are different/improved from the 2D trends.
In these models, consider elevation as an independent dimension, with no interaction with the geographic coördinates (although, that may not be a valid assumption... you could test this... ).
Also, see if a transformation of elevation might make its relation with the target variable more linear.
# Saving the workspace
Task: Save the workspace for a later exercise using the `save.image` function: