Electronic Thesis and Dissertation Repository


Degree: Doctor of Philosophy

Program: Statistics and Actuarial Sciences

Supervisor: Murdoch, Duncan

2nd Supervisor: Kulperger, Reg



Carbon dioxide (CO2) flux is important for agriculture and for carbon cycle studies, yet only a small proportion of land is currently equipped to measure CO2 flux directly. CO2 flux data have a clear annual cycle whose phase changes from year to year, so building a model that estimates both the annual effect and the seasonal dynamics is challenging. Data from the Moderate Resolution Imaging Spectroradiometer (MODIS), an instrument carried on NASA satellites, are freely available from NASA and include products such as the normalized difference vegetation index (NDVI). Our goals are to model the seasonal dynamics so as to generate reasonable predictions, and to build a model that uses MODIS data to predict CO2 flux at any location.

In modeling single sites, we treat each year as a multivariate observation. We use functional data analysis (FDA) to smooth each year's CO2 flux data separately, and then use landmark registration to standardize the seasonal dynamics. On the registered time scale, we model the curves with a multivariate normal distribution. We use Dirichlet regression to model the seasonal dynamics and to map the registered curves back to the natural time scale. Together, these steps allow us to simulate CO2 flux on the natural time scale.
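The registration and back-mapping steps can be sketched in Python under several simplifying assumptions. Each year's smoothed curve is available as a function of day. The landmarks are a few known days per year that include the first and last day. The warping function is piecewise linear between landmarks, although smooth monotone warps are also common. All helper names (`interp`, `register`, `to_natural`, `sample_dirichlet`, ...) are hypothetical. One plausible reading of why Dirichlet regression suits the seasonal timing is that the gaps between landmarks, expressed as fractions of the year, form a composition summing to one. The abstract does not state this, so the Dirichlet part below is an assumption.

```python
import random
from bisect import bisect_right


def interp(xs, ys, x):
    """Piecewise-linear interpolation through knots (xs, ys); xs strictly increasing.

    Values outside [xs[0], xs[-1]] are clamped to the end knots.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def register(curve, landmarks, target, grid):
    """Evaluate one year's curve on the common (registered) time scale.

    The warp h(s) = interp(target, landmarks, s) sends each common landmark
    target[k] to this year's landmark landmarks[k]; the registered curve is
    curve(h(s)) evaluated at each s in grid.
    """
    return [curve(interp(target, landmarks, s)) for s in grid]


def to_natural(reg_values, grid, landmarks, target, days):
    """Map a registered curve sampled on grid back to natural days via h^{-1}."""
    return [interp(grid, reg_values, interp(landmarks, target, d)) for d in days]


def landmarks_to_proportions(landmarks):
    """Gaps between consecutive landmarks as fractions of the year (sum to 1)."""
    total = landmarks[-1] - landmarks[0]
    return [(b - a) / total for a, b in zip(landmarks, landmarks[1:])]


def proportions_to_landmarks(props, start=0.0, end=365.0):
    """Inverse of landmarks_to_proportions: cumulative gaps give landmark days."""
    days, t = [start], start
    for p in props:
        t += p * (end - start)
        days.append(t)
    return days


def sample_dirichlet(alphas, rng):
    """One draw from Dirichlet(alphas) via normalized Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0, 180, 365]  # hypothetical common landmarks (e.g. mean over years)
    # Hypothetical timing for a simulated year: Dirichlet draw of the gap fractions.
    new_landmarks = proportions_to_landmarks(sample_dirichlet([18.0, 18.5], rng))
    grid = list(range(0, 366, 5))
    reg_curve = [abs(s - 180) / 180 - 1 for s in grid]  # toy registered curve, dip at 180
    natural = to_natural(reg_curve, grid, new_landmarks, target, range(366))
    print("dip on natural scale near day", natural.index(min(natural)))
    print("simulated middle landmark", round(new_landmarks[1], 1))
```

In a full simulation, the registered curve would itself be drawn from the fitted multivariate normal rather than fixed as in this toy example. The piecewise-linear warp keeps the inverse trivial, which is why it is used here. It is a convenience choice, not necessarily the thesis's.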

For the spatial study, we compare three models. In the first model, we decompose the CO2 flux data into several components and build a model based on the spatial correlation of each component. The second model is a functional linear regression model (FLRM) with NDVI as the covariate, in which both CO2 flux and NDVI are annual multivariate observations treated as functional objects. In the third model, we use a generalized additive model (GAM) to analyze the data as a time series indexed by day, with covariates such as NDVI, latitude, and longitude.
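For reference, one common way to write the second and third models is sketched below. The exact specification is an assumption here and is not taken from the abstract. In particular, the NDVI effect is assumed to be integrated over the year rather than concurrent, the error is additive, the link is the identity, and only the named covariates enter the GAM. In the first equation, $y_i(t)$ is the CO2 flux curve and $x_i(s)$ the NDVI curve for site-year $i$ over the year $\mathcal{T}$. In the second, $y_d$ is the flux on day $d$ and the $f_j$ are smooth functions, typically estimated with penalized splines.

```latex
\begin{aligned}
y_i(t) &= \alpha(t) + \int_{\mathcal{T}} \beta(s, t)\, x_i(s)\, ds + \varepsilon_i(t), \\
y_d &= \beta_0 + f_1(\mathrm{NDVI}_d) + f_2(\mathrm{lat}_d, \mathrm{lon}_d) + \varepsilon_d .
\end{aligned}
```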

We use a parametric bootstrap to validate the single-site model at 55 flux sites. The local coverage rates, and the majority of the global coverage rates, are around 95%. Among the three spatial models, the GAM performed best, with the lowest out-of-sample mean squared prediction error. The FLRM also shows great potential for modeling when information is limited.
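A parametric bootstrap coverage check, together with the out-of-sample MSE criterion, can be sketched as follows. The fitted model is a deliberately simple hypothetical stand-in: each day is independent Normal, with mean and standard deviation estimated from the training years. It is not the multivariate normal and Dirichlet model described above. Treating pointwise coverage as "local" coverage is also an assumption. All function names are illustrative.

```python
import random
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def parametric_bootstrap_band(train_years, n_sim=1000, level=0.95, rng=None):
    """Pointwise prediction band from a parametric bootstrap.

    Hypothetical stand-in model: independent Normal days whose mean and
    standard deviation are estimated from the training years. We simulate
    n_sim curves from the fitted model and take pointwise quantiles.
    """
    if rng is None:
        rng = random.Random(0)
    params = [(statistics.mean(v), statistics.stdev(v)) for v in zip(*train_years)]
    sims = [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n_sim)]
    alpha = (1.0 - level) / 2.0
    band = []
    for day_values in zip(*sims):
        s = sorted(day_values)
        band.append((quantile(s, alpha), quantile(s, 1.0 - alpha)))
    return band


def coverage_rate(observed, band):
    """Fraction of days on which the observed value lies inside the band."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(observed, band))
    return hits / len(observed)


def mse(observed, predicted):
    """Out-of-sample mean squared prediction error."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 30 training years and one held-out year of daily values.
    train = [[rng.gauss(0.0, 1.0) for _ in range(365)] for _ in range(30)]
    held_out = [rng.gauss(0.0, 1.0) for _ in range(365)]
    band = parametric_bootstrap_band(train, n_sim=500)
    print("pointwise coverage:", round(coverage_rate(held_out, band), 3))
```

A coverage rate near the nominal 95% on held-out data indicates that the simulated spread is realistic. Comparing `mse` across models on the same held-out data is the kind of criterion used to rank the spatial models.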