
MSc thesis topic: Feature Space Coverage and Uncertainty in Spatial Biomass Prediction

Accurate estimates of environmental properties at subnational to global scales are essential for realizing climate goals, for example by monitoring the carbon balance. The greatest accuracy is obtained by combining maps with reference data from environmental surveys such as forest inventories. If the reference data have been acquired with a standardized methodology and probability sampling, this can be done by well-established statistical inference. However, the reference data that are typically available have incomplete coverage, have been measured by different methods and lack a consistent sampling design. Therefore, model-based approaches (e.g., geostatistics and machine learning) are needed to estimate the target properties. The aim of this thesis is to assess the extent to which model-based predictions are supported by the reference data, by means of so-called within-domain determination and uncertainty assessment.

The above figure shows examples of within-domain determination: (a) sample points (red) taken from Fig. 2(h) in de Bruin et al. (2022); (b) isolation score computed with the Isolation Forest approach (Liu et al., 2012; Cortes, 2022); (c) spatial, cell-based domain delineation, in which hexagon cells lacking sample points are greyed out; (d) outlier mapping by thresholding the isolation score shown in (b). Note that the within-domain area in (d) is much larger than that in (c).
To allow comparison of different samples, an existing map will be used to provide reference data.
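
The isolation-score thresholding of panels (b) and (d) can be sketched in a few lines. The block below is a minimal, simplified isolation forest written with the Python standard library only; it is not the implementation behind the figure. The two-dimensional Gaussian "feature space", the subsample size, the number of trees and the 95th-percentile threshold are all assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all values identical along this feature: stop splitting
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x ends up in a leaf, adjusted for the unsplit points in that leaf."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if x[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(sample, n_trees=100, subsample=64, seed=0):
    """Grow n_trees isolation trees, each on a random subsample of the sample points."""
    rng = random.Random(seed)
    size = min(subsample, len(sample))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(sample, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def isolation_score(forest, x):
    """Score in (0, 1); higher values mean x is easier to isolate from the sample."""
    trees, size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / average_path_length(size))


# Hypothetical sample: 200 points clustered around the origin of a 2-D feature space.
rng = random.Random(42)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
forest = fit_forest(sample)

# Assumed rule: locations scoring above the 95th percentile of the sample's own
# scores are treated as outside the domain.
sample_scores = sorted(isolation_score(forest, p) for p in sample)
threshold = sample_scores[int(0.95 * (len(sample_scores) - 1))]

score_near = isolation_score(forest, (0.0, 0.0))  # well covered by the sample
score_far = isolation_score(forest, (8.0, 8.0))   # far from any sample point
print("near:", round(score_near, 3), "within domain:", score_near <= threshold)
print("far: ", round(score_far, 3), "within domain:", score_far <= threshold)
```

The choice of threshold directly controls the size of the within-domain area, which is one reason the thresholded map in (d) can differ so much from the cell-based delineation in (c).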

Objectives

  • Explore and compare methods for assessing to what extent sample data cover the feature space on which predictions are made.
  • Assess whether model-based prediction uncertainty estimates apply to claimed within-domain regions.
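
One possible way to approach the second objective is to compare the empirical coverage of prediction intervals inside and outside the claimed domain. The sketch below assumes that a model supplies a lower and an upper interval bound per location and that a within-domain flag is available; the function names and the toy numbers are hypothetical and serve only to show the calculation.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observed values that fall inside their prediction interval."""
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)


def coverage_by_domain(observed, lower, upper, within_domain):
    """Interval coverage computed separately for within-domain and outside locations."""
    groups = {"within": ([], [], []), "outside": ([], [], [])}
    for y, lo, hi, inside in zip(observed, lower, upper, within_domain):
        ys, los, his = groups["within" if inside else "outside"]
        ys.append(y)
        los.append(lo)
        his.append(hi)
    # None marks a group without any validation locations.
    return {
        name: interval_coverage(ys, los, his) if ys else None
        for name, (ys, los, his) in groups.items()
    }


# Made-up toy values (not real biomass data), for illustration only.
observed = [10, 12, 9, 30, 41, 25]
lower = [8, 10, 8, 10, 12, 20]
upper = [12, 14, 11, 14, 16, 28]
within_domain = [True, True, True, False, False, False]

result = coverage_by_domain(observed, lower, upper, within_domain)
print(result)
```

If the uncertainty estimates are valid within the domain, the within-domain coverage should be close to the nominal level of the intervals (for example 0.9 for 90% intervals), whereas a clearly lower coverage outside the domain would indicate that the estimates do not transfer there.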

Literature

Requirements

  • Spatial Modelling and Statistics (GRS30306)
  • Interest in machine learning

Theme(s): Modelling & visualisation