MSc thesis topic: Garbage in, garbage out? Understanding the impact of training datasets on remote sensing based monitoring

Remote sensing based land monitoring frequently uses training and calibration datasets to predict different aspects of the earth's cover, e.g., forests, land use, land cover, and soil types.
Such predictions are essential for policymaking and policy implementation, including the implementation of the Sustainable Development Goals.

Many scientific publications recommend a close look at the quality of training datasets that are used for remote sensing based monitoring. Despite its importance, the impact of training datasets on remote sensing based monitoring remains understudied.

This issue is particularly important because maps can now be made with relative ease thanks to advances in large-area processing, such as cloud computing with Google Earth Engine. For example, the world's first global land cover map at 10 m resolution was produced using Google Earth Engine (Gong et al. 2019); another example is the WorldCover map by ESA. Although progress has been made in improving remote sensing based predictions, some products disagree on a large scale.

Even when the same remote sensing imagery is used and the same variable (e.g., land cover) is mapped, the predictions can be very different. Such differences can be associated with the quality of the training datasets, because a prediction is simply the best guess of the variable of interest given the input data. How much of the difference can be attributed to training data? How do landscape and data availability influence the differences between remote sensing predictions?

This research aims to address these questions. The research can be set up for some regions in Europe depending on data availability and interests.


  • Simulate land cover/use mapping to study the effect of training data error, sampling design, and sample size
  • Generate remote sensing based predictions of land use/land cover using different sets of training data
  • Assess the sources of uncertainty in the predictions related to training data, data availability, and landscape
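
As an illustration of the simulation idea in the first point above, the following is a minimal sketch and not a prescribed method. It assumes a synthetic two-class, two-band dataset, random label flipping as the error model, and a nearest-centroid classifier. The class means, the spread, and all function names are hypothetical and chosen only for illustration; a real study would use actual imagery, reference data, and a classifier of choice.

```python
import random

# Hypothetical class means in a two-band feature space (illustration only).
CLASS_MEANS = {0: (0.2, 0.6), 1: (0.5, 0.3)}
NOISE_SD = 0.1  # hypothetical within-class spread


def make_samples(n, rng):
    """Draw n synthetic pixels as (features, true_label) pairs."""
    samples = []
    for _ in range(n):
        label = rng.randrange(2)
        features = tuple(rng.gauss(m, NOISE_SD) for m in CLASS_MEANS[label])
        samples.append((features, label))
    return samples


def flip_labels(samples, error_rate, rng):
    """Return a copy in which each label is flipped with probability error_rate."""
    return [(x, 1 - y) if rng.random() < error_rate else (x, y) for x, y in samples]


def fit_centroids(samples):
    """'Train' a nearest-centroid classifier: mean feature vector per class."""
    centroids = {}
    for cls in CLASS_MEANS:
        pts = [x for x, y in samples if y == cls]
        if pts:  # skip a class that has no training samples
            centroids[cls] = tuple(sum(band) / len(pts) for band in zip(*pts))
    return centroids


def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def accuracy(centroids, test_samples):
    """Share of test samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in test_samples)
    return hits / len(test_samples)


def run_experiment(sizes, error_rates, seed=0):
    """Accuracy on clean test data for each (training size, label error rate)."""
    rng = random.Random(seed)
    test_samples = make_samples(2000, rng)  # error-free reference data
    results = {}
    for n in sizes:
        clean_train = make_samples(n, rng)
        for rate in error_rates:
            noisy_train = flip_labels(clean_train, rate, rng)
            results[(n, rate)] = accuracy(fit_centroids(noisy_train), test_samples)
    return results


if __name__ == "__main__":
    table = run_experiment([20, 200, 2000], [0.0, 0.2, 0.4])
    for (n, rate), acc in sorted(table.items()):
        print(f"n={n:5d}  label error={rate:.1f}  accuracy={acc:.3f}")
```

Running the script prints one accuracy value per combination of training size and label error rate. How strongly accuracy responds is likely to depend on the classifier and on the structure of the error; replacing the random flips with systematic confusion between specific classes, or changing how the training samples are drawn, would be natural variations to explore.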


  • Advanced remote sensing
  • Geoscripting

Theme(s): Integrated Land Monitoring