Colloquium
Unlocking the Value of Existing Reference Datasets using Automated Remote Sensing Methods and Deep Learning Algorithms
By Thijs Res
Abstract
Safeguarding the quality of reference data is critical for the proper validation, training and calibration of satellite products, and for their suitability for scientific research on Land Use/Land Cover (LULC). Many open science initiatives make their reference datasets publicly available, and as historical datasets accumulate, so does the demand for insight into their usability, given the risk that they have become outdated. This thesis report presents a multifaceted approach to estimating the near-present validity of historical LULC reference data. Multiple model strategies are proposed, using Principal Component Analysis (PCA) and deep learning with Convolutional Neural Networks (CNNs), to assess the consistency of class-specific spectral characteristics. From a collection of gathered reference data, Landsat pixel-sized sample units are generated using data harmonization and data resampling techniques. A class-specific PCA is performed on each data subset that shares a harmonized class label, after which a Hotelling’s T² confidence ellipse is constructed and proposed as an outlier detection method. The subsequent validation results are used to assess the capabilities of the PCA model strategies for bitemporal change detection and multiclass differentiation. Next, image-like representations of the tabular reference sample units are generated to train and test a modified LeNet-5 CNN architecture for Landsat pixel-based classification, achieving a test accuracy of 94.4% with an optimised hyperparameter set. Finally, PCA outlier detection is used to discard supposed outliers from the CNN training data to investigate whether this remote sensing methodology improves classification results.
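The class-specific PCA and Hotelling's T² outlier test described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the synthetic six-band data, the choice of two principal components, the 95% confidence level and the critical-value formula T²_crit = p(n-1)/(n-p) · F(p, n-p) are all assumptions made for illustration. With p = 2 the F quantile has a closed form, so only NumPy is needed.

```python
import numpy as np


def fit_class_pca(X, n_components=2):
    """Fit a PCA on the sample units of one class (rows = units, columns = bands)."""
    mean = X.mean(axis=0)
    centred = X - mean
    # SVD of the centred data: the rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    # Sample variance of the scores along each retained component.
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, components, variances


def hotelling_t2(X, mean, components, variances):
    """Hotelling's T² of each row of X in the retained PCA score space."""
    scores = (X - mean) @ components.T
    return np.sum(scores ** 2 / variances, axis=1)


def t2_limit_two_components(n_samples, alpha=0.05):
    """Critical T² for p = 2 components (assumed form: p(n-1)/(n-p) * F)."""
    d2 = n_samples - 2
    # Closed-form (1 - alpha) quantile of the F(2, d2) distribution.
    f_crit = (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)
    return 2.0 * (n_samples - 1) / d2 * f_crit


# Synthetic stand-in for one harmonized class: 200 sample units, 6 bands.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.5, 0.5])

mean, comps, var = fit_class_pca(X)
limit = t2_limit_two_components(len(X))
t2 = hotelling_t2(X, mean, comps, var)

# Units whose scores fall outside the confidence ellipse are flagged.
is_outlier = t2 > limit
print(f"T² limit: {limit:.2f}, flagged: {is_outlier.sum()} of {len(X)}")
```

In score space the boundary T² = T²_crit is an ellipse whose semi-axes scale with the square root of each component's variance, which is why the test can be drawn as a confidence ellipse over the first two principal components. The same fitted model can be applied to sample units from a later date by passing them to `hotelling_t2`.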