Student information

MSc thesis topic: Can a Satellite-Geoguesser playing Neural Network predict cancer risk?

Many environmental and social factors influence how our planet appears in satellite images (Figure, left side). These factors can be distilled into a location encoder neural network by training it on geo-localization: matching satellite images to the coordinates where they were taken, and coordinates to their images. This is the training objective behind SatCLIP: Satellite Contrastive Location-Image Pretraining (Klemmer et al., 2024). Training in this way aims to distil the ground conditions that determine the visual patterns in satellite images into the location encoding neural network (Rußwurm et al., 2024).
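To make the contrastive objective concrete, here is a minimal NumPy sketch of a CLIP-style symmetric loss between image embeddings and location embeddings. It is an illustration under stated assumptions, not the SatCLIP implementation: the function names, the temperature of 0.07, and the random stand-in embeddings are all hypothetical, and a real model would produce the embeddings with an image encoder and a location encoder.

```python
import numpy as np


def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb, location_emb, temperature=0.07):
    # Row i of image_emb and row i of location_emb belong to the same place.
    # The temperature value is an assumption chosen for illustration.
    img = l2_normalize(image_emb)
    loc = l2_normalize(location_emb)
    logits = img @ loc.T / temperature  # pairwise similarities, shape (n, n)
    diag = np.arange(len(logits))
    # Image -> location: each image should pick out its own location.
    loss_img_to_loc = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Location -> image: each location should pick out its own image.
    loss_loc_to_img = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_img_to_loc + loss_loc_to_img)


# Random stand-ins for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 16))
aligned_loc_emb = image_emb + 0.01 * rng.normal(size=(8, 16))
unrelated_loc_emb = rng.normal(size=(8, 16))

matched = contrastive_loss(image_emb, aligned_loc_emb)
shuffled = contrastive_loss(image_emb, unrelated_loc_emb)
print(f"aligned pairs: {matched:.4f}, unrelated pairs: {shuffled:.4f}")
```

The loss is low when each image embedding is closest to the embedding of its own location and high otherwise, which is what pushes the location encoder to capture the ground conditions visible in the imagery.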

In this thesis, you will train such a neural network using the SatCLIP training objective on Sentinel-2 data of Spain. This Spain-specific SatCLIP model will then be tested for disease mapping using medical data for 8000 territories in Spain, provided by our colleagues at the Public University of Navarre (UPNA) in Pamplona, Spain.

Relevance to research/projects at GRS or other groups

This thesis is supervised by Marc Rußwurm, who will provide expertise in machine learning and deep location encoding, and Sytze de Bruin, who will provide expertise in geostatistics. The results of this thesis may support disease mapping in Spain through collaborations with María Dolores Ugarte and colleagues at the Public University of Navarre (UPNA) in Pamplona, Spain.

Research Questions

  • Can representations obtained from a SatCLIP model trained on Spanish Sentinel-2 data predict disease risks better than existing methods?
  • What are the benefits and challenges of training a location-specific SatCLIP model?

Objectives

  • generate a geo-dataset of the Wageningen campus and encode it in a neural implicit representation.
  • discuss the applicability of such representations in the context of Geo-information Science.

Requirements

  • familiarize yourself with location encoding and implicit neural geo-representations.
  • collect Sentinel-2 data on Spain, specifically on urban areas.
  • train a SatCLIP model on the collected Sentinel-2 data.
  • evaluate the embeddings of this SatCLIP model either on the medical data directly or on easier proxy objectives.
  • required: deep learning course
  • required: interest in geodata and deep learning

Expected reading list before starting the thesis research

Theme(s): Modelling & visualisation