Longread

Proper data management makes researchers’ lives easier

December 11, 2020

Analysing thousands of satellite maps from all over the globe in one go, registering the Dutch wildlife population automatically using camera images, receiving lab-research results neatly sorted on your desktop computer. Data management may sound complicated, but it makes the researcher’s life a lot simpler, says Erik van den Bergh. He works as an infrastructure coordinator for the Wageningen Data Competence Centre (WDCC). ‘By combining data correctly, you can use it for so much more, which is super cool.’

Erik van den Bergh (32) is something of an architect. He listens to the researcher’s wishes, and designs a tailored “house”. That may be as easy as ordering a different computer and installing the right software, but it may also require automating and unlocking the research to allow many more people to make use of it.

Such as?

‘Consider, for example, the data collected by Marine Research. They measure a wide range of things, from tidal flows to mussel growth: a whole lot of data. If you entered all this data into a single platform, such as an interactive map, all this information would be visible at once. This benefits the researchers, but also governments, businesses and interested citizens. All researchers want their data to be relevant, to be used. By combining data correctly, you can use it for so much more, which is super cool.’

I understand how many researchers get creeped out by data management.

Erik van den Bergh is the infrastructure coordinator for WDCC. Image: private photograph

Super cool for an IT specialist such as yourself. Does the average researcher feel the same?

‘Many researchers get creeped out by data management, and I understand. They are under a great deal of pressure, having to produce cutting-edge results in a short time with few resources, and to publish about them as much as possible. Giving thought to how they file and process their data is not their primary concern. It is precisely for this reason that the WDCC is still not all that visible, despite having existed since 2017. But well, why would you ask for something you don’t know exists?’

Why should colleagues give you a call anyway?

‘We can provide significant support for their research. Data systems can lighten the workload! This can be done at several levels: from the choice of computer and software to much more far-reaching steps. Consider, for example, deploying artificial intelligence. This may save a researcher a lot of work. After all, you train a computer to think and analyse independently.’

Training a computer to “think” autonomously can save researchers a lot of work.

Where at WUR are we already working with this cutting-edge technology?

‘At the Wildlife Ecology and Conservation group, for example. There, cameras are deployed throughout the country to monitor wildlife populations. These images are currently being labelled by volunteers, as in: this is a red deer, and that is a wild boar. But in time, we want a computer programme to take over this job independently. By feeding many images into the computer, you can teach it to identify wildlife and register it automatically. This is useful not just for the researcher, but also for a municipality wanting to know whether a construction site is the habitat of wild or endangered species. A similar mechanism is under development at Agrosystems Research, which receives thousands of satellite images of farmlands from across the globe. Analysing these images for things such as soil use, crop growth, and fertilisation ultimately enables you to predict what crop thrives where. Analysing this by hand is a huge endeavour. If a computer takes over this work, however, you will swiftly know what crop you could best cultivate in, to name a random place, Costa Rica.’

Computers can autonomously identify video images of, for example, red deer, eliminating the need for volunteers to do this job. Image: Shutterstock

Satellite images of farmland can be analysed automatically to reveal what crop thrives where. Image: Shutterstock

This saves hours, days, perhaps months of work?

‘Yes, and it goes even further for those who want it. Take the Synthetic Systems Biology group. Researchers there conduct DNA analyses in the lab. These are automatically uploaded to a computer, which then checks the validity of the results and returns them neatly packaged within minutes.’

Okay, but does this work exclusively for researchers who take measurements and analyse images?

‘No. Data management can also speed up things such as survey analyses, for example of the questionnaires used in studies conducted by Human Nutrition and Health. In each study, there may be people of, say, Surinamese origin, but too few to draw conclusions about their diet. However, if you combine the data, you may suddenly have a group large enough to draw significant conclusions. From an international perspective, sound data management can also provide valuable insights. This is being demonstrated now with the COVID-19 outbreak. All the genetic test results from infected individuals are gathered in a single, global data portal. This allows you to analyse what virus mutations are responsible for what outbreak.’
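
The pooling idea can be illustrated with a small Python sketch. It is not the method used by Human Nutrition and Health: the study names, the respondent counts and the threshold of 30 are all invented for illustration, and a real analysis would first have to check that the questionnaires are actually comparable.

```python
from collections import Counter

# Hypothetical respondent counts per study; every number is invented.
studies = {
    "study_a": {"Dutch": 480, "Surinamese": 12},
    "study_b": {"Dutch": 350, "Surinamese": 9},
    "study_c": {"Dutch": 610, "Surinamese": 15},
}

# Assumed minimum group size for drawing conclusions (illustrative only,
# not a statistical rule).
MIN_GROUP_SIZE = 30


def pool_counts(per_study):
    """Sum the respondent counts per group across all studies."""
    pooled = Counter()
    for counts in per_study.values():
        pooled.update(counts)
    return pooled


def large_enough(counts, minimum=MIN_GROUP_SIZE):
    """Return the set of groups whose size reaches the minimum."""
    return {group for group, n in counts.items() if n >= minimum}


pooled = pool_counts(studies)

# Per study, the smaller group stays below the threshold...
for name, counts in studies.items():
    print(name, sorted(large_enough(counts)))

# ...but the pooled data may lift it above the threshold.
print("pooled", sorted(large_enough(pooled)))
```

With these made-up numbers, no single study has enough Surinamese respondents, while the combined data set does.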

Sounds great! But how do you take the first step toward sound data management?

‘Filing data is step one, as demonstrated by the COVID database. Uniform storage, accessible to anyone with the correct clearance, now and in the future, is essential. This is why I am currently working on introducing iRODS and YODA. These systems allow data to be centrally stored within WUR, and facilitate the exchange of information between universities. To date, many colleagues store their results and metadata (information on things such as research conditions, ed.) in Excel files on a flash drive or external hard disk. That is not very safe, as these items are easily misplaced, and not sustainable: other scientists must be able to access and verify the data at a later stage. That is a requirement for meeting the FAIR standards.’

Step one? File your data properly, so that you can find it again now and in the future.

FAIR?

‘In 2016, the G20 stipulated that research data must meet the FAIR principles: findable, accessible, interoperable (usable in different programmes) and reusable. To make data findable and accessible, it must be stored in such a way that it is not lost if a researcher leaves the university, taking his flash drive with him. But to make data reusable, a unique description of the metadata is required. This should stipulate how the data was collected, with what equipment and under what circumstances. The Dutch Research Council (NWO) has made FAIR a requirement in awarding grants.’
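
What such a metadata description might look like can be sketched in a few lines of Python. This is a hypothetical record, not the schema used by YODA, iRODS or NWO: the field names, the example values and the `missing_fields` helper are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str         # unique reference so the data set can be found
    title: str
    collection_method: str  # how the data was collected
    equipment: str          # with what equipment
    conditions: str         # under what circumstances
    access: str = "restricted"  # the researcher decides who gets access


def missing_fields(record):
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items()
            if not str(value).strip()]


# Invented example values; the equipment field is left empty on purpose.
record = DatasetMetadata(
    identifier="example-0001",
    title="Mussel growth measurements",
    collection_method="monthly shell length measurement",
    equipment="",
    conditions="intertidal plot, spring tide",
)

print(missing_fields(record))
print(json.dumps(asdict(record), indent=2))
```

Writing the record to a plain format such as JSON keeps it readable in other programmes, and the check flags a description that is still incomplete before the data is filed.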

Grants are only awarded if data is made public?

‘No, that is a misunderstanding. Researchers think that FAIR is the same as “open data”, which is entirely open to the general public. This leads to the fear that just anyone could use their data. But FAIR means that the researcher manages his data in such a way that the data are shareable. It is the researcher who decides who is granted access. This could be a single colleague or the entire world.’

So, how does one become FAIR?

‘Simply check out a short PowerPoint on YODA, and you will already learn how to store your data properly. It is a lot simpler than it seems. But if you want to get more out of your data, call me or send an email to data@wur.nl and I will gladly help. Sometimes, others are already involved in something similar, which allows me to bring departments together to find a tailored solution.’