Publications

iPRESTO: integrated Prediction and Rigorous Exploration of biosynthetic Sub-clusters Tool

Louwen, J.J.R.; Kautsar, S.A.; van der Burg, S.; Medema, M.H.; van der Hooft, J.J.J.

Summary

iPRESTO (integrated Prediction and Rigorous Exploration of biosynthetic
Sub-clusters Tool) is a command-line tool for the detection of gene sub-clusters in
a set of Biosynthetic Gene Clusters (BGCs) in GenBank format. BGCs are tokenised
by representing each gene as a combination of its Pfam domains, where subPfams
are used to increase resolution. Tokenised BGCs are filtered for redundancy
using a similarity network with an Adjacency Index of domains as a distance metric.
For the detection of sub-clusters, two methods are used: PRESTO-STAT, which is
based on the statistical algorithm from Del Carratore et al. (2019), and the
novel method PRESTO-TOP, which uses topic modelling with Latent Dirichlet
Allocation. The sub-clusters found with iPRESTO can then be linked to Natural
Product substructures.
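
The tokenisation and redundancy-filtering steps described above can be illustrated with a minimal Python sketch. It is not taken from the iPRESTO code base: the function names, the ";" separator used to join the domains of a gene, and the example clusters are all hypothetical. The Adjacency Index is assumed here to be the Jaccard index over unordered pairs of neighbouring domains, which is one common definition; the tool itself may differ in details such as the handling of genes without domains.

```python
from itertools import chain


def tokenise_bgc(genes):
    """Represent each gene as a single token built from its domain names.

    `genes` is a list of genes, each given as a list of domain names.
    The ";" separator is an assumption made for this sketch.
    Genes without any domain are skipped.
    """
    return [";".join(domains) for domains in genes if domains]


def adjacent_pairs(genes):
    """Return the set of unordered pairs of neighbouring domains in a cluster."""
    domains = list(chain.from_iterable(genes))
    return {tuple(sorted(pair)) for pair in zip(domains, domains[1:])}


def adjacency_index(genes_a, genes_b):
    """Jaccard index over adjacent domain pairs (assumed definition).

    Returns 0.0 when neither cluster has any adjacent pair.
    """
    pairs_a = adjacent_pairs(genes_a)
    pairs_b = adjacent_pairs(genes_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a & pairs_b) / len(union)


# Two made-up clusters that differ only in their last gene.
bgc_a = [["ketoacyl-synt", "Acyl_transf_1"], ["Methyltransf_2"], ["p450"]]
bgc_b = [["ketoacyl-synt", "Acyl_transf_1"], ["Methyltransf_2"], ["Glycos_transf_1"]]

tokens_a = tokenise_bgc(bgc_a)
similarity = adjacency_index(bgc_a, bgc_b)

print(tokens_a)
print(similarity)
```

In a redundancy filter of this kind, pairs of clusters whose index exceeds a chosen cutoff would be connected in the similarity network, and one representative per group of connected clusters would be kept. The cutoff value is a parameter of the tool and is not given in this summary.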