Deep Clustering for Metagenomics

dc.cclicence: N/A (en)
dc.contributor.author: Gongora, Mario Augusto
dc.date.accessioned: 2021-01-18T13:43:15Z
dc.date.available: 2021-01-18T13:43:15Z
dc.date.issued: 2020-12-10
dc.description.abstract: Metagenomics is a field, supported by modern next-generation sequencing technology, that investigates microorganisms obtained directly from environmental samples, without the need to isolate them. This type of sequencing produces a large number of DNA fragments from different organisms, so the challenge is to identify groups of DNA sequences that belong to the same organism. Although large databases of species sequences are available, the use of supervised methods for this problem is limited by the small number of known species and by the computational time required to analyse fragments against species sequences. To overcome these problems, a binning process can be used to reconstruct and identify a set of metagenomic fragments. Binning serves as a pre-processing step that groups fragments at the same taxonomic level. In this work, we propose a clustering model with a feature extraction process that uses an autoencoder neural network. For clustering, k-means is used, starting with a value of k large enough to obtain very pure clusters. The number of clusters is then reduced through a process that combines various distance functions. The results show that the proposed method outperforms k-means and classical feature extraction methods such as PCA, achieving a purity of 90%. (en)
dc.funder: No external funder (en)
dc.identifier.citation: Bonet, I., Pena, A., Lochmuller, C., Patino, A., Gongora, M. (2020) Deep Clustering for Metagenomics. In: Cazzaniga P., Besozzi D., Merelli I., Manzoni L. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2019. Lecture Notes in Computer Science, vol. 12313. Springer, Cham. (en)
dc.identifier.doi: https://doi.org/10.1007/978-3-030-63061-4_29
dc.identifier.isbn: 9783030630614
dc.identifier.uri: https://dora.dmu.ac.uk/handle/2086/20576
dc.language.iso: en (en)
dc.peerreviewed: Yes (en)
dc.publisher: Springer (en)
dc.researchinstitute: Institute of Artificial Intelligence (IAI) (en)
dc.subject: Deep learning (en)
dc.subject: artificial intelligence (en)
dc.subject: bioinformatics (en)
dc.subject: genomics (en)
dc.title: Deep Clustering for Metagenomics (en)
dc.type: Article (en)
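
The abstract reports its result as cluster purity and motivates starting k-means with a large k so that the initial clusters are very pure. The record does not define the metric, so the sketch below assumes the usual definition: for each cluster, count the fragments carrying its most common true label, sum those counts, and divide by the total number of fragments. The function name `purity`, the toy labels, and the hand-picked merge mapping are all hypothetical illustrations; in particular, the paper merges clusters using a combination of distance functions, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of items that carry the majority true label of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")
    # Count true labels inside each cluster.
    per_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cid, Counter())[label] += 1
    # Sum the size of the majority label in every cluster.
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: six fragments from two organisms, "A" and "B".
organisms = ["A", "A", "A", "B", "B", "B"]
coarse = [0, 0, 0, 0, 1, 1]  # k = 2: one "B" fragment lands in the "A" cluster
fine = [0, 0, 1, 2, 3, 3]    # k = 4: every cluster holds a single organism

coarse_purity = purity(organisms, coarse)
fine_purity = purity(organisms, fine)

# Hand-picked merge of pure clusters (assumption: stands in for the paper's
# distance-based reduction step, which is not specified in this record).
merge_map = {0: 0, 1: 0, 2: 1, 3: 1}
merged = [merge_map[c] for c in fine]
merged_purity = purity(organisms, merged)
```

On this toy input the over-clustered assignment is perfectly pure, and merging clusters that hold the same organism reduces k without lowering purity, which is the intuition behind starting with a large k and reducing afterwards.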

Files

License bundle
Name: license.txt
Size: 4.2 KB
Format: Item-specific license agreed upon to submission