Dataset Size and Machine Learning - Open NMR Databases as a Case Study
Date
2022-08-10
Publisher
IEEE
Type
Conference
Peer reviewed
Yes
Abstract
How much data is needed to train machine learning methods remains an open question. Here, we examine it using a problem from chemistry that is a special case of graph data analysis and can be tackled, among other approaches, with graph convolutional networks. We show that newer methods can provide good results but need large amounts of data, which are not always available. When only small amounts of data are available, older methods may be preferable. In the longer term, open databases can help with this problem.
Keywords
machine learning, artificial intelligence, sample size, chemoinformatics, NMR
Citation
Kuhn, S., Borges, R.M., Venturini, F. and Sansotera, M. (2022) Dataset Size and Machine Learning - Open NMR Databases as a Case Study. 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1632-1636.