Dataset Size and Machine Learning - Open NMR Databases as a Case Study

Abstract

The amount of data needed to train machine learning methods is an open question. Here, we examine this question using a problem from chemistry. The problem is a special case of graph data analysis and can be tackled, among other approaches, with graph convolutional networks. We show that newer methods can provide good results but need large amounts of data, which are not always available. When little data is available, older methods may be preferable in some cases. In the longer term, open databases can help with this problem.

Keywords

machine learning, artificial intelligence, sample size, chemoinformatics, NMR

Citation

Kuhn, S., Borges, R.M., Venturini, F. and Sansotera, M. (2022) Dataset Size and Machine Learning - Open NMR Databases as a Case Study. 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1632-1636.

Research Institute