Correcting data imbalance for semi-supervised Covid-19 detection using X-ray chest images
Date
Advisors
Journal Title
Journal ISSN
ISSN
Volume Title
Publisher
Type
Peer reviewed
Abstract
A key factor in the ght against viral diseases such as the coronavirus (COVID-19) is the identi cation of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classi cation of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance to the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance, by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classi cation accuracy by up to 18%, with respect to the non balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations, for binary classi cation (COVID-19 positive and normal cases). For multi-class classi cation (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.