Audio Content Analysis for Unobtrusive Event Detection in Smart Homes

dc.cclicence: CC-BY-NC-ND
dc.contributor.author: Vafeiadis, Anastasios
dc.contributor.author: Votis, Konstantinos
dc.contributor.author: Giakoumis, Dimitrios
dc.contributor.author: Tzovaras, Dimitrios
dc.contributor.author: Chen, Liming
dc.contributor.author: Hamzaoui, Raouf
dc.date.acceptance: 2019-08-22
dc.date.accessioned: 2019-09-20T08:16:45Z
dc.date.available: 2019-09-20T08:16:45Z
dc.date.issued: 2019
dc.description: Institute of Engineering Sciences. The file attached to this record is the author's final peer-reviewed version. The Publisher's final version can be found by following the DOI link.
dc.description.abstract: Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) Which features and which classifiers are most suitable in the presence of background noise? 2) What is the effect of signal duration on recognition accuracy? 3) How do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNN) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first one, which uses a gradient boosting classifier, achieved an F1-score of 90.2% and a recognition accuracy of 91.7%. The second one, which uses a 2D CNN with mel-spectrogram images, achieved an F1-score of 92.7% and a recognition accuracy of 96%.
dc.funder: European Union (EU) Horizon 2020
dc.identifier.citation: Vafeiadis, A., Votis, K., Giakoumis, D., Tzovaras, D., Chen, L. and Hamzaoui, R. (2019) Audio content analysis for unobtrusive event detection in smart homes. Engineering Applications of Artificial Intelligence, in press.
dc.identifier.uri: https://dora.dmu.ac.uk/handle/2086/18477
dc.language.iso: en_US
dc.peerreviewed: Yes
dc.projectid: Marie Skłodowska-Curie grant agreement No. 676157, project ACROSSING
dc.publisher: Elsevier
dc.researchinstitute: Cyber Technology Institute (CTI)
dc.subject: Smart homes
dc.subject: Ambient assisted living
dc.subject: Audio signal processing
dc.subject: Feature extraction
dc.subject: Feature selection
dc.subject: Deep learning
dc.title: Audio Content Analysis for Unobtrusive Event Detection in Smart Homes
dc.type: Article
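
The abstract above reports that combining discrete wavelet transform (DWT) coefficients with gammatone frequency cepstral coefficients works well as input to a traditional classifier. As a minimal, hypothetical sketch (not the authors' implementation), a single-level Haar DWT splits an audio frame into approximation (low-frequency) and detail (high-frequency) coefficients, which can serve as such features:

```python
import math

def haar_dwt(signal):
    """Single-level Haar discrete wavelet transform.

    Splits an even-length frame into approximation coefficients
    (pairwise scaled sums) and detail coefficients (pairwise scaled
    differences). Returns (approximation, detail) as lists, each half
    the length of the input.
    """
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

# Hypothetical short audio frame (8 samples) used purely for illustration.
frame = [1.0, 3.0, 2.0, 2.0, 4.0, 0.0, 1.0, 1.0]
a, d = haar_dwt(frame)
```

In practice, a multi-level decomposition of each analysis frame would be computed and summary statistics of the coefficients concatenated with the cepstral features before classification.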

Files

Original bundle
Name: EAAI-19-440R1.pdf
Size: 2.42 MB
Format: Adobe Portable Document Format
Description: Main article

License bundle
Name: license.txt
Size: 4.2 KB
Description: Item-specific license agreed upon to submission