Browsing by Author "Votis, Konstantinos"
Now showing 1 - 8 of 8
Item Open Access
Acoustic scene classification: from a hybrid classifier to deep learning (2017-11-16)
Vafeiadis, Anastasios; Kalatzis, Dimitrios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
This report describes our contribution to the 2017 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. We investigated two approaches for the acoustic scene classification task. Firstly, we used a combination of features in the time and frequency domain and a hybrid Support Vector Machine - Hidden Markov Model (SVM-HMM) classifier to achieve an average accuracy of 80.9% over four folds on the development dataset and 61.0% on the evaluation dataset. Secondly, by exploiting data augmentation techniques and using the whole segment (as opposed to splitting it into sub-sequences) as input, the accuracy of our CNN system was boosted to 95.9%. However, owing to the small number of kernels used in the CNN and its failure to capture the global information of the audio signals, it achieved an accuracy of only 49.5% on the evaluation dataset. Both of our approaches outperformed the DCASE baseline method, which uses log-mel band energies for feature extraction and a Multi-Layer Perceptron (MLP) to achieve an average accuracy of 74.8% over four folds.

Item Open Access
Audio Content Analysis for Unobtrusive Event Detection in Smart Homes (Elsevier, 2019)
Vafeiadis, Anastasios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: (1) which features and which classifiers are most suitable in the presence of background noise? (2) What is the effect of signal duration on recognition accuracy? (3) How do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems using traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNNs) that take mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first, which uses a gradient boosting classifier, achieved an F1 score of 90.2% and a recognition accuracy of 91.7%. The second, which uses a 2D CNN with mel-spectrogram images, achieved an F1 score of 92.7% and a recognition accuracy of 96%.
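The comparison above favors a 2D CNN on mel-spectrogram images over a 1D CNN on mel energies. Below is a minimal sketch of that input pipeline and a small 2D CNN; the sample rate, number of mel bands, NUM_CLASSES, and layer sizes are assumptions for illustration, not the paper's actual architecture or hyperparameters.

```python
# A minimal sketch (not the paper's model): log mel-spectrogram "image"
# fed to a small 2D CNN. Sample rate, mel bands, and NUM_CLASSES are
# assumed values for illustration.
import librosa
import numpy as np
import tensorflow as tf

NUM_CLASSES = 10  # hypothetical number of sound-event classes

def mel_spectrogram_image(path, sr=16000, n_mels=128):
    """Load a clip and convert it to a log-scaled mel-spectrogram image."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    return log_mel[..., np.newaxis]  # (n_mels, frames, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(128, None, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),  # handles variable clip length
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```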
Item Open Access
Audio-based Event Recognition System for Smart Homes (IEEE, 2017)
Vafeiadis, Anastasios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Building an acoustic-based event recognition system for smart homes is a challenging task due to the lack of high-level structure in environmental sounds. In particular, the selection of effective features is still an open problem. We make an important step toward this goal by showing that the combination of Mel-Frequency Cepstral Coefficients, Zero-Crossing Rate, and Discrete Wavelet Transform features can achieve an F1 score of 96.5% and a recognition accuracy of 97.8% with a gradient boosting classifier for ambient sounds recorded in a kitchen environment.

Item Open Access
Comparing CNN and Human Crafted Features for Human Activity Recognition (IEEE, 2019-08)
Cruciani, Federico; Vafeiadis, Anastasios; Nugent, Chris; Cleland, Ian; McCullagh, Paul; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One of the advantages of these methods is their ability to generate features automatically, which greatly simplifies the feature extraction task that usually requires domain-specific knowledge, especially when using big data, where data-driven approaches can lead to anti-patterns. Despite this advantage, very little work has been undertaken on analyzing the quality of the extracted features, and more specifically on how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for the recognition of simple activities, applying this approach to signals from both inertial and audio sensors. The paper provides the following contributions: (i) a comparison of automatically extracted CNN features with gold-standard Human Crafted Features (HCF); (ii) a comprehensive analysis of how architecture and model parameters affect the separation of target classes in the feature space. Results are evaluated using publicly available datasets. In particular, we achieved an F-score of 93.38% on the UCI-HAR dataset, using 1D CNNs with 3 convolutional layers and a kernel size of 32, and an F-score of 90.5% on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor, and vehicle), using 2D CNNs with 2 convolutional layers and a 2x2 kernel size.

Item Open Access
Energy-based decision engine for household human activity recognition (IEEE, 2018-03)
Vafeiadis, Anastasios; Vafeiadis, Thanasis; Zikos, Stelios; Krinidis, Stelios; Votis, Konstantinos; Giakoumis, Dimitrios; Ioannidis, Dimosthenis; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
We propose a framework for energy-based human activity recognition in a household environment. We apply machine learning techniques to infer the state of household appliances from their energy consumption data and use rule-based scenarios that exploit these states to detect human activity. Our decision engine achieved 99.1% accuracy on real-world data collected in the kitchens of two smart homes.
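The decision engine just described maps inferred appliance states to activities through rule-based scenarios. Here is a toy sketch of that pattern; the power thresholds and rules are hypothetical stand-ins, not the paper's actual models or scenarios.

```python
# A toy sketch of a rule-based decision engine: power readings -> on/off
# appliance states -> activity label. Thresholds and rules are hypothetical
# illustrations, not the paper's actual models or scenarios.
ON_THRESHOLDS_W = {"oven": 50.0, "kettle": 30.0, "tv": 20.0}  # assumed watts

def appliance_states(power_readings_w):
    """Infer a binary on/off state per appliance from instantaneous power."""
    return {name: power_readings_w.get(name, 0.0) > threshold
            for name, threshold in ON_THRESHOLDS_W.items()}

def infer_activity(states):
    """Apply rule-based scenarios to the inferred appliance states."""
    if states["oven"] or states["kettle"]:
        return "cooking"
    if states["tv"]:
        return "watching_tv"
    return "unknown"

print(infer_activity(appliance_states({"oven": 1200.0, "tv": 5.0})))  # cooking
```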
Item Metadata only
Feature learning for human activity recognition using convolutional neural networks: A case study for inertial measurement unit and audio data (Springer, 2020-01-24)
Cruciani, Federico; Vafeiadis, Anastasios; Nugent, Chris; Cleland, Ian; McCullagh, Paul; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented in which the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (1) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model; (2) the pre-trained model is then employed as a feature extractor, and its use is evaluated on a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU)-based and audio-based HAR. For the IMU data, the balanced accuracy was 91.98% on the UCI-HAR dataset and 67.51% on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30% on the DCASE 2017 dataset and 35.24% on the Extrasensory dataset.

Item Open Access
Image-based Text Classification using 2D Convolutional Neural Networks (IEEE, 2019-08)
Merdivan, Erinç; Vafeiadis, Anastasios; Kalatzis, Dimitrios; Hanke, Sten; Kropf, Johannes; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf; Geist, Matthieu
We propose a new approach to text classification in which we treat the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of the sentences from the variations in the visual patterns of the words. Our approach demonstrates that it is possible to extract semantically meaningful features from images of text without using optical character recognition or the sequential processing pipelines that traditional natural language processing algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we were able to outperform the state-of-the-art accuracy results on a Chinese text classification task and achieved promising results on seven English text classification tasks. Furthermore, our approach outperformed memory networks without match types when using out-of-vocabulary entities from Task 4 of the bAbI dialog dataset.
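A minimal sketch of the text-as-image idea above: render a sentence onto a grayscale canvas with PIL and classify the bitmap with a small 2D CNN. The image size, class count, and network layout here are assumptions for illustration, not the paper's settings.

```python
# A minimal sketch of text-as-image classification: draw the sentence on a
# grayscale canvas and classify the bitmap with a small 2D CNN. Image size,
# class count, and architecture are assumptions, not the paper's settings.
import numpy as np
from PIL import Image, ImageDraw
import tensorflow as tf

IMG_H, IMG_W, NUM_CLASSES = 64, 512, 2  # hypothetical settings

def text_to_image(sentence):
    """Render the sentence as an (IMG_H, IMG_W, 1) float array in [0, 1]."""
    img = Image.new("L", (IMG_W, IMG_H), color=255)     # white canvas
    ImageDraw.Draw(img).text((4, 4), sentence, fill=0)  # default PIL font
    return np.asarray(img, dtype=np.float32)[..., np.newaxis] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(IMG_H, IMG_W, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
batch = np.stack([text_to_image("where is the restaurant?")])
print(model(batch).shape)  # (1, NUM_CLASSES)
```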
Item Open Access
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection (International Speech Communication Association, 2019-09)
Vafeiadis, Anastasios; Fanioudakis, Eleftherios; Potamitis, Ilyas; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD in which we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-Time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), an architecture traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving-average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions.
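A sketch of the CRNN pattern this abstract describes: 2D convolutions over an STFT spectrogram, max-pooling along the frequency axis only, a recurrent layer over time, per-frame sigmoid outputs, and moving-average smoothing to remove isolated spikes. The frame count, STFT size, layer widths, and smoothing window are illustrative assumptions, not the authors' configuration.

```python
# A sketch of the described CRNN (illustrative sizes, not the authors'
# configuration): convolutions over an STFT spectrogram with max-pooling
# along the frequency axis only, a bidirectional GRU over time, and a
# per-frame sigmoid speech probability, smoothed by a moving average.
import numpy as np
import tensorflow as tf

N_FRAMES, N_FREQ = 500, 257  # assumed segment length and STFT bins

inputs = tf.keras.Input(shape=(N_FRAMES, N_FREQ, 1))   # (time, freq, 1)
x = tf.keras.layers.Conv2D(32, (3, 3), padding="same",
                           activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=(1, 4))(x)  # pool frequency only
x = tf.keras.layers.Conv2D(64, (3, 3), padding="same",
                           activation="relu")(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(1, 4))(x)  # freq: 257 -> 64 -> 16
x = tf.keras.layers.Reshape((N_FRAMES, 16 * 64))(x)    # merge freq x channels
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(64, return_sequences=True))(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # prob per frame
model = tf.keras.Model(inputs, outputs)

def smooth(frame_probs, k=11):
    """Moving-average filter to suppress isolated misclassified spikes."""
    return np.convolve(frame_probs, np.ones(k) / k, mode="same")
```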